It was a cold, foggy night in Gurgaon. A bunch of us from the Wynk team were talking about the shift that entertainment consumers in India were going through. Most were moving from the traditional model of buying or downloading content and then playing it to the new-age model of streaming.
For users to make the jump from downloading/buying to streaming and for them to encourage their friends to follow suit, the streaming model had to be equally trusted. This posed one significant challenge. While streaming provides a user with a wider range of music options, the user is also at the mercy of the internet bandwidth that is available to them.
The two main thoughts that came out of our discussion were:
- Streaming content needs to run as fast as downloaded content, so that users get instant gratification.
- A user’s experience should not fluctuate due to changes in internet bandwidth availability.
While most of us forgot about this conversation the next day, Ankit and I decided to take up this challenge. We met again the next day and wrote down the problem on a whiteboard. We figured the most important metric we wanted to impact was:
Click To Play time, or CTP: the time it takes for music to start playing after someone presses play in the Wynk app.
We quickly caught up with our CTO, Sudipta. He suggested that instead of trying to tackle this problem for all media streams, we should first solve it for repeated plays (songs not being played for the first time). It made perfect sense, since this category made up 35% of songs played on Wynk.
As good engineers, we asked ourselves what the current metrics looked like. For the majority (75%) of our users, the CTP averaged 10 seconds. While this does not seem like much, every millisecond shaved off can make or break the user experience. We set ourselves the audacious goal of reducing this number by 3x.
A bit about music streaming at Wynk
In the old days, streaming an mp3/mp4 file meant downloading it completely and playing it back on the user’s device. Luckily, we have moved beyond the stone age. Streaming at Wynk uses the HLS (HTTP Live Streaming) protocol pioneered by Apple and baked into a forked version of Google’s open-source ExoPlayer. The way it works is that each song is broken into:
- 1 master file (which contains URLs to index files of different bit-rates)
- Index files (which contain URLs to 10-second segments)
- 10-second segment files.
Sample Master File
Sample index file
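To make the master/index layout concrete, here is a minimal Python sketch of reading both kinds of file. The tags (`#EXT-X-STREAM-INF`, `#EXTINF` and so on) are standard HLS, but the playlists themselves, the `example.com` URLs, the bitrates and the helper names are made up for illustration and are not Wynk’s actual files or player code.

```python
import re

# Hypothetical master file: one entry per bitrate, each pointing at an index file.
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=64000
https://example.com/song123/64/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=128000
https://example.com/song123/128/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=320000
https://example.com/song123/320/index.m3u8
"""

# Hypothetical index file: one entry per 10-second segment.
INDEX = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXT-X-ENDLIST
"""


def parse_master(text):
    """Return a list of (bandwidth_bps, index_url) pairs from a master file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            # Require a ':' or ',' before the name so AVERAGE-BANDWIDTH is not matched.
            match = re.search(r"[:,]BANDWIDTH=(\d+)", line)
            if match and i + 1 < len(lines):
                variants.append((int(match.group(1)), lines[i + 1]))
    return variants


def parse_index(text):
    """Return the segment URIs of an index file in playback order."""
    lines = [ln.strip() for ln in text.splitlines()]
    return [ln for ln in lines if ln and not ln.startswith("#")]


def pick_variant(variants, available_bps):
    """Pick the highest bitrate that fits the bandwidth, else the lowest one."""
    fitting = [v for v in variants if v[0] <= available_bps]
    return max(fitting) if fitting else min(variants)
```

With roughly 150 kbps available, `pick_variant(parse_master(MASTER), 150_000)` selects the 128 kbps index file, and `parse_index(INDEX)` then lists the segments to fetch in order.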
Coming back to the audacious goal
To reduce the CTP by 3x, we analyzed the entire cycle of playback and listed various points where we could potentially impact the CTP.
The first thing we tackled was the set of API calls made when starting playback of a song. Typically, every playback required a minimum of 4 API calls: the first to fetch the authenticated master URL, the second to fetch the actual master file, the third to fetch the right index file (as determined by bandwidth), and the fourth to fetch the first segment file. However, we realised that for previously played songs, we could skip the first 2 calls if we already had the master and the index file in the user’s cache. This would mean instantaneous playback 🙂 Using this approach, we modified our code to use local data (when it was present in the user’s cache) instead of making the API calls.
For the above change to show its true potential, we also had to improve our caching. This brings us to the second improvement we took up: the cache.
LRU (Least Recently Used) is a cache replacement policy that evicts the least recently used (touched) files first. A simplified version of our vanilla LRU cache works like this.
When the cache was full and we had to store a new song in it, we simply deleted the least recently touched files (including the master file, the index file and all segments). This meant that at any point in time, only a small number of songs could be started instantly.
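The cache-first idea can be sketched roughly as follows. This is an illustration only: `FakeApi`, `start_playback` and the cache keys are hypothetical names, and the sketch assumes each of the pieces (master, index, first segment) can be looked up in a local cache independently.

```python
class FakeApi:
    """Stand-in for the network layer; it records calls instead of doing I/O."""

    def __init__(self):
        self.calls = []

    def fetch(self, what, song_id):
        self.calls.append(what)
        return f"{what}:{song_id}"


def start_playback(song_id, cache, api):
    """Gather what is needed to start a song, trying the cache before the network."""
    master = cache.get((song_id, "master"))
    if master is None:
        # Calls 1 and 2: the authenticated master URL, then the master file itself.
        api.fetch("master_url", song_id)
        master = api.fetch("master", song_id)
        cache[(song_id, "master")] = master

    index = cache.get((song_id, "index"))
    if index is None:
        # Call 3: the index file for the chosen bitrate.
        index = api.fetch("index", song_id)
        cache[(song_id, "index")] = index

    segment = cache.get((song_id, "segment0"))
    if segment is None:
        # Call 4: the first segment.
        segment = api.fetch("segment0", song_id)
        cache[(song_id, "segment0")] = segment

    return master, index, segment
```

On a first play, `api.calls` ends up with all four entries. On a repeat play with a warm cache it stays empty, which is where the time saving comes from.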
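A minimal sketch of this vanilla behaviour, assuming capacity is counted in files and eviction happens a whole song at a time. The class and method names are hypothetical, not our production code.

```python
from collections import OrderedDict


class SongLruCache:
    """Vanilla LRU: when full, drop every file of the least recently touched song."""

    def __init__(self, capacity_files):
        self.capacity = capacity_files
        self.songs = OrderedDict()  # song_id -> list of cached files, oldest song first

    def size(self):
        return sum(len(files) for files in self.songs.values())

    def touch(self, song_id):
        # Playing a cached song makes it the most recently used.
        if song_id in self.songs:
            self.songs.move_to_end(song_id)

    def put(self, song_id, files):
        self.songs.pop(song_id, None)
        self.songs[song_id] = list(files)
        # Evict whole songs (master, index and all segments) until we fit again.
        while self.size() > self.capacity and len(self.songs) > 1:
            self.songs.popitem(last=False)
```

With a capacity of 10 files and songs of 5 files each, only two songs ever fit, so storing a third removes every trace of the least recently touched one.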
We had to modify the LRU cache to be able to start as many songs as possible instantaneously. So we optimized the LRU cache to prioritize master/index and first segment files over others.
This means that when the cache overflowed, instead of deleting the entire song, we deleted only the fourth segment onward. This allows more songs to start playing instantaneously and then continue playing over streaming.
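Here is a rough sketch of that modified eviction. The names are hypothetical, and two details are assumptions of the sketch rather than something described above: capacity is counted in files (master and index count as two per song), and whole songs are evicted as a last resort if trimming alone does not free enough space.

```python
from collections import OrderedDict

HEAD_SEGMENTS = 3  # "fourth segment onward" is deleted, so the first three are kept


class HeadPreservingLruCache:
    """LRU variant that trims the tails of old songs before evicting anything whole."""

    def __init__(self, capacity_files):
        self.capacity = capacity_files
        self.songs = OrderedDict()  # song_id -> number of cached segments, oldest first

    def size(self):
        # Each song also holds its master and index file, hence the +2.
        return sum(segments + 2 for segments in self.songs.values())

    def touch(self, song_id):
        if song_id in self.songs:
            self.songs.move_to_end(song_id)

    def put(self, song_id, segment_count):
        self.songs.pop(song_id, None)
        self.songs[song_id] = segment_count
        # Pass 1: trim tails of other songs, least recently used first.
        for sid in list(self.songs):
            if self.size() <= self.capacity:
                return
            if sid != song_id:
                self.songs[sid] = min(self.songs[sid], HEAD_SEGMENTS)
        # Pass 2 (assumption): if still over capacity, evict whole songs.
        while self.size() > self.capacity and len(self.songs) > 1:
            self.songs.popitem(last=False)

    def can_start_instantly(self, song_id):
        # Master, index and at least the first segment are present.
        return self.songs.get(song_id, 0) >= 1
```

With a capacity of 32 files and songs of 20 segments each, a whole-song LRU could hold just one song, while this version keeps the heads of two older songs next to the full newest one, so all three can start instantly.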
Comparing the vanilla LRU cache and the V2 LRU cache, we can see that a lot more songs can now be played instantaneously.
Even after these modifications, we realised that some users were still facing long CTPs. Upon closer inspection, we figured out that this almost always happened to users who faced a sudden change in bandwidth while streaming music. We investigated further and found that, due to the change in bandwidth, the Wynk player was requesting a higher quality stream from the server (let’s say 320 kbps) and making the user wait, even though there was a lower quality stream in the cache (let’s say 128 kbps). We quickly made a change that lets the player fall back on the lower quality stream and continue playback while fetching the higher quality stream in parallel, switching to the higher quality as soon as it is available.
The combination of the above changes brought the CTP down from 10 seconds to 2.8 seconds, a reduction of roughly 3.6x, and we beat our audacious goal of reducing the time by 3x! Our users have now enjoyed more than 5 billion songs with low wait times, thanks to that impromptu discussion that night and our engineering leadership’s trust that anyone in the team can come up with and execute user-centric ideas, even if they affect the very core of our product.
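The fallback decision itself is small enough to sketch. This is an illustration under assumptions: `choose_stream` is a hypothetical helper, bitrates are plain kbps numbers, and only lower quality cached streams are considered as fallbacks, which is the case described above.

```python
def choose_stream(wanted_kbps, cached_kbps):
    """Return (play_now_kbps, fetch_in_background_kbps).

    play_now_kbps is None when nothing usable is cached and the user has to wait;
    fetch_in_background_kbps is None when the wanted quality is already cached.
    """
    if wanted_kbps in cached_kbps:
        return wanted_kbps, None
    lower = [kbps for kbps in cached_kbps if kbps < wanted_kbps]
    if lower:
        # Start right away on the best cached lower quality, upgrade when ready.
        return max(lower), wanted_kbps
    return None, wanted_kbps
```

For example, `choose_stream(320, {128})` returns `(128, 320)`: play the cached 128 kbps stream now and fetch 320 kbps alongside it.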