Introduction
In the HLS in depth blog, we explored the core HLS protocol: how it works, its benefits, and its limitations. One notable limitation was higher latency. Reducing it is a tricky problem, so efforts began in that direction. Today, we will explore how those efforts led to two low-latency evolutions of HLS: LL-HLS and CL-HLS.
Terminology
Before we start, let's get our terminology straight. In this article, CL-HLS refers to Community HLS, also known as L-HLS. LL-HLS refers to Apple's Low-Latency HLS, also known as AL-HLS. I have seen a lot of confusion around these names (I had even more). We will use CL-HLS and LL-HLS for these two independent low-latency solutions.
HLS
With traditional HLS (HTTP Live Streaming), we wait for a segment to be produced, then copy it to a web server/edge location, which the client polls to fetch segments. For 6-second segments, that means a 6-second wait for the segment being produced, plus encoder costs, CDN fetch costs, and a client buffer of 3 segments — about 4 segment durations before playback starts, i.e., roughly 24 seconds. Notice how quickly this latency adds up in traditional HLS.
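As a rough back-of-the-envelope sketch of that budget (the numbers are illustrative assumptions, not spec values, and encoder/CDN overheads are ignored):

```python
# Rough latency budget for traditional HLS. All numbers are
# illustrative assumptions; encoder and CDN overheads are ignored.
def traditional_hls_latency(segment_duration: float, buffered_segments: int = 3) -> float:
    """Startup latency ~= one segment being produced + the client's buffer."""
    producing = segment_duration                        # wait for the current segment
    buffering = buffered_segments * segment_duration    # client buffers before playing
    return producing + buffering

print(traditional_hls_latency(6.0))  # -> 24.0 seconds before playback starts
```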
Before even starting to optimize, let's get a mental model around this:
What we don't focus on optimizing
One could say we should start optimizing the source-to-encoder delivery. The catch is that the HLS spec leaves this part of delivery unspecified, so it can be implemented in any way the implementor chooses. The most popular route is RTMP, but WebRTC has emerged as an excellent alternative. With OBS getting first-class support for it via WHIP, it is exciting to see how the ingest side of things evolves.
So, from a spec perspective, HLS optimizations don't cover ingest. The next part is the encoder. Again, one can pick the best encoder available — optimizing encoders is an entire field of its own, which is also why the spec doesn't discuss them explicitly. So that is out of the picture too.
Everything after the encoder outputs segments, however, is HLS territory (including the client) and is open for optimization. So our problem statement is: get the output from the encoder to glass (the viewer's screen) as fast as possible.
Segment size optimization
If we revisit the segment flow above, we could just make the segments smaller — less than a second, maybe — and keep sending them. But there's a catch: we always want each segment to be independently decodable. What do I mean by independently decodable? Here's a small, just-enough primer on how the encoder works; everything beyond that is homework for you!
When talking about still pictures, we know one way they can be represented in memory — RGB. So, we store the RGB value for each pixel, and a nested array forms a picture. That's a lot of data points! Well, now let's say we are dealing with a stream of pictures and storing this representation for each of them. This would require a lot of memory. So, what we do is choose a base representation. We can call it I-Frame (keyframes). These are not directly RGB representations but with some smart compression. Now, whatever changes in the next frame, we take a smart difference from the I-Frame and store it as a P-Frame. P-Frames, hence, are dependent frames, and I-Frames are independent frames as they don't depend on anyone else. In Wikipedia's words:
I‑frames are the least compressible but don't require other video frames to decode. P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.
Back to our topic: each segment needs at least one I-Frame, for the simple reason that a viewer can join the stream at any time and starts receiving segments from that point; hence, each segment must be decodable independently of the start of the stream. We discussed that I-Frames are heavy; thus, shrinking segments means generating keyframes more often, which inflates the overall bitrate. That bitrate directly corresponds to bandwidth usage, which would start choking lower-bandwidth devices.
We now understand why we can't just reduce the segment size mindlessly. But we do need to deliver smaller chunks of data to the client. What if we break a segment into smaller "parts," where each part is not required to have a keyframe, but if it does, we mark it as "INDEPENDENT"?
This way, the client looks only for parts marked "INDEPENDENT" and starts decoding from that point; it does not need to wait for the complete segment to begin playing. This became possible thanks to chunked CMAF and partial TS, which we mentioned in our previous article. They are container formats — a way to store video/audio data — and their support for chunks/smaller parts lets us split a normal container into n chunks. The ideal size of these parts is 300-400 ms.
This is the same method CL-HLS and LL-HLS use to reduce latency. The tag used to identify a partial segment is EXT-X-PART. If we check the spec, we find another interesting attribute, BYTERANGE, which is a way to indicate that the delivered part will be in this byte range of the complete segment.
Delivery
Okay, this seems reasonable, but how does the client get these parts? The client has to fetch the manifest to find them. However, as we keep generating more parts, the manifest file changes very fast, so the client would also have to fetch those changes quickly. To optimize this step, CL-HLS and LL-HLS took two different routes.
CL-HLS optimized it by pre-announcing segments, and when they were fetched, it used HTTP Chunked Transfer Encoding (CTE) to deliver parts continuously.
CTE is a method of sending data in chunks, so instead of letting the client know the total size of the payload, we keep sending chunks with the mentioned size, and when we are done, we send an empty chunk. This is usually employed in cases where the actual payload size is unknown.
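The wire format of chunked transfer encoding is simple enough to sketch: each chunk is its size in hex, a CRLF, the data, another CRLF, and a zero-length chunk terminates the stream.

```python
# Sketch of the HTTP/1.1 chunked transfer-encoding wire format:
# each chunk is "<hex size>\r\n<data>\r\n", terminated by a zero-length chunk.
def encode_chunked(parts: list[bytes]) -> bytes:
    out = b""
    for part in parts:
        out += f"{len(part):x}\r\n".encode() + part + b"\r\n"
    return out + b"0\r\n\r\n"   # empty chunk signals "we are done"

wire = encode_chunked([b"part-0-bytes", b"part-1-bytes"])
print(wire)  # b'c\r\npart-0-bytes\r\nc\r\npart-1-bytes\r\n0\r\n\r\n'
```

The server never has to declare a Content-Length up front, which is exactly why CTE fits a segment that is still being produced.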
One can see why this is a neat technique: we announce yet-to-be-formed segments and use the network round-trip time to produce parts and deliver them continuously over CTE. One downside of CTE is that it makes bandwidth estimation harder.
LL-HLS took a different route. In their initial 2019 announcement, it seemed their strategy was to write segments to the manifest file if and only if generated. To compensate for low latency, they started pushing segments using HTTP/2 Push when the client requested a newer manifest. This saves a round trip time of the client reading the manifest, understanding it, and then requesting segments/parts.
Later, after taking feedback from the community, they modified the spec in 2020 to include a prefetch tag, which, like the CL-HLS version, allows segments to be pre-announced. HTTP/2 Push is no longer a mandatory requirement.
But the client still has to fetch these parts after getting the manifest; no CTE is involved, which makes a few things more bandwidth-hungry. A protocol combining all of Apple's changes with CTE for continuous delivery would be a perfect middle ground — and maybe that is what the community was ultimately aiming for with low-latency HLS.
The tag used to pre-announce a segment is EXT-X-PRELOAD-HINT.
Playlist delta updates
Another problem we discussed in HLS was large playlists. In a long livestream, fetching a manifest that lists every segment from the very start would make each round trip very heavy. Instead, after a point, we could fetch only the updates — the deltas — of the playlist. Playlist delta updates enable exactly this. It is an exciting feature and hence was also ported back into the HLS specification.
The EXT-X-SKIP tag is used as a marker for a skipped manifest section. To request a delta update from the server, the client uses the _HLS_skip=YES|v2 query param.
Blocking playlist updates
Consider CDNs: say we set the cache TTL to 2 seconds. The CDN would then keep serving a stale playlist for up to 2 seconds, even though new parts are produced every 300-400 ms. We need some kind of cache-busting and a precise segment/part fetch mechanism. LL-HLS provides this in the form of blocking playlist updates: we can tell the server to hold the connection and not send a response until a particular segment/part is ready to be served.
This also lets clients block until their target buffer is filled, at which point they can start decoding and displaying output instantly.
Blocking is achieved by using two query params together:
- _HLS_msn=<M>: Do not resolve the request until media sequence number M is ready.
- _HLS_part=<N>: Do not resolve the request until part N of segment M is ready. This param requires _HLS_msn to be present.
Rendition reports
Another great feature added by LL-HLS is a faster way to do ABR using rendition reports. These reports contain the last media sequence number and the latest part. A rendition report is needed for each of the defined bitrates individually. EXT-X-RENDITION-REPORT is used to identify these reports.
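A player switching renditions can read these reports to jump straight to the live edge of the target bitrate, skipping an extra playlist round trip. A minimal parsing sketch (naive comma-split attribute parsing, for illustration only):

```python
# Sketch: parse EXT-X-RENDITION-REPORT lines so an ABR switch can target
# the live edge of another rendition without an extra playlist fetch.
def parse_rendition_reports(playlist: str) -> list[dict]:
    reports = []
    prefix = "#EXT-X-RENDITION-REPORT:"
    for line in playlist.splitlines():
        if line.startswith(prefix):
            report = {}
            # Naive attribute parsing; real players use a spec-compliant parser.
            for attr in line[len(prefix):].split(","):
                key, _, value = attr.partition("=")
                report[key] = value.strip('"')
            reports.append(report)
    return reports

report_line = '#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=2'
print(parse_rendition_reports(report_line))
# [{'URI': '../1M/waitForMSN.php', 'LAST-MSN': '273', 'LAST-PART': '2'}]
```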
LL-HLS example playlists
Apple documentation around LL-HLS includes examples of different playlist requests and their responses.
For a general low-latency playlist, the response to a request such as:

GET https://example.com/2M/waitForMSN.php?_HLS_msn=273&_HLS_part=2

looks like this:
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-VERSION:6
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:266
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:13:36.106Z
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.00008,
fileSequence266.mp4
#EXTINF:4.00008,
fileSequence267.mp4
#EXTINF:4.00008,
fileSequence268.mp4
#EXTINF:4.00008,
fileSequence269.mp4
#EXTINF:4.00008,
fileSequence270.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart271.0.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.2.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.3.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.4.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.5.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.6.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.7.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.8.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.9.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.10.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.11.mp4"
#EXTINF:4.00008,
fileSequence271.mp4
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:14:00.106Z
#EXT-X-PART:DURATION=0.33334,URI="filePart272.a.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.b.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.c.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.d.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.e.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.f.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart272.g.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.h.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.i.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.j.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.k.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.l.mp4"
#EXTINF:4.00008,
fileSequence272.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart273.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart273.1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart273.2.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart273.3.mp4"
#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=2
#EXT-X-RENDITION-REPORT:URI="../4M/waitForMSN.php",LAST-MSN=273,LAST-PART=1
We can see some familiar tags from the HLS post and some new ones we learned in this article. The part duration is defined using #EXT-X-PART-INF:PART-TARGET=0.33334. For server control params, we see #EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0, which carries a few instructions. Let's go through them one by one:
- CAN-BLOCK-RELOAD=YES: the server supports blocking playlist reloads — it will hold a request until the requested media sequence number (273 here) and part (2 here) are ready.
- PART-HOLD-BACK=1.0: the client should not pick a playback point closer than 1.0 seconds to the live edge when playing parts.
- CAN-SKIP-UNTIL=12.0: the server can skip the older portion of the playlist if the client requests a delta update. The value, in seconds, is the skip boundary — segments within this distance of the live edge are never skipped; here it is 12.0.
Actual parts are listed using #EXT-X-PART, with DURATION and URI attributes, which are pretty self-explanatory. There's also an INDEPENDENT attribute on some parts: these mark parts starting with the I-Frames we discussed earlier and hence can be used as points from which the client can start decoding.
Near the bottom, we see EXT-X-PRELOAD-HINT: it hints at an upcoming part using TYPE=PART and gives the exact URI to fetch it.
And lastly, we have rendition reports mentioning the last MSN and part index along with URI.
There's also an example of a playlist delta update, which is requested using a URL like:

GET https://example.com/2M/waitForMSN.php?_HLS_msn=273&_HLS_part=3&_HLS_skip=YES
#EXTM3U
# Following the example above, this Playlist is a response to: GET https://example.com/2M/waitForMSN.php?_HLS_msn=273&_HLS_part=3&_HLS_skip=YES
#EXT-X-TARGETDURATION:4
#EXT-X-VERSION:9
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:266
#EXT-X-SKIP:SKIPPED-SEGMENTS=3
#EXTINF:4.00008,
fileSequence269.mp4
#EXTINF:4.00008,
fileSequence270.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart271.0.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.2.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.3.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.4.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.5.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.6.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.7.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.8.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.9.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.10.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.11.mp4"
#EXTINF:4.00008,
fileSequence271.mp4
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:14:00.106Z
#EXT-X-PART:DURATION=0.33334,URI="filePart272.a.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.b.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.c.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.d.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.e.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.f.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart272.g.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.h.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.i.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.j.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.k.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.l.mp4"
#EXTINF:4.00008,
fileSequence272.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart273.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart273.1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart273.2.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart273.3.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart273.4.mp4"
#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=3
#EXT-X-RENDITION-REPORT:URI="../4M/waitForMSN.php",LAST-MSN=273,LAST-PART=3
We have the new query param _HLS_skip indicating that part of the playlist can be skipped, and #EXT-X-SKIP:SKIPPED-SEGMENTS=3 tells us how many segments were skipped in this playlist update.
They also have an example of a playlist that contains byterange-addressed parts:
# In these examples only the end of the Playlist is shown.
# This is Playlist update 1
#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="20000@0"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="23000@20000"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="18000@43000"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="fs271.mp4",BYTERANGE-START=61000
# This is Playlist update 2
#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="20000@0"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="23000@20000"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="18000@43000"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="19000@61000"
#EXTINF:4.08,
fs271.mp4
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="fs272.mp4",BYTERANGE-START=0
# This is Playlist update 3
#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="20000@0"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="23000@20000"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="18000@43000"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="19000@61000"
#EXTINF:4.08,
fs271.mp4
#EXT-X-PART:DURATION=1.02,URI="fs272.mp4",BYTERANGE="21000@0"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="fs272.mp4",BYTERANGE-START=21000
Notice the BYTERANGE attribute at the end of each EXT-X-PART tag. Another interesting attribute is BYTERANGE-START in the EXT-X-PRELOAD-HINT tag, which marks the starting byte offset of the hinted part within the segment.
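To actually fetch a byterange-addressed part, a client translates the "length@offset" attribute into an HTTP Range header. A quick sketch:

```python
# Sketch: translate an HLS BYTERANGE attribute ("<length>@<offset>")
# into the HTTP Range header a client would send for just that part.
def byterange_to_http_range(byterange: str) -> str:
    length_s, _, offset_s = byterange.partition("@")
    length, offset = int(length_s), int(offset_s)
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"

print(byterange_to_http_range("23000@20000"))  # bytes=20000-42999
```

So the part declared as BYTERANGE="23000@20000" above would be fetched with `Range: bytes=20000-42999` against the single segment file.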
Final thoughts
The need for low-latency solutions arose soon after the release of HLS, and we went from no solutions to a couple of competing ones in no time. Watching the different approaches and thought processes behind the same optimization problem is undoubtedly fascinating. Today, LL-HLS is the leading standard for low-latency live streaming. Amazon/Twitch evolved CL-HLS independently into a proprietary implementation that seems to be performing really well!
Scaling low-latency solutions is definitely challenging, and that's why we at Dyte handle that for you. Feel at home with that sweet DX and leave all the complexity to us; try our Livestreaming SDK today!