WebRTC 102: #4 Understanding SDP Internals

Introduction

As a WebRTC developer, you've probably heard the term ‘SDP’ thrown around quite a bit, but what exactly is SDP and why is it important in WebRTC? In this article, we'll explore SDP — its meaning and how it works in WebRTC, and offer tips and best practices for working with it.

This is the fourth part of our ongoing WebRTC 102 blog series — in the third one we covered RTP and RTCP, in the second libWebRTC, and in the first one, we tackled ICE and understood how it works under the hood.

Let’s dive in!

What is SDP and why is it important in WebRTC?

The communications protocol known as SDP, or Session Description Protocol, is used to negotiate the specifics of a real-time communication session between two devices or endpoints. SDP is used in WebRTC to negotiate the session's media parameters and to describe each device's media capabilities. To put it another way, SDP is the language that WebRTC devices speak to one another.

It enables real-time communication between devices that have different capabilities or that sit behind firewalls or NATs, which makes it an essential part of WebRTC. If WebRTC devices could not negotiate the specifics of a communication session, real-time communication would not be possible.

How Does SDP Work in WebRTC?

SDP messages are structured as a series of key-value pairs, with each pair representing a specific aspect of the session. The SDP message is typically sent as part of the WebRTC signaling process, which is used to establish a connection between two devices. The SDP negotiation process typically involves two steps: an offer and an answer.

During the offer phase, one WebRTC client sends an SDP message to the other WebRTC client, describing its media capabilities and further session details. The other WebRTC client then responds with its own SDP message as an answer, describing its capabilities and session details. The two WebRTC clients then compare the SDP messages and agree on a set of acceptable media parameters for both clients.

Once the SDP negotiation process is complete, the two devices can begin to stream media between them using the agreed-upon parameters.
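To make the "agree on a set of acceptable media parameters" step more concrete, here is a minimal sketch of one way an answerer could narrow down the offered codecs. The helper name `selectCommonCodecs` and the codec objects are hypothetical and only for illustration; in a browser this matching happens inside the peer connection, and real implementations consider far more than the codec name.

```javascript
// Hypothetical helper: keep only the offered codecs that the answerer supports.
// `offered` comes from the offer; `supported` lists codec names the answerer can handle.
function selectCommonCodecs(offered, supported) {
  const supportedNames = new Set(supported.map((name) => name.toLowerCase()));
  // Keep the offerer's order and payload types; drop codecs the answerer lacks.
  return offered.filter((codec) => supportedNames.has(codec.name.toLowerCase()));
}

const offeredCodecs = [
  { payloadType: 111, name: "opus" },
  { payloadType: 96, name: "VP8" },
  { payloadType: 98, name: "H264" },
];

// The answerer in this example only supports Opus and VP8.
const answerCodecs = selectCommonCodecs(offeredCodecs, ["opus", "vp8"]);
console.log(answerCodecs.map((codec) => codec.payloadType)); // [ 111, 96 ]
```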

This process can be complex, especially when dealing with multiple devices or networks. However, it is essential for establishing a successful WebRTC communication session.

Common SDP Attributes

SDP messages contain a variety of attributes that describe the media capabilities and other session details of a WebRTC device. Some of the most common SDP attributes include:

Version: The version of the SDP protocol being used
Origin: The originator of the SDP message, including the username, session ID, and network address
Session Name: A human-readable name for the session
Media Descriptions: Descriptions of the media streams being offered or answered, including the media type, codecs, and transport protocols
Connection Data: Information about the network addresses and ports being used for communication
Timing: Information about the timing of the session, including start and end times
Encryption: Information about any encryption mechanisms being used to secure the session

Session Description

The session description provides an overall description of the multimedia session. It includes information such as the session name, the session timing, and the connection information, for example:

v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0

where the keys mean the following:

v= (protocol version)
o= (originator and session identifier)
s= (session name)
c=* (connection information -- not required if included in all media descriptions)
t= (time the session is active)
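Because every line has the shape `<key>=<value>`, reading the session-level lines takes very little code. The following is a minimal sketch, not a complete SDP parser; the function names `parseLines` and `parseOrigin` are made up for this example. It assumes the o= fields appear in the standard order: username, session ID, session version, network type, address type, address.

```javascript
// Turn an SDP blob into a list of { type, value } entries, one per line.
function parseLines(sdp) {
  return sdp
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length >= 2 && line[1] === "=") // skip blank or malformed lines
    .map((line) => ({ type: line[0], value: line.slice(2) }));
}

// Split the value of an o= line into its six space-separated fields.
function parseOrigin(value) {
  const [username, sessionId, sessionVersion, netType, addrType, address] = value.split(" ");
  return { username, sessionId, sessionVersion, netType, addrType, address };
}

const sessionSdp = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "c=IN IP4 127.0.0.1",
  "t=0 0",
].join("\r\n");

const sessionLines = parseLines(sessionSdp);
const origin = parseOrigin(sessionLines.find((line) => line.type === "o").value);
console.log(origin.address); // 127.0.0.1
```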

Media Description

The media description provides specific information about the media that will be exchanged during the session. It describes the media type, the codecs used, and the transport protocol used, for example:

m=audio 4000 RTP/AVP 111
a=rtpmap:111 OPUS/48000/2
m=video 4000 RTP/AVP 96
a=rtpmap:96 VP8/90000

where the keys mean the following:

m= (media name and transport address)
a=* (zero or more media attribute lines)
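Each m= line opens a new media section, and the a= lines that follow belong to it. The sketch below groups lines that way, under the assumption that the port is a plain number (the rarely used `port/count` form is not handled). `parseMediaLine` and `splitMediaSections` are illustrative names, not part of any WebRTC API.

```javascript
// m=<media> <port> <proto> <fmt> ...
function parseMediaLine(line) {
  const [kind, port, protocol, ...formats] = line.slice(2).split(" ");
  return { kind, port: Number(port), protocol, formats };
}

// Group a= lines under the m= line that precedes them.
function splitMediaSections(sdp) {
  const sections = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      sections.push({ media: parseMediaLine(line), attributes: [] });
    } else if (line.startsWith("a=") && sections.length > 0) {
      // Attributes seen before the first m= line are session-level and are ignored here.
      sections[sections.length - 1].attributes.push(line.slice(2));
    }
  }
  return sections;
}

const mediaSdp = [
  "m=audio 4000 RTP/AVP 111",
  "a=rtpmap:111 OPUS/48000/2",
  "m=video 4000 RTP/AVP 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const sections = splitMediaSections(mediaSdp);
console.log(sections.map((section) => section.media.kind)); // [ 'audio', 'video' ]
```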

Attributes

Attributes provide additional information about the multimedia session. They can include information about the media bandwidth, the network addresses and ports used, and the media encryption.

Here is a summary of some common attributes that you will see in a WebRTC agent's session description. Many of these values control subsystems that this article does not cover in detail.

`group:BUNDLE`

This line is followed by the mids of the media sections in the SDP and is used to send multiple media streams over a single UDP/TCP connection. Using bundling is generally recommended in WebRTC.

`fingerprint:sha-256`

This line contains the hash of the certificate exchanged during the DTLS handshake.

`a=setup`

This controls the behavior of the DTLS agent. After ICE is connected, this value determines whether DTLS should run as client or server. There are three possible values:

setup:active - the DTLS agent will run as the client
setup:passive - the DTLS agent will run as the server
setup:actpass - the DTLS agent will let the other WebRTC peer decide which role to use
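The three values can be turned into a small role-resolution helper. This is a sketch under one stated assumption: the answer always settles on either active or passive, so an actpass offer leaves the decision to the answerer. The function name `resolveDtlsRoles` is hypothetical.

```javascript
const VALID_SETUP = new Set(["active", "passive", "actpass"]);

// Given the a=setup value from the offer and from the answer,
// work out which side acts as DTLS client and which as DTLS server.
function resolveDtlsRoles(offerSetup, answerSetup) {
  if (!VALID_SETUP.has(offerSetup)) {
    throw new Error(`unknown setup value in offer: "${offerSetup}"`);
  }
  // Assumption: the answerer has to commit to a concrete role.
  if (answerSetup !== "active" && answerSetup !== "passive") {
    throw new Error(`answer must be active or passive, got "${answerSetup}"`);
  }
  if (offerSetup === answerSetup) {
    throw new Error(`both sides asked for the "${answerSetup}" role`);
  }
  const answererIsClient = answerSetup === "active";
  return {
    offerer: answererIsClient ? "server" : "client",
    answerer: answererIsClient ? "client" : "server",
  };
}

console.log(resolveDtlsRoles("actpass", "active")); // { offerer: 'server', answerer: 'client' }
```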

`ice-ufrag`, `ice-pwd` and `ice-options`

These are ICE-related settings. ice-ufrag defines the username fragment and ice-pwd holds the password used for ICE authentication, whereas ice-options lists the ICE features in use, such as trickle ICE or renomination.
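Pulling these three values out of an SDP string is a matter of finding the matching a= lines. The sketch below returns the first occurrence of each attribute, which is a simplifying assumption. The helper names and the credential values in the sample are invented for illustration.

```javascript
// Return the value of the first a=<name>:<value> line, or null if there is none.
function getAttribute(sdp, name) {
  const prefix = `a=${name}:`;
  const line = sdp
    .split(/\r?\n/)
    .map((candidateLine) => candidateLine.trim())
    .find((candidateLine) => candidateLine.startsWith(prefix));
  return line === undefined ? null : line.slice(prefix.length);
}

function getIceParameters(sdp) {
  const options = getAttribute(sdp, "ice-options");
  return {
    usernameFragment: getAttribute(sdp, "ice-ufrag"),
    password: getAttribute(sdp, "ice-pwd"),
    // ice-options is a space-separated list of option tags.
    options: options === null ? [] : options.split(" "),
  };
}

// Sample values are made up for this example.
const iceSdp = [
  "a=ice-ufrag:EsAw",
  "a=ice-pwd:P2uYro0UCOQ4zxjKXaWCBui1",
  "a=ice-options:trickle",
].join("\r\n");

const ice = getIceParameters(iceSdp);
console.log(ice.usernameFragment); // EsAw
```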

`extmap`

This declares the RTP header extensions that the peer connection is able to send or receive, as listed in the offer or answer. Each extmap line maps a numeric ID to an extension URI.

`msid`

This is only for telling the other party what stream ID and track one is sending. The format is ${streamid} ${trackid}.

`rtpmap`

A particular codec is mapped to an RTP Payload Type using this value. Because payload types are not fixed, the offerer chooses the payload types for each codec for each call.
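An rtpmap line follows the pattern `a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]`, so a single regular expression is enough to read it. A minimal sketch, with `parseRtpmap` as an illustrative name:

```javascript
// Parse one a=rtpmap line; returns null if the line does not match.
function parseRtpmap(line) {
  const match = /^a=rtpmap:(\d+) ([^/\s]+)\/(\d+)(?:\/(\d+))?$/.exec(line.trim());
  if (match === null) return null;
  const [, payloadType, codec, clockRate, channels] = match;
  return {
    payloadType: Number(payloadType),
    codec,
    clockRate: Number(clockRate),
    // The channel count is optional and usually only present for audio.
    channels: channels === undefined ? null : Number(channels),
  };
}

const opus = parseRtpmap("a=rtpmap:111 OPUS/48000/2");
const vp8 = parseRtpmap("a=rtpmap:96 VP8/90000");
console.log(opus.codec, opus.clockRate, opus.channels); // OPUS 48000 2
console.log(vp8.codec, vp8.clockRate, vp8.channels); // VP8 90000 null
```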

`rtcp-fb`

This is present in SDP in the media section. It should not be included in the session section of SDP. rtcp-fb declares which RTCP Feedback messages should be used for a given payload type of media section.

`ssrc`

This stands for Synchronization Source. It is a random 32-bit value that identifies a specific media source within an RTP session. The format is a=ssrc:<ssrc-id> cname:<cname>.

These are the important attributes that tell us a lot about the media being negotiated and used for a session. I hope you have understood how to read SDP and its components.

Now, we will discuss practical usages of SDP which improve the WebRTC experience such as Simulcast, Perfect Negotiation, and Renegotiation.

Simulcast

Simulcast is an advanced WebRTC technique that can noticeably improve the media experience. It lets a sender transmit the same video stream at multiple resolutions and bitrates, negotiated through SDP, so that the most suitable stream can be selected for each receiver based on its available bandwidth and device capabilities.

To support simulcast, the specs introduce a few additional SDP attributes. These are a=simulcast, a=rid, and an additional header extension map attribute, a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id.

An example of an SDP offer using simulcast looks like this:

m=video 49300 RTP/AVP 97 98 99
a=rtpmap:97 H264/90000
a=rtpmap:98 H264/90000
a=rtpmap:99 VP8/90000
a=fmtp:97 profile-level-id=42c01f;max-fs=3600;max-mbps=108000
a=fmtp:98 profile-level-id=42c00b;max-fs=240;max-mbps=3600
a=fmtp:99 max-fs=240; max-fr=30
a=rid:1 send pt=97;max-width=1280;max-height=720
a=rid:2 send pt=98;max-width=320;max-height=180
a=rid:3 send pt=99;max-width=320;max-height=180
a=rid:4 recv pt=97
a=simulcast:send 1;2,3 recv 4
a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id

`a=simulcast`

This attribute describes, independently for "send" and "receive" directions, the number of simulcast RTP streams as well as potential alternative formats for each simulcast RTP stream. Each simulcast RTP stream, including alternatives, is identified using the RID identifier (rid-id), defined in [RFC8851].

a=simulcast:send 1;2,3 recv 4

If the "send" element of this line is present in an SDP offer, it denotes the offerer's capability and proposal to send simulcast RTP streams. Simulcast streams are separated by semicolons (";"), and each stream lists one or more RTP stream IDs (rid-ids). Several rid-ids separated by commas (",") within one simulcast stream indicate alternative representations of that same simulcast RTP stream. As a result, the "send" portion of the line above means that the offerer intends to send two simulcast RTP streams. The first is identified and constrained by rid-id 1. The second can be delivered in one of two alternative formats, identified and constrained by rid-ids 2 and 3. The "recv" portion indicates that the offerer wishes to receive a single RTP stream (no simulcast) in accordance with rid-id 4.

This SDP offer's recipient can produce an SDP answer indicating what it accepts. It indicates simulcast capabilities and specifies which simulcast RTP streams and alternatives to receive and/or send using the "a=simulcast" element. According to the above offer, an illustration of such a responding "a=simulcast" attribute is:

a=simulcast:recv 1;2 send 4

With this SDP response, the answerer expresses their desire to receive the two simulcast RTP streams in the "recv" section, having eliminated a substitute that it does not support (rid-id 3). According to rid-id 4, the "send" component assures the offerer that they will receive one stream for this media source.
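The semicolon and comma rules described above translate directly into code. The following sketch parses an a=simulcast line into its send and recv parts; `parseSimulcast` is an illustrative name. It also accepts a leading "~" on a rid-id, which marks a stream as initially paused in the simulcast spec; treat that part as an extra that the examples in this article do not exercise.

```javascript
// Parse e.g. "a=simulcast:send 1;2,3 recv 4" into
// { send: [[rid], [rid, rid]], recv: [[rid]] }.
function parseSimulcast(line) {
  const tokens = line.replace(/^a=simulcast:/, "").trim().split(/\s+/);
  const result = { send: [], recv: [] };
  // Tokens come in pairs: a direction followed by its stream list.
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    const direction = tokens[i];
    if (direction !== "send" && direction !== "recv") {
      throw new Error(`unexpected direction "${direction}"`);
    }
    result[direction] = tokens[i + 1]
      .split(";") // one entry per simulcast stream
      .map((stream) =>
        stream.split(",").map((rid) => ({ // alternatives for that stream
          id: rid.replace(/^~/, ""),
          paused: rid.startsWith("~"),
        }))
      );
  }
  return result;
}

const offerSimulcast = parseSimulcast("a=simulcast:send 1;2,3 recv 4");
console.log(offerSimulcast.send.length); // 2
console.log(offerSimulcast.send[1].map((rid) => rid.id)); // [ '2', '3' ]
```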

Legacy Simulcast

Legacy simulcast is simply the older way of doing simulcast, which Firefox does. It uses explicitly defined ssrc and ssrc-group attributes in SDP along with rid attributes.

An example of SDP offer generated in Firefox browser with simulcast enabled:

a=simulcast:send r1;r0
a=ssrc:4264196019 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc:2642934809 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc:764299737 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc:3939469720 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}
a=ssrc-group:FID 4264196019 2642934809
a=ssrc-group:FID 764299737 3939469720

You already understand what ssrc implies in SDP, so let me tell you what ssrc-group means.

The ssrc-group attribute defines a relationship among several ssrcs of an RTP session. ssrc-group is always followed by a list of one or more ssrc-ids. For each ssrc-id listed in an ssrc-group, a matching ssrc line should exist in the SDP message. The semantic values defined for the ssrc-group attribute are FID, which stands for Flow Identification, and FEC, which stands for Forward Error Correction.

What this means for you is that if the offer contains multiple ssrcs and an ssrc-group, the answering peer connection must understand that the sender will send RTP packets on the defined ssrcs only.
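The rule that every ssrc-id in a group should also have its own ssrc line is easy to check mechanically. Here is a minimal sketch with the hypothetical helpers `parseSsrcGroups` and `findUndeclaredSsrcs`, run against the ssrc lines from the Firefox example above:

```javascript
// Collect every a=ssrc-group line as { semantics, ssrcs }.
function parseSsrcGroups(lines) {
  const groups = [];
  for (const line of lines) {
    const match = /^a=ssrc-group:(\S+) (.+)$/.exec(line.trim());
    if (match === null) continue;
    groups.push({
      semantics: match[1],
      ssrcs: match[2].trim().split(/\s+/).map(Number),
    });
  }
  return groups;
}

// Return the ssrc-ids that appear in a group but have no a=ssrc line of their own.
function findUndeclaredSsrcs(lines) {
  const declared = new Set();
  for (const line of lines) {
    const match = /^a=ssrc:(\d+) /.exec(line.trim());
    if (match !== null) declared.add(Number(match[1]));
  }
  return parseSsrcGroups(lines)
    .flatMap((group) => group.ssrcs)
    .filter((ssrc) => !declared.has(ssrc));
}

const firefoxLines = [
  "a=ssrc:4264196019 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}",
  "a=ssrc:2642934809 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}",
  "a=ssrc:764299737 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}",
  "a=ssrc:3939469720 cname:{816fd64c-ca90-417c-a2b7-72c7c36a6500}",
  "a=ssrc-group:FID 4264196019 2642934809",
  "a=ssrc-group:FID 764299737 3939469720",
];

console.log(parseSsrcGroups(firefoxLines).length); // 2
console.log(findUndeclaredSsrcs(firefoxLines).length); // 0
```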

Perfect Negotiation

From MDN docs:

Perfect negotiation makes it possible to seamlessly and completely separate the negotiation process from the rest of your application's logic. Negotiation is an inherently asymmetric operation: one side needs to serve as the "caller" while the other peer is the "callee." The perfect negotiation pattern smooths this difference away by separating that difference out into independent negotiation logic so that your application doesn't need to care which end of the connection it is. As far as your application is concerned, it makes no difference whether you're calling out or receiving a call.

To summarise, perfect negotiation is a pattern that resolves the collision that occurs when both sides send an SDP offer at the same time.

Each of the two peers in a perfect negotiation is given a role to play in the negotiation process that is fully independent of the state of the WebRTC connection:

A polite peer is one that uses ICE rollback to prevent collisions with incoming offers. In essence, a polite peer is one that may send out offers, but when an offer arrives from the other peer, it responds with "Well, never mind, drop my offer and I'll consider yours instead."
An impolite peer is one that always ignores incoming offers that collide with its own. It never apologizes or gives anything up to the polite peer. Whenever a collision occurs, the impolite peer wins.

This way, if transmitted offers collide, both peers know what exactly should happen. Error-related reactions become much more predictable.

We won't go deep into the implementation of Perfect Negotiation in this post, but we will discuss the important components that help us achieve it.

First, we need to add a handler on pc.onnegotiationneeded. The handler calls pc.setLocalDescription() without first creating an offer. When called with no arguments, pc.setLocalDescription() looks at the current signaling state and generates the offer itself if one is required, which avoids generating multiple unnecessary SDP offers for a peer connection. Then we can send pc.localDescription to the remote peer. The whole handler looks something like this:

let makingOffer = false;

pc.onnegotiationneeded = async () => {
  try {
    makingOffer = true;
    await pc.setLocalDescription();
    signaler.send({ description: pc.localDescription });
  } catch (err) {
    console.error(err);
  } finally {
    makingOffer = false;
  }
};

Second, we need to add a handler on pc.onicecandidate. This event starts firing once you call pc.setLocalDescription(), and it fires once for each ICE candidate that the ICE agent gathers for this pc. Each time the handler receives a candidate, you need to send it to the remote peer.
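A minimal sketch of this second step, assuming `pc` is an RTCPeerConnection and `signaler` is the same application-defined signaling channel used in the handler above. The wrapper function `forwardIceCandidates` is only there so the snippet can stand on its own.

```javascript
// Forward every locally gathered ICE candidate to the remote peer.
function forwardIceCandidates(pc, signaler) {
  pc.onicecandidate = ({ candidate }) => {
    // The event fires once per candidate. A null candidate marks the end of
    // gathering and is passed along unchanged here.
    signaler.send({ candidate });
  };
}
```

In an application you would call `forwardIceCandidates(pc, signaler)` once, right after creating the peer connection.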

Third, we have to handle incoming descriptions and ICE candidates from the remote peer. When an offer arrives, we need to check whether it collides with our own, which is the case if the local peer is in the process of making an offer or if its signaling state is not stable. If the offer collides and the local peer is the impolite one, the handler simply returns, because the impolite peer ignores incoming offers while in a colliding state. Otherwise, call pc.setRemoteDescription(description). If the incoming description is an offer, call pc.setLocalDescription() without parameters so that it automatically generates an answer and sets it as the local description. Then send pc.localDescription to the remote peer and voila, your perfect negotiation is done. In code, you can write this as:

// `polite` is assumed to be defined elsewhere: true on the polite peer, false on the impolite one.
let ignoreOffer = false;

signaler.onmessage = async ({ data: { description, candidate } }) => {
  try {
    if (description) {
      const offerCollision =
        description.type === "offer" &&
        (makingOffer || pc.signalingState !== "stable");

      ignoreOffer = !polite && offerCollision;
      if (ignoreOffer) {
        return;
      }

      await pc.setRemoteDescription(description);
      if (description.type === "offer") {
        await pc.setLocalDescription();
        signaler.send({ description: pc.localDescription });
      }
    } else if (candidate) {
      try {
        await pc.addIceCandidate(candidate);
      } catch (err) {
        if (!ignoreOffer) {
          throw err;
        }
      }
    }
  } catch (err) {
    console.error(err);
  }
};

We understand the whole process sounds super complex, but once you implement this in your application, you won’t have to worry about SDP collision and can focus on other parts.

Debugging SDP issues

When you get your hands dirty with SDP, you should have some tools ready to help you debug issues more efficiently. There are not many tools available for SDP, but a few SDP parsers exist that you can use to make an SDP string readable. Some are:

There are also a number of SDP parser libraries for languages such as JavaScript and Go.

Best Practices for Working with SDP

Working with SDP in WebRTC can be complex, but following best practices can help you optimize your implementation and achieve better performance. Some tips for working with SDP in WebRTC include:

Keep SDP messages as small as possible: Large SDP messages can slow down the negotiation process and reduce overall performance. Keep your SDP messages as small as possible by only including the necessary attributes.
Use a signaling server: A signaling server can help mediate the SDP negotiation process and ensure that both devices agree on the same media parameters. A signaling server can also help ensure your WebRTC implementation is secure.
Test your implementation across multiple devices and networks: Testing your WebRTC implementation across numerous devices and networks can help ensure that it is interoperable and can work in various environments.
Use a library or framework: A WebRTC library or framework can help simplify the SDP negotiation process and reduce the risk of errors.

These best practices can help you build a more reliable and performant WebRTC implementation.

Bonus content

PSA: wildcard `rtcp-fb` is coming to Chromium M114 (planned)

We talked about rtcp-fb in the attributes section of this post. A recent change to the WebRTC code in the Chromium source relates to rtcp-fb, and it was announced on the WebRTC Google group. This is not a new feature: it has been part of RFC 4585 for a long time. The spec says:

A wildcard payload type ("*") MAY be used to indicate that the RTCP feedback attribute applies to all payload types.

This might not sound important, but it can lead to an immense reduction in the size of an SDP, in some cases reducing it by almost half! Here is an example:

Before, we would need to specify an rtcp-fb line corresponding to each codec and feedback supported, for example:

a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli

a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli

a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli

a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli

a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli

.... you get the idea, this happens for every codec identifier

Now, instead of targeting each codec with rtcp-fb, you can target all of them at once, by passing *:

a=rtcp-fb:* goog-remb
a=rtcp-fb:* transport-cc
a=rtcp-fb:* ccm fir
a=rtcp-fb:* nack
a=rtcp-fb:* nack pli

Smaller SDPs mean fewer bytes to transfer over your signaling layer! Stay tuned for this change to come out in a future Chrome release.
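To get a feel for the saving, the sketch below collapses per-payload-type rtcp-fb lines into wildcard lines whenever a feedback type is declared for every payload type in the list. This is purely an illustration of the idea and not how Chromium generates its SDP. The function name `collapseRtcpFeedback` and its `payloadTypes` parameter are assumptions made for this example.

```javascript
// Collapse a=rtcp-fb lines into wildcard form where a feedback type
// applies to every payload type in `payloadTypes`.
function collapseRtcpFeedback(lines, payloadTypes) {
  const byFeedback = new Map(); // feedback string -> Set of payload types
  for (const line of lines) {
    const match = /^a=rtcp-fb:(\S+) (.+)$/.exec(line.trim());
    if (match === null) continue;
    const [, payloadType, feedback] = match;
    if (!byFeedback.has(feedback)) byFeedback.set(feedback, new Set());
    byFeedback.get(feedback).add(payloadType);
  }

  const output = [];
  for (const [feedback, seen] of byFeedback) {
    const coversAll =
      seen.has("*") ||
      (payloadTypes.length > 0 && payloadTypes.every((pt) => seen.has(String(pt))));
    if (coversAll) {
      output.push(`a=rtcp-fb:* ${feedback}`);
    } else {
      // Not declared for every payload type, so keep the explicit lines.
      for (const pt of seen) output.push(`a=rtcp-fb:${pt} ${feedback}`);
    }
  }
  return output;
}

const before = [
  "a=rtcp-fb:96 nack",
  "a=rtcp-fb:96 nack pli",
  "a=rtcp-fb:98 nack",
  "a=rtcp-fb:98 nack pli",
];
console.log(collapseRtcpFeedback(before, [96, 98]));
// [ 'a=rtcp-fb:* nack', 'a=rtcp-fb:* nack pli' ]
```

Note that the output is grouped by feedback type, so the line order can differ from the input.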

Conclusion

SDP is a crucial component of WebRTC, enabling real-time communication between devices that may have different capabilities or are located behind firewalls or NATs. Understanding how SDP works in WebRTC and following best practices can help you build a more reliable and performant WebRTC implementation.

By following the tips and best practices outlined in this article, you can optimize your WebRTC implementation and ensure that it works across a variety of browsers and platforms.

I hope you found this post informative and engaging. If you have any thoughts or feedback, please get in touch with me on Twitter or LinkedIn. Stay tuned for more related blog posts in the future!