Introduction to Web Audio API

A critical part of WebRTC is the transmission of audio. The Web Audio API is all about processing and synthesizing audio in web applications. It lets developers build complex audio processing and synthesis from a set of high-level JavaScript objects and functions. The API can be used to create a wide range of audio applications, such as music and sound effects in games, interactive audio in virtual reality, and more.

Let's take a look at the main concepts behind the Web Audio API.

Capturing and playing back audio

The Web Audio API provides several ways to capture and play back audio in web applications.

Here's an example of how to capture audio using the MediaStream API and play it back using the Web Audio API:

First, we need to request permission to access the user's microphone by calling navigator.mediaDevices.getUserMedia().

navigator.mediaDevices.getUserMedia({audio: true})
  .then(stream => {
    // The stream variable contains the audio track from the microphone
  })
  .catch(err => {
    console.log('Error getting microphone', err);
  });

Next, inside the then() callback where stream is available, we create an instance of the Web Audio API's AudioContext object. We can then create a MediaStreamAudioSourceNode by passing the MediaStream object to the audioCtx.createMediaStreamSource() method.

const audioCtx = new AudioContext();
const source = audioCtx.createMediaStreamSource(stream);

Once we have the source, we can then connect the source node to the audio context's destination node to play the audio.

source.connect(audioCtx.destination);

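An AudioContext has no start() method: audio from the microphone starts flowing to the speakers as soon as the nodes are connected and the context is running. If the browser created the context in the "suspended" state because of its autoplay policy, call resume(), ideally from a user-gesture handler such as a click:

audioCtx.resume();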
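Putting the steps above together, here is a minimal sketch of the whole flow as one async function. It assumes it is called from a user-gesture handler so that resume() is allowed. The name startMicPlayback is hypothetical, and it takes navigator.mediaDevices and the AudioContext constructor as parameters only to keep it easy to test.

```javascript
// Hypothetical helper: capture the microphone and play it back through the speakers.
// In a browser: startMicPlayback(navigator.mediaDevices, AudioContext)
async function startMicPlayback(mediaDevices, AudioContextCtor) {
  // Prompts for microphone permission; rejects if the user denies it
  const stream = await mediaDevices.getUserMedia({ audio: true });

  const audioCtx = new AudioContextCtor();
  const source = audioCtx.createMediaStreamSource(stream);
  source.connect(audioCtx.destination);

  // A context created outside a user gesture may start "suspended"
  if (audioCtx.state === 'suspended') {
    await audioCtx.resume();
  }
  return { audioCtx, source, stream };
}
```

Playing the microphone straight back to the speakers can cause acoustic feedback, so use headphones while trying this.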

Autoplay

Browsers handle web audio autoplay in different ways, but in general, they have implemented policies to prevent unwanted audio from playing automatically. This is to protect users from being surprised by unwanted audio, and to prevent abuse of the autoplay feature.

Chrome, Edge, Firefox and Safari allow media to autoplay only if it is muted or if the user has already interacted with the website (a "muted autoplay" policy).
Safari goes further by requiring a user interaction, such as a click, before allowing audio to play.
Firefox blocks autoplay with sound by default, so the user needs to interact with the website before audio can play.

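Developers can call the play() method of an HTMLMediaElement, such as an <audio> element, to start playback. In current browsers it returns a Promise, which rejects with a NotAllowedError when the autoplay policy blocks playback, typically because the user has not interacted with the page yet.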
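A minimal sketch of handling that rejection, assuming you want to show your own "enable audio" control when playback is blocked. The tryPlay helper and its onBlocked callback are hypothetical names.

```javascript
// Hypothetical helper: try to start a media element and detect autoplay blocking.
// Resolves to true if playback started, false if the autoplay policy blocked it.
function tryPlay(mediaEl, onBlocked) {
  // Wrapping in a promise chain also covers old browsers where play() returned undefined
  return Promise.resolve()
    .then(() => mediaEl.play())
    .then(
      () => true,
      (err) => {
        if (err && err.name === 'NotAllowedError') {
          onBlocked(); // e.g. show a "Click to enable audio" button
          return false;
        }
        throw err; // other failures (e.g. an unsupported source) are re-thrown
      }
    );
}
```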

The Web Audio API also provides the AudioContext.resume() method. An AudioContext created before any user interaction can start in the "suspended" state, and its state property tells you whether it is running. Calling resume() from a user-gesture handler, such as a click, starts the context so that audio can play.
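One common pattern, sketched here under the assumption that any click or key press on the page is an acceptable trigger, is to resume the context on the first user gesture and then stop listening. resumeOnFirstGesture is a hypothetical name; target would typically be document.

```javascript
// Hypothetical helper: resume a suspended AudioContext on the first click or key press.
function resumeOnFirstGesture(audioCtx, target) {
  if (audioCtx.state !== 'suspended') return;

  const events = ['click', 'keydown'];
  const unlock = () => {
    audioCtx
      .resume()
      .then(() => {
        // Only stop listening once the context is actually running
        if (audioCtx.state === 'running') {
          events.forEach((type) => target.removeEventListener(type, unlock));
        }
      })
      .catch((err) => console.warn('resume() failed; will retry on next gesture', err));
  };
  events.forEach((type) => target.addEventListener(type, unlock));
}
```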

Overall, to ensure that web audio autoplay works as expected, it's important to understand the different browsers' policies and provide a clear user interface that allows users to control audio playback.

WebRTC call quirks
Other than the autoplay restriction listed above, there are a few specific quirks associated with Web Audio when using it in WebRTC calls.

Safari will not let you create new <audio> elements while the tab is in the background, so when a new participant joins your meeting, you cannot create a new <audio> element for them.
WebRTC echo cancellation does not work with audio played through the AudioContext API on Chromium.
You can instead create one <audio> element and add all audio tracks to a common MediaStream. However, every time you add a new track:
In Safari, you have to call play() again.
In Chromium, you have to set srcObject again.
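The single-element approach above could look roughly like this. It is a sketch that assumes sharedStream is the MediaStream already assigned to the element, and that you pass in your own browser detection result as 'safari' or 'chromium'. addTrackToSharedAudio is a hypothetical name.

```javascript
// Hypothetical helper for the single-<audio>-element workaround.
// `browser` is 'safari' or 'chromium', from your own detection logic (not shown).
function addTrackToSharedAudio(audioEl, sharedStream, track, browser) {
  sharedStream.addTrack(track);

  if (browser === 'chromium') {
    // Chromium: set srcObject again so the new track is picked up
    audioEl.srcObject = sharedStream;
  }
  // Safari: call play() again. In Chromium, re-assigning srcObject reloads the
  // element, so play() is also needed unless it has the autoplay attribute.
  return audioEl.play();
}
```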

Codecs

The Web Audio API itself works with whatever audio formats the browser can decode, for example through AudioContext.decodeAudioData(). Some of the most common codecs and formats supported by web browsers include:

PCM: Pulse-code modulation (PCM) is a digital representation of an analog audio signal. It is uncompressed, so it loses no audio quality to compression; its fidelity depends only on the sample rate and bit depth. PCM is the most basic audio encoding and is widely supported on the web, most often inside WAV files.
MP3: MPEG-1 Audio Layer 3 (MP3) is a widely used lossy audio codec known for its high compression ratio and good audio quality. It is supported by all major modern browsers.
AAC: Advanced Audio Coding (AAC) is a lossy audio codec known for high audio quality at low bitrates. It is supported by most web browsers, but not all.
Opus: Opus is a lossy codec designed for low-latency, high-quality, low-bitrate audio over the internet. It is supported by all modern browsers and is mandatory to implement in WebRTC, which makes it the usual codec for WebRTC audio.
WAV: Waveform Audio File Format (WAV) is a container format, most commonly used to store uncompressed PCM audio. It is widely supported by web browsers and is commonly used for high-quality audio files, but its files are much larger than those of compressed formats.
Ogg: Ogg is an open-source container format for digital multimedia. It is supported by most web browsers and is often used with the Vorbis codec.
Vorbis: Vorbis is an open-source, patent-free lossy audio codec known for high audio quality at low bitrates. It is supported by most web browsers, but not all.

Using codecs that are widely supported by web browsers ensures that your audio content can be played across different devices and platforms.
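To choose a format at runtime, you can ask the browser which types it can play. This sketch assumes you pass in a canPlayType function; in a browser that would be (t) => new Audio().canPlayType(t), which returns "probably", "maybe", or an empty string. pickSupportedType is a hypothetical name.

```javascript
// Hypothetical helper: pick the first audio type the browser reports it can play.
function pickSupportedType(candidates, canPlayType) {
  // Prefer a confident "probably", then fall back to "maybe"
  for (const answer of ['probably', 'maybe']) {
    const match = candidates.find((type) => canPlayType(type) === answer);
    if (match) return match;
  }
  return null; // nothing in the list is playable
}

// Browser usage (not run here):
// pickSupportedType(['audio/ogg; codecs="opus"', 'audio/mpeg', 'audio/wav'],
//                   (t) => new Audio().canPlayType(t));
```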

Permissions

To handle web audio permission issues, you can use the Permissions API to check the current permission state and the MediaDevices.getUserMedia() method to request access to the microphone or camera.

Here's an example of how to check the microphone permission and handle its various states:

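navigator.permissions.query({ name: 'microphone' }).then(function(permissionStatus) {
    function handleState() {
        if (permissionStatus.state === 'granted') {
            // Access to microphone granted
            // create an audio context and access microphone
        } else if (permissionStatus.state === 'denied') {
            // Access to microphone denied
            // handle denied permission
        } else {
            // 'prompt': the browser will ask when getUserMedia() is called
        }
    }
    handleState();                           // handle the current state right away
    permissionStatus.onchange = handleState; // and react to later changes
}).catch(function(error) {
    // Some browsers do not support querying 'microphone' and reject here
    console.log('Permission query failed:', error);
});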

For the MediaDevices.getUserMedia() method, you can use the catch method to handle errors and implement fallbacks:

navigator.mediaDevices.getUserMedia({ audio: true })
    .then(function(stream) {
        // Access to microphone granted
        // create an audio context and access microphone
    })
    .catch(function(error) {
        console.log('Error occurred:', error);
        // handle denied permission or other errors
    });
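Inside that catch handler, the error's name tells you what went wrong. A sketch that maps the common getUserMedia() error names to user-facing messages; the helper name and message wording are hypothetical, and you would pass the result to your own UI.

```javascript
// Hypothetical helper: turn a getUserMedia() rejection into a user-facing message.
function describeMicrophoneError(error) {
  switch (error && error.name) {
    case 'NotAllowedError': // user or browser denied permission
      return 'Microphone access was blocked. Allow it in your browser site settings and try again.';
    case 'NotFoundError': // no audio input device
      return 'No microphone was found. Connect one and try again.';
    case 'NotReadableError': // hardware or OS error, e.g. device busy
      return 'The microphone is in use by another application or could not be started.';
    case 'OverconstrainedError': // no device satisfies the constraints
      return 'No microphone matches the requested settings.';
    case 'SecurityError': // media capture disabled for this document
      return 'Microphone access is disabled on this page.';
    default:
      return 'Could not access the microphone.';
  }
}
```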

You can also check whether the browser supports navigator.permissions.query() and navigator.mediaDevices.getUserMedia() before calling them.
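A minimal feature-detection sketch, assuming you pass in the browser's navigator and window objects. Note that navigator.mediaDevices is only exposed in secure contexts (HTTPS or localhost), and older Safari versions only provide the prefixed webkitAudioContext. detectAudioSupport is a hypothetical name.

```javascript
// Hypothetical feature check; in a browser call detectAudioSupport(navigator, window).
function detectAudioSupport(nav, win) {
  return {
    permissionsQuery: Boolean(nav.permissions && typeof nav.permissions.query === 'function'),
    // mediaDevices is undefined on insecure (non-HTTPS) pages
    getUserMedia: Boolean(nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function'),
    // Older Safari versions only expose the prefixed constructor
    webAudio: typeof (win.AudioContext || win.webkitAudioContext) === 'function',
  };
}
```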

In addition to handling permission issues, it's important to provide clear instructions to users on how to grant permission and to make sure that the website's functionality doesn't break if permission is denied or if the Web Audio API is not supported by the browser.

Audio processing

Audio processing is the manipulation of audio signals using signal processing techniques. It is used in a wide range of applications such as music production, audio effects, noise reduction, speech processing, and more.

Broadly, audio can be processed in two domains: the time domain, where you work directly on the sample values, and the frequency domain, where you work on the signal's frequency components.
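To make the difference concrete, here is an illustrative sketch that turns a time-domain signal (a list of samples) into a frequency-domain view using a naive discrete Fourier transform. It is O(N²) and only meant to show the idea; in the browser, an AnalyserNode computes an FFT for you. Bin k corresponds to a frequency of k × sampleRate / N.

```javascript
// Naive discrete Fourier transform (O(N^2)), for illustration only.
// Returns the magnitude of bins 0..N/2 for a real-valued signal.
function magnitudeSpectrum(samples) {
  const N = samples.length;
  const mags = [];
  for (let k = 0; k <= N / 2; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (2 * Math.PI * k * n) / N;
      re += samples[n] * Math.cos(angle);
      im -= samples[n] * Math.sin(angle);
    }
    mags.push(Math.hypot(re, im));
  }
  return mags;
}

// Time domain: a sine wave completing 5 cycles in 64 samples
const N = 64;
const sine = Array.from({ length: N }, (_, n) => Math.sin((2 * Math.PI * 5 * n) / N));

// Frequency domain: the energy shows up in bin 5
const spectrum = magnitudeSpectrum(sine);
const peakBin = spectrum.indexOf(Math.max(...spectrum));
console.log(peakBin); // 5
```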

We can add processing nodes to the audio processing graph, such as a gain node to control the volume or a filter node to change the frequency response of the audio.

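// Route the source through a GainNode instead of straight to the speakers
source.disconnect(); // remove any direct source -> destination connection
const gainNode = audioCtx.createGain();
gainNode.gain.value = 0.5; // linear factor: halves the amplitude
source.connect(gainNode);
gainNode.connect(audioCtx.destination);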
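A GainNode's gain is a linear amplitude factor, while volume controls are often expressed in decibels. These two small conversion helpers (hypothetical names) use the standard amplitude relationship gain = 10^(dB / 20).

```javascript
// Convert between decibels and the linear factor used by GainNode.gain
function dbToGain(db) {
  return Math.pow(10, db / 20);
}

function gainToDb(gain) {
  return 20 * Math.log10(gain);
}

console.log(dbToGain(-6).toFixed(3)); // 0.501
console.log(gainToDb(0.5).toFixed(2)); // -6.02
```

In a browser you could then write gainNode.gain.setValueAtTime(dbToGain(-6), audioCtx.currentTime) to turn the signal down by about 6 dB.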

We will cover more specific audio processing use cases in the future.

Examples

Here are a few examples of the Web Audio API use cases:

Voice chat and conferencing

Web Audio API allows you to capture audio from a user's microphone and process it in real-time. This can be used to build voice chat and conferencing applications like Dyte that run directly in the browser.
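Conferencing apps often highlight the participant who is speaking. As a hedged sketch of that kind of real-time processing (not how Dyte implements it), the block below computes the RMS level of a block of samples and compares it to a threshold. The helper names and the -50 dBFS threshold are hypothetical, and a production detector would add smoothing and hysteresis.

```javascript
// Root-mean-square level of a block of samples in the range [-1, 1]
function rmsLevel(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return samples.length ? Math.sqrt(sum / samples.length) : 0;
}

// Hypothetical threshold: treat anything louder than -50 dBFS as speech
function isSpeaking(samples, thresholdDb = -50) {
  const level = rmsLevel(samples);
  return level > 0 && 20 * Math.log10(level) > thresholdDb;
}

// Browser wiring (not run here): poll an AnalyserNode attached to a participant's source
function watchSpeaking(audioCtx, source, onChange, intervalMs = 100) {
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 2048;
  source.connect(analyser);
  const buffer = new Float32Array(analyser.fftSize);
  let speaking = false;
  return setInterval(() => {
    analyser.getFloatTimeDomainData(buffer);
    const now = isSpeaking(buffer);
    if (now !== speaking) {
      speaking = now;
      onChange(speaking);
    }
  }, intervalMs);
}
```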

Voice recognition

The Web Audio API can capture and process audio input from a user's microphone, which can then be analyzed by a speech-recognition engine. This can be used to create voice-controlled interfaces for web applications.

Visualizations

The Web Audio API's AnalyserNode provides time-domain and frequency data for an audio signal, which can be used to create various visualizations. For example, a music player application could use it to draw the frequency spectrum of the currently playing song.
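A sketch of a simple bar visualizer: an AnalyserNode fills a byte array with frequency data (values 0 to 255) on every animation frame, and a helper averages it into a fixed number of bars. toBars and startVisualizer are hypothetical names, and draw stands in for your own rendering code (for example, drawing rectangles on a canvas).

```javascript
// Average `data` (e.g. AnalyserNode byte frequency data, 0-255) into
// `barCount` bars, each scaled to the range 0..1.
function toBars(data, barCount) {
  const size = Math.floor(data.length / barCount);
  if (size === 0) throw new RangeError('barCount must not exceed data.length');
  const bars = [];
  for (let b = 0; b < barCount; b++) {
    let sum = 0;
    for (let i = b * size; i < (b + 1) * size; i++) sum += data[i];
    bars.push(sum / size / 255);
  }
  return bars;
}

// Browser wiring (not run here): `draw` is a hypothetical rendering callback
function startVisualizer(audioCtx, source, draw) {
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 256; // gives 128 frequency bins
  source.connect(analyser);
  const data = new Uint8Array(analyser.frequencyBinCount);

  const frame = () => {
    analyser.getByteFrequencyData(data);
    draw(toBars(data, 32));
    requestAnimationFrame(frame);
  };
  requestAnimationFrame(frame);
  return analyser;
}
```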

Music and sound effects in games

Web Audio API can be used to create interactive audio experiences in browser-based games. Developers can use the API to play background music, sound effects, and even generate audio on the fly based on game events.
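Generating audio on the fly can be as simple as computing samples in JavaScript and playing them through an AudioBuffer. This sketch synthesizes a short sine "beep" with a linear fade-out (one simple envelope choice that avoids a click at the end); an OscillatorNode would be an alternative. synthTone and playSamples are hypothetical names.

```javascript
// Synthesize a mono sine tone with a linear fade-out, as Float32 samples in [-peak, peak]
function synthTone(freq, durationSec, sampleRate, peak = 0.5) {
  const length = Math.round(durationSec * sampleRate);
  const samples = new Float32Array(length);
  for (let n = 0; n < length; n++) {
    const envelope = 1 - n / length; // fades from 1 towards 0
    samples[n] = peak * envelope * Math.sin((2 * Math.PI * freq * n) / sampleRate);
  }
  return samples;
}

// Browser wiring (not run here): copy the samples into an AudioBuffer and play them once
function playSamples(audioCtx, samples) {
  const buffer = audioCtx.createBuffer(1, samples.length, audioCtx.sampleRate);
  buffer.copyToChannel(samples, 0);
  const node = audioCtx.createBufferSource();
  node.buffer = buffer;
  node.connect(audioCtx.destination);
  node.start();
  return node;
}
```

For example, a game could call playSamples(audioCtx, synthTone(880, 0.1, audioCtx.sampleRate)) when the player picks up an item.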

Music and audio editing

Web Audio API provides a powerful set of tools for manipulating audio, including filtering, mixing, and processing. This allows developers to create web-based audio editing tools that can be used to record, edit, and export audio.
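Exporting is often the last step of such a tool. A minimal sketch of writing mono Float32 samples as a 16-bit PCM WAV file follows; encodeWav is a hypothetical name, and it assumes a single channel. In a browser you could pass audioBuffer.getChannelData(0) and audioBuffer.sampleRate, then wrap the result in new Blob([wav], { type: 'audio/wav' }) for download.

```javascript
// Encode mono Float32 samples (-1..1) as a 16-bit PCM WAV file (44-byte header + data)
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                  // remaining file size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                            // fmt chunk size
  view.setUint16(20, 1, true);                             // audio format 1 = PCM
  view.setUint16(22, 1, true);                             // one channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);   // byte rate
  view.setUint16(32, bytesPerSample, true);                // block align
  view.setUint16(34, 16, true);                            // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));       // clamp to avoid overflow
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```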

Conclusion

In this blog post, we covered the basics of the Web Audio API and related concepts in the context of WebRTC. There is more to cover on this topic, and we will publish it in the coming weeks. Stay tuned.

If you haven’t heard about Dyte yet, head over to https://dyte.io to learn how we are revolutionizing live video calling through our SDKs and libraries and how you can get started quickly on your 10,000 free minutes, which renew every month. If you have any questions, you can reach us at support@dyte.io or ask our developer community.