<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Dyte]]></title><description><![CDATA[Learn about why and how we are building the most developer friendly video and audio SDKs.]]></description><link>https://dyte.io/blog/</link><image><url>https://dyte.io/blog/favicon.png</url><title>Dyte</title><link>https://dyte.io/blog/</link></image><generator>Ghost 5.78</generator><lastBuildDate>Thu, 16 Apr 2026 05:40:11 GMT</lastBuildDate><atom:link href="https://dyte.io/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Bridging Your Telephony Calls to a Video Call]]></title><description><![CDATA[The blog explores merging telephony with video via SIP. Highlights benefits and use cases of Dyte's SIP Interconnect for seamless VOIP and WebRTC bridging.]]></description><link>https://dyte.io/blog/bridging-your-telephony-calls-video-call/</link><guid isPermaLink="false">65b121f73df14600014b3f01</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Palash Golecha]]></dc:creator><pubDate>Tue, 03 Dec 2024 08:48:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/02/SIP-integration---2.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/02/SIP-integration---2.png" alt="Bridging Your Telephony Calls to a Video Call"><p>This blog post offers an in-depth look at the world of telecommunication technologies, focusing on how traditional telephony and modern video conferencing are being integrated. We will clarify complex concepts like VoIP, SIP, PSTN, and WebRTC, examining their interactions and the methods used to combine these technologies.</p><p>Bridging telephony calls with video calls is not just a technical feat; it serves practical and impactful use cases in various sectors. Here are some reasons why this integration is crucial:</p><ul><li><strong>Telehealth Services</strong>: In healthcare, integrating telephony with video conferencing enables doctors to offer more comprehensive telehealth services. Patients can initially contact healthcare providers through regular phone calls and, if necessary, switch to a video call for a more thorough consultation.</li><li><strong>Network Restrictions: </strong>Let&#x2019;s be honest, not every place has great internet, especially mobile internet, which is far more unstable indoors or while on the move. Network restrictions shouldn&#x2019;t block your customers from accessing your services.</li><li><strong>Customer Service</strong>: Customer support centers can benefit significantly from this integration. A customer calling through a traditional phone line can be quickly transferred to a WebRTC voice call with a support agent, allowing for more personalized and effective problem-solving.</li></ul><p>Now, let&#x2019;s understand the technologies we are dealing with to make all of this possible.</p><h2 id="pstn-public-switched-telephone-network"><strong>PSTN (Public Switched Telephone Network):</strong></h2><p>PSTN is the traditional telephone network that has been in use for many years. 
It is a circuit-switched network that relies on copper wires, fiber optic cables, microwave transmission links, satellites, and undersea telephone cables.</p><p>Most of your regular phone calls use PSTN to connect with each other.</p><h2 id="voip"><strong>VoIP</strong></h2><p>VoIP (Voice over Internet Protocol) is a technology that allows voice communication to be transmitted over the Internet or other IP-based networks. It represents a significant shift from traditional telephony, which relies on circuit-switched networks such as the Public Switched Telephone Network (PSTN).</p><p>VoIP is an umbrella term that encompasses many different protocols for voice communication over the Internet.</p><ul><li>VoIP service providers typically use gateways to connect VoIP networks with the PSTN, allowing seamless communication between internet-based and traditional telephone systems.</li><li>The integration of VoIP with PSTN enables users to enjoy the benefits of internet-based calling while still being able to connect with users on the conventional telephone network.</li></ul><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/SIP-in-video-calls.png" class="kg-image" alt="Bridging Your Telephony Calls to a Video Call" loading="lazy" width="2000" height="710" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/SIP-in-video-calls.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/SIP-in-video-calls.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/SIP-in-video-calls.png 1600w, https://dyte.io/blog/content/images/2024/01/SIP-in-video-calls.png 2237w" sizes="(min-width: 720px) 720px"></figure><h2 id="webrtc"><strong>WebRTC</strong></h2><p>This is the technology that is commonly used for video and audio communication on the web. WebRTC is an umbrella of protocols that enables web browsers and mobile applications to communicate directly with each other in real time without the need for intermediaries.</p><p><strong>Everyone from conferencing applications like Google Meet to SDK vendors like Dyte uses WebRTC to deliver real-time audio and video on the internet.</strong></p><blockquote>Google Meet is popular, we are aware of it, but its latency and connectivity issues just don&apos;t work for modern teams - so here&apos;s a list of <a href="https://feta.io/blog/meet-alternatives" rel="noreferrer">Google Meet alternatives</a>.</blockquote><h2 id="sip-and-sip-trunking"><strong>SIP and SIP Trunking</strong></h2><p>SIP, or Session Initiation Protocol, is a set of rules governing the initiation, maintenance, and termination of VoIP calls. It acts as a supporting protocol that facilitates the functioning of VoIP technology, i.e.,
it is one of the VoIP protocols.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/SIP.png" class="kg-image" alt="Bridging Your Telephony Calls to a Video Call" loading="lazy" width="2000" height="710" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/SIP.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/SIP.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/SIP.png 1600w, https://dyte.io/blog/content/images/2024/01/SIP.png 2237w" sizes="(min-width: 720px) 720px"></figure><h2 id="bridging-webrtc-with-telephony"><strong>Bridging WebRTC with Telephony</strong></h2><p>Session Initiation Protocol (SIP) Interconnect refers to the setup where two or more different SIP-based networks or systems are connected to enable the flow of voice traffic between them.</p><p>Dyte&apos;s SIP Interconnect allows you to bridge VoIP calls from an external third-party service to Dyte&apos;s WebRTC meetings. That means you can use SIP methodologies to connect with our SIP servers and have the call bridged to participants who might be connected through Dyte Client SDKs (WebRTC).</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/Webrtc-with-telephony.png" class="kg-image" alt="Bridging Your Telephony Calls to a Video Call" loading="lazy" width="2000" height="890" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/Webrtc-with-telephony.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/Webrtc-with-telephony.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/Webrtc-with-telephony.png 1600w, https://dyte.io/blog/content/images/2024/01/Webrtc-with-telephony.png 2237w" sizes="(min-width: 720px) 720px"></figure><h3 id="integration-guide">Integration Guide</h3><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/image.png" class="kg-image" alt="Bridging Your Telephony Calls to a Video Call" loading="lazy" width="2000" height="1214" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/image.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/image.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/image.png 1600w, https://dyte.io/blog/content/images/2024/01/image.png 2040w" sizes="(min-width: 720px) 720px"></figure><p>Get your SIP credentials from the <a href="https://dev.dyte.io/apikeys" rel="noopener noreferrer">Developer Portal</a> in the <code>API Keys</code> section.<br>(You will have to contact support to enable the feature.)</p><p>Our SIP server provides unique SIP user information with credentials. With this information, a SIP URL can be formed to connect to a meeting. 
The convention is <code>sip:&lt;meetingid&gt;@sip.dyte.io</code>.</p><p>Once you have the credentials, the simplest way to test the SIP endpoint is with a SIP client; you can use clients like Zoiper, Telephone (macOS only), etc.</p><p>Now, to connect to a specific Dyte <code>meetingId</code>, you can dial in using SIP with the given username and password and a URI in the format <code>sip:&lt;meetingId&gt;@sip.dyte.io</code>.</p><p>&#x1F389; That is it! Once you dial with the above credentials, your SIP call should be bridged with Dyte&apos;s WebRTC meeting.</p><h2 id="examples"><strong>Examples</strong></h2><h3 id="integration-with-twilio-voice"><strong>Integration with Twilio Voice</strong></h3><p>To connect with Dyte, we are going to use TwiML to perform the SIP dial-in.</p><h3 id="guide">Guide</h3><p>Steps to follow:</p><ol><li>Get a Twilio account. You can go to <a href="https://www.twilio.com/try-twilio">https://www.twilio.com/try-twilio</a> and create an account.</li><li>Buy a VoIP number.</li><li>Configure the VoIP number to <a href="https://www.twilio.com/docs/usage/webhooks/voice-webhooks#incoming-voice-call">use a webhook to handle any incoming call</a>.</li></ol><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/image-4.png" class="kg-image" alt="Bridging Your Telephony Calls to a Video Call" loading="lazy" width="2000" height="884" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/image-4.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/image-4.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/image-4.png 1600w, https://dyte.io/blog/content/images/size/w2400/2024/01/image-4.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>Now, when you get a webhook, you can respond with a <a href="https://www.twilio.com/docs/voice/twiml/sip">TwiML SIP Dial</a> verb with Dyte&apos;s SIP configuration:</p><pre><code class="language-XML">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;Response&gt;
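&lt;!-- Replace the username/password placeholders with the SIP credentials from
     the Developer Portal, and meetingId with your actual Dyte meeting ID --&gt;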
&lt;Dial&gt;
    &lt;Sip username=&quot;&lt;DyteSIPUsername&gt;&quot; password=&quot;&lt;DyteSIPPassword&gt;&quot;&gt;sip:meetingId@sip.dyte.io&lt;/Sip&gt;
&lt;/Dial&gt;
&lt;/Response&gt;</code></pre><p><strong>Express Example</strong></p><pre><code class="language-js">const express = require(&apos;express&apos;);
const VoiceResponse = require(&apos;twilio&apos;).twiml.VoiceResponse;
const urlencoded = require(&apos;body-parser&apos;).urlencoded;

const app = express();

// Parse incoming POST params with Express middleware
app.use(urlencoded({ extended: false }));

// Create a route that will handle Twilio webhook requests, sent as an
// HTTP POST to /voice in our application
app.post(&apos;/voice&apos;, (request, response) =&gt; {
  console.log({ request });
  // Use the Twilio Node.js SDK to build an XML response
  const twiml = new VoiceResponse();

  const dial = twiml.dial();
  dial.sip(
    {
      username: &apos;&lt;DyteSIPUsername&gt;&apos;,
      password: &apos;&lt;DyteSIPPassword&gt;&apos;,
    },
    &apos;sip:&lt;meetingId&gt;@sip.dyte.io&apos;
  );

  // Render the response as XML in reply to the webhook request
  response.type(&apos;text/xml&apos;);
  console.log({ twiml: twiml.toString() });
  response.send(twiml.toString());
});
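
// Start the HTTP server so Twilio can deliver webhooks to /voice
// (port 3000 is an assumption; use whatever your deployment exposes)
app.listen(3000, () =&gt; console.log(&apos;Listening on port 3000&apos;));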
</code></pre><p>In conclusion, integrating telephony with video conferencing is a practical development in communication technology, combining the reliability and widespread use of PSTN with the flexibility and modern capabilities of VoIP, WebRTC, and SIP. By bridging these technologies, services like Dyte are enabling more seamless and versatile communication experiences across various sectors, from telehealth to customer service. </p><p>This blend of old and new technology not only enhances existing communication methods but also opens the door for innovative applications in the future.</p>]]></content:encoded></item><item><title><![CDATA[Hiring Challenge: Smallest Golang Websocket Client]]></title><description><![CDATA[Learn to create a compact Go program for a websocket server, optimizing the binary size. Insights from a Dyte hiring challenge.]]></description><link>https://dyte.io/blog/hiring-challenge-smallest-golang-websocket-client/</link><guid isPermaLink="false">65afa55c3df14600014b3deb</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Pratham K]]></dc:creator><pubDate>Fri, 29 Nov 2024 16:00:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/01/HIRING-CHALLANGE.png" medium="image"/><content:encoded><![CDATA[<h2 id="adventures-in-making-small-go-binaries"><strong>Adventures in making small Go binaries</strong></h2><img src="https://dyte.io/blog/content/images/2024/01/HIRING-CHALLANGE.png" alt="Hiring Challenge: Smallest Golang Websocket Client"><p>In this post, we&apos;ll write a small Go program to talk with a websocket server while trying to make the generated binary as small as possible. This was done as part of one of Dyte&apos;s <a href="https://hacktofinale.dyte.live/challenges/golf">hiring challenges</a>, but the methods discussed here can be applied to any Go program in general. Do note that this is just for fun and not something you should try in production!</p><h2 id="problem-statement"><strong>Problem statement</strong></h2><p>So, we have a basic websocket server that accepts connections from a client and checks if it sent a <code>hello</code> message, and the client has to print out the server&apos;s response. The server-side code is written using the <a href="https://github.com/gorilla/websocket"><code>gorilla/websocket</code></a> package and can be found <a href="https://github.com/git-bruh/wscodegolf/blob/main/server/main.go">here</a>, but we won&apos;t really go through it here as our focus is on making the client-side binary small.</p><p>We&apos;ll be covering various methods throughout this post, ranging from swapping out the Go compiler, using an ELF packer, and tweaking linker flags to using raw syscalls instead of the standard library.</p><h2 id="humble-beginnings"><strong>Humble Beginnings</strong></h2><p>Let&apos;s start out by writing an obvious Go program using the <code>x/net/websocket</code> package:</p><pre><code class="language-Go">package main

import (
	&quot;fmt&quot;
	&quot;log&quot;

	&quot;golang.org/x/net/websocket&quot;
)

func main() {
	url := &quot;ws://localhost:8080/&quot;
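	// websocket.Dial&apos;s parameters are (url, protocol, origin): the empty
	// string leaves the subprotocol unset, and the URL doubles as the origin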
	ws, err := websocket.Dial(url, &quot;&quot;, url)

	if err != nil {
		log.Fatal(err)
	}

	defer ws.Close()

	// Write the `hello` message
	if _, err := ws.Write([]byte(&quot;hello&quot;)); err != nil {
		log.Fatal(err)
	}

	// 512 byte buffer for storing the response
	var response = make([]byte, 512)

	// No. of bytes received
	var received int

	if received, err = ws.Read(response); err != nil {
		log.Fatal(err)
	}

	fmt.Printf(&quot;Received: %s\n&quot;, response[:received])
}</code></pre><p>Building and running it, we get a ~5.8 MiB binary (6084899 bytes), which is far from our goal:</p><pre><code class="language-shell">$ go build -o main &amp;&amp; ./main
Received: dyte
$ wc -c main
6084899 main</code></pre><p>The <code>go build</code> command allows us to tweak the flags passed to various components like the assembler ( <code>go tool asm</code>, <code>-asmflags</code>), the linker ( <code>go tool link</code>, <code>-ldflags</code>) and the compiler itself ( <code>go tool compile</code>, <code>-gcflags</code>). But only the linker flags are relevant to us for reducing the binary size, and this is quite widely known. In <code>ldflags</code>, <code>-s</code> disables the symbol table and <code>-w</code> omits debug information, while the <code>-trimpath</code> flag converts absolute file paths to relative ones, further reducing the size to ~3.9 MiB:</p><pre><code class="language-shell">$ go build -trimpath -ldflags &apos;-s -w&apos; -o main &amp;&amp; wc -c main
4128768 main</code></pre><h2 id="reinventing-the-wheel"><strong>Reinventing the wheel</strong></h2><p>Now, we&apos;ll start moving into the more esoteric side of things while still sticking with our trusty Go compiler. For starters, let&apos;s abandon the <code>x/net/websocket</code> package and talk over the TCP socket directly, crafting the HTTP and websocket payload by hand.</p><p>Refer to <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers">this</a> MDN document about writing websocket servers, as we won&apos;t be covering the payload in-depth here, though it is extensively commented in the code below:</p><pre><code class="language-Go">package main

import (
	&quot;fmt&quot;
	&quot;log&quot;
	&quot;net&quot;
)

func main() {
	httpInitMsg := []byte(&quot;GET / HTTP/1.1\r\nHost:dyte.io\r\nUpgrade:websocket\r\nConnection:Upgrade\r\nSec-WebSocket-Key:dGhlIHNhbXBsZSBub25jZQ==\r\nSec-WebSocket-Version:13\r\nConnection:Upgrade\r\n\r\n&quot;)
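	// The hardcoded Sec-WebSocket-Key is the example nonce from RFC 6455, so
	// the Sec-WebSocket-Accept the server derives from it will match the
	// RFC&apos;s example value as well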
	wsPayload := []byte{
		// FIN Bit (Final fragment), OpCode (1 for text payload)
		0b10000001,
		// Mask Bit (Required), followed by 7 bits for length (0b0000101 == 5)
		0b10000101,
		// We don&apos;t set the extended payload bits as our payload is only 5 bytes
		// Mask (can be any arbitrary 32-bit integer)
		0b00000001,
		0b00000010,
		0b00000011,
		0b00000100,
		// Payload, the string &quot;hello&quot; with each character XOR&apos;d with the
		// corresponding mask bits
		0b01101001, // &apos;h&apos; ^ 0b00000001
		0b01100111, // &apos;e&apos; ^ 0b00000010
		0b01101111, // &apos;l&apos; ^ 0b00000011
		0b01101000, // &apos;l&apos; ^ 0b00000100
		0b01101110, // &apos;o&apos; ^ 0b00000001
	}
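
	// For reference, the masked payload bytes above are what this loop would
	// produce (the 4 mask bytes sit at frame indices 2..5, the payload at 6..10):
	//   for i, c := range []byte(&quot;hello&quot;) {
	//       wsPayload[6+i] = c ^ wsPayload[2+i%4]
	//   }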

	// Establish a TCP connection to the server
	conn, err := net.Dial(&quot;tcp&quot;, &quot;localhost:8080&quot;)

	if err != nil {
		log.Fatal(err)
	}

	defer conn.Close()

	// Send the initial HTTP message to start talking over the WebSocket protocol
	_, err = conn.Write(httpInitMsg)

	if err != nil {
		log.Fatal(err)
	}

	response := make([]byte, 512)

	// Receive the initial HTTP response
	received, err := conn.Read(response)

	if err != nil {
		log.Fatal(err)
	}

	// Write the websocket frame
	_, err = conn.Write(wsPayload)

	if err != nil {
		log.Fatal(err)
	}

	// Read the reply into the existing buffer
	if _, err = conn.Read(response[received:]); err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(response))
}</code></pre><p>We&apos;ve made quite some progress, down to ~1.7 MiB!</p><pre><code class="language-Shell">$ go build -trimpath -ldflags &apos;-s -w&apos; -o main &amp;&amp; ./main
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

dyte
$ wc -c main
1814528 main</code></pre><p>Now, we&apos;ll use <a href="https://upx.github.io/" rel="noopener noreferrer">UPX</a>, an executable packer that compresses the binary and strips unneeded ELF sections. Do note that this impacts cold start times a bit due to the decompression overhead. This takes us down to ~710 KiB!</p><pre><code class="language-Shell">$ upx -9 main # Max compression level
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2024
UPX 4.2.2       Markus Oberhumer, Laszlo Molnar &amp; John Reiser    Jan 3rd 2024

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
   1814528 -&gt;    727684   40.10%   linux/amd64   main

Packed 1 file.</code></pre><h2 id="one-step-closer-to-insanity"><strong>One step closer to insanity</strong></h2><p>Till now, we&apos;ve just switched to the standard library for talking to the server. We can go one step further and use raw syscalls to handle all the socket interactions, becoming our own standard library in a sense :p</p><p>Note that syscalls are a lower level of abstraction than libc, as the libc functions, such as <code>recv</code> internally wrap the corresponding system calls. This might not make much sense if you&apos;ve never done socket programming in C, but the comments should give you enough of an idea of what&apos;s going on.</p><p>Essentially, a socket is a file descriptor that we create via the <code>socket()</code> syscall (which is identified by <code>SYS_SOCKET</code> here, referring to syscall no. <code>41</code>), and we further use this in subsequent syscalls to connect to the server and exchange data. The <a href="https://github.com/torvalds/linux/blob/9d1694dc91ce7b80bc96d6d8eaf1a1eca668d847/include/uapi/linux/in.h#L256"><code>sockaddr_in</code></a> structure is used to describe the address &amp; port we want to connect to, which we encode by hand here:</p><pre><code class="language-Go">func main() {
	httpInitMsg := []byte(...)
	wsPayload := []byte{...}
	// Connects to an IPv4 server at 127.0.0.1 on port 8080
	sockaddr := []byte{
		// family - AF_INET (0x2), padded to 16 bits
		0b00000010,
		0b00000000,
		// port - 8080, 16 bits in network byte order (big-endian)
		0b00011111,
		0b10010000,
		// addr - 127.0.0.1, 32 bits
		// 127 &lt;&lt; 0 | 0 &lt;&lt; 8 | 0 &lt;&lt; 16 | 1 &lt;&lt; 24
		0b01111111,
		0b00000000,
		0b00000000,
		0b00000001,
		// 64 bits of padding
		0b00000000, 0b00000000, 0b00000000, 0b00000000,
		0b00000000, 0b00000000, 0b00000000, 0b00000000,
	}
	// The response buffer for receiving server responses
	var response [135]byte

	// Create an IPv4 (AF_INET), TCP (SOCK_STREAM) socket FD
	// __NR_socket, AF_INET, SOCK_STREAM
	var sock, _, _ = syscall.Syscall(syscall.SYS_SOCKET, 0x2, 0x1, 0)
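	// Note: Syscall&apos;s third return value is an Errno; error checks are
	// deliberately omitted throughout this program to keep the binary small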

	// Connect to the server using the `sockaddr_in` structure
	// __NR_connect, fd, sockaddr_in, len(sockaddr_in)
	syscall.Syscall6(syscall.SYS_CONNECT, sock, uintptr(unsafe.Pointer(&amp;sockaddr[0])), uintptr(len(sockaddr)), 0, 0, 0)

	// Send the HTTP message over the socket
	// __NR_sendto, fd, buf, len(buf), flags, addr, addr_len
	syscall.Syscall6(syscall.SYS_SENDTO, sock, uintptr(unsafe.Pointer(&amp;httpInitMsg[0])), uintptr(len(httpInitMsg)), 0, 0, 0)

	// Receive the response
	// __NR_recvfrom, fd, buf, len(buf), flags, addr, addr_len
	var n, _, _ = syscall.Syscall6(syscall.SYS_RECVFROM, sock, uintptr(unsafe.Pointer(&amp;response[0])), uintptr(len(response)), 0, 0, 0)

	// Send the WebSocket frame
	// __NR_sendto
	syscall.Syscall6(syscall.SYS_SENDTO, sock, uintptr(unsafe.Pointer(&amp;wsPayload[0])), uintptr(len(wsPayload)), 0, 0, 0)

	// Receive the response
	// __NR_recvfrom
	syscall.Syscall6(syscall.SYS_RECVFROM, sock, uintptr(unsafe.Pointer(&amp;response[n])), uintptr(len(response))-n, 0, 0, 0)

	// Close the socket FD
	// __NR_close
	syscall.Syscall(syscall.SYS_CLOSE, sock, 0, 0)

	// Write the response string to standard output
	// __NR_write, STDOUT_FILENO
	syscall.Syscall(syscall.SYS_WRITE, 1, uintptr(unsafe.Pointer(&amp;response[0])), uintptr(len(response)))
}</code></pre><p>Now, the stock binary is ~828 KiB, and with UPX, it goes down to a not-so-measly ~352 KiB:</p><pre><code class="language-Shell">$ go build -trimpath -ldflags &apos;-s -w&apos; -o main &amp;&amp; upx -9 main
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2024
UPX 4.2.2       Markus Oberhumer, Laszlo Molnar &amp; John Reiser    Jan 3rd 2024

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
    847872 -&gt;    360692   42.54%   linux/amd64   main

Packed 1 file.</code></pre><h2 id="swapping-the-go-compiler"><strong>Swapping the Go compiler</strong></h2><p>Unfortunately, that&apos;s a dead-end for how far the vanilla Go compiler can take us. We can now start experimenting with <a href="https://tinygo.org/">TinyGo</a>, an alternative LLVM-based Go compiler that produces significantly smaller binaries. Spoiler alert: Our binaries will fall below the minimum size accepted by UPX for compression!</p><p>One nifty feature TinyGo provides is the <code>-size</code> flag, which shows the various packages that make up our final binary. This is what it shows in the 2nd example that uses the standard library&apos;s <code>net</code> package:</p><pre><code class="language-Shell">$ tinygo build -o main -size full
   code  rodata    data     bss |   flash     ram | package
------------------------------- | --------------- | -------
      0      45       4      18 |      49      22 | (padding)
     45   20330      18      78 |   20393      96 | (unknown)
   1647    3627       0      72 |    5274      72 | /usr/lib/go/src/syscall
   3870      58      12     536 |    3940     548 | C musl
    401       0       0       0 |     401       0 | Go interface assert
    477       0       0       0 |     477       0 | Go interface method
      0    5816       0       0 |    5816       0 | Go types
     50       0       0       0 |      50       0 | errors
   7780     161      40       0 |    7981      40 | fmt
     26       0       0       0 |      26       0 | internal/bytealg
   1690      21       0       0 |    1711       0 | internal/fmtsort
    443      51       0      48 |     494      48 | internal/godebug
    157     369    1280       0 |    1806    1280 | internal/godebugs
     31      12      48      88 |      91     136 | internal/intern
    155       2       0       0 |     157       0 | internal/itoa
      0      57      48       0 |     105      48 | internal/oserror
    486      24       0      16 |     510      16 | internal/task
    336      22       0       0 |     358       0 | io/fs
   2767       3      40      64 |    2810     104 | log
    127       0       0       0 |     127       0 | main
     27       0       0       0 |      27       0 | math
    122       0       0       0 |     122       0 | math/bits
      0      25      16     160 |      41     176 | net
    298      16      56      24 |     370      80 | os
   6272     715      96       0 |    7083      96 | reflect
   8993     258      12      95 |    9263     107 | runtime
    822       0       0       0 |     822       0 | sort
   7280   16705    1338       0 |   25323    1338 | strconv
   1822     200       0       0 |    2022       0 | sync
    141      75       0       1 |     216       1 | sync/atomic
    193    1455       0       8 |    1648       8 | syscall
  20700    1029     184     128 |   21913     312 | time
   1132     288       0       0 |    1420       0 | unicode/utf8
------------------------------- | --------------- | -------
  68290   51364    3192    1336 |  122846    4528 | total</code></pre><p>Meanwhile, for the syscalls-only example:</p><pre><code class="language-Shell">$ tinygo build -o main -size full
   code  rodata    data     bss |   flash     ram | package
------------------------------- | --------------- | -------
      0       1       4      21 |       5      25 | (padding)
     25    2494       8      31 |    2527      39 | (unknown)
     92       0       0      40 |      92      40 | /usr/lib/go/src/syscall
   2894      27       4     536 |    2925     540 | C musl
      0     208       0       0 |     208       0 | Go types
    365      24       0      16 |     389      16 | internal/task
    268     162       0       0 |     430       0 | main
   3020     135       8      91 |    3163      99 | runtime
     80      75       0       1 |     155       1 | sync/atomic
------------------------------- | --------------- | -------
   6744    3126      24     736 |    9894     760 | total</code></pre><p>This makes sense as we skip over a ton of abstractions by using raw syscalls, but we can reduce this even further with the control that TinyGo gives us! We can disable goroutines &amp; channels, swap out the GC, pass arbitrary linker flags, etc., as can be seen in the <a href="https://tinygo.org/docs/reference/usage/important-options/">documentation</a>.</p><p>First off, let&apos;s get a baseline for how much TinyGo can help us:</p><pre><code class="language-Plain">$ tinygo build -o main -no-debug &amp;&amp; wc -c main
18160 main</code></pre><p>17.7 KiB, that&apos;s already 1/20th the size of our previous attempt! Let&apos;s go ahead and disable goroutines (with <code>-scheduler none</code>), switch to a <a href="https://github.com/tinygo-org/tinygo/blob/release/src/runtime/gc_leaking.go" rel="noopener noreferrer">smaller GC implementation</a> that just leaks memory (<code>-gc leaking</code>), and just execute a trap instruction instead of printing the panic message in case of panics (<code>-panic trap</code>) - 12.75 KiB:</p><pre><code class="language-Shell">$ tinygo build -o main -no-debug -scheduler none -gc leaking -panic trap &amp;&amp; wc -c main
13056 main</code></pre><h2 id="ripping-out-the-gc"><strong>Ripping out the GC</strong></h2><p>The leaking GC, while quite small, still includes code to request memory via syscalls, so we can just provide our own allocator that gives out addresses from a fixed-size static buffer, which is initialized at program startup. We can use a small buffer for this purpose as only a few allocations are made in our program, such as initializing the variables we declared (due to Go&apos;s escape analysis, as we take pointers to these variables) and the <a href="https://github.com/tinygo-org/tinygo/blob/731532cd2b6353b60b443343b51296ec0fafae09/src/runtime/runtime_unix.go#L138" rel="noopener noreferrer">runtime startup code</a>:</p><pre><code class="language-Diff">--- a/main.go
+++ b/main.go
@@ -5,6 +5,26 @@ import (
        &quot;unsafe&quot;
 )

+var buffer [1024]byte
+var used uintptr = 0
+
+// We disable the Go GC entirely and provide this stub for handling
+// allocations, giving out addresses from a fixed-size static buffer.
+// This saves many bytes over using the &quot;leaking&quot; GC; it is more or less
+// used exclusively by the runtime&apos;s startup code for tasks like setting up
+// the process&apos;s environment variables
+// If it crashes, run it with a clean environment (env -i ./main)
+
+//go:linkname alloc runtime.alloc
+func alloc(size uintptr, layoutPtr unsafe.Pointer) unsafe.Pointer {
+       var ptr = unsafe.Pointer(&amp;buffer[used])
+
+       // Align for x64
+       used += ((size + 15) &amp;^ 15)
+
+       return ptr
+}
+
 func main() {</code></pre><p>Now, building with <code>-gc none</code> - 12.42 KiB:</p><pre><code class="language-Plain">$ tinygo build -o main -no-debug -scheduler none -gc none -panic trap &amp;&amp; wc -c main
12720 main</code></pre><h2 id="linker-flags"><strong>Linker flags</strong></h2><p>As mentioned before, TinyGo allows us to pass arbitrary flags to the linker at compile time. This can be done via spec files, which tell TinyGo some information about the target architecture; some examples can be seen here. The format itself is not documented, but all the possible keys with their defaults can be found in <a href="https://github.com/tinygo-org/tinygo/blob/release/compileopts/target.go">target.go</a>, which we use as a reference for creating our own.</p><p>This is what our <code>spec.json</code> looks like; all the values are at their defaults except for <code>ldflags</code>, which we will now go through:</p><pre><code class="language-JSON">{
  &quot;llvm-target&quot;: &quot;x86_64-unknown-linux-musl&quot;,
  &quot;cpu&quot;: &quot;x86-64&quot;,
  &quot;goos&quot;: &quot;linux&quot;,
  &quot;goarch&quot;: &quot;amd64&quot;,
  &quot;build-tags&quot;: [
    &quot;amd64&quot;,
    &quot;linux&quot;
  ],
  &quot;linker&quot;: &quot;ld.lld&quot;,
  &quot;rtlib&quot;: &quot;compiler-rt&quot;,
  &quot;libc&quot;: &quot;musl&quot;,
  &quot;defaultstacksize&quot;: 65536,
  &quot;ldflags&quot;: [
    &quot;--gc-sections&quot;,
    &quot;--discard-all&quot;,
    &quot;--strip-all&quot;,
    &quot;--no-rosegment&quot;,
    &quot;-znorelro&quot;,
    &quot;-znognustack&quot;
  ]
}</code></pre><p>Let&apos;s refer to the <code>lld</code> linker&apos;s <a href="https://man.archlinux.org/man/extra/lld/ld.lld.1.en">man-page</a> for these flags:</p><ul><li><code>-gc-sections</code>: Enables garbage collection of unused sections, explained more in detail in <a href="https://maskray.me/blog/2021-02-28-linker-garbage-collection">this</a> blog</li><li><code>-discard-all</code>: Deletes all local symbols</li><li><code>-strip-all</code>: Removes the symbol table and debug information</li><li><code>-no-rosegment</code>: Allows the linker to combine read-only and read-execute segments of the binary</li><li><code>-znorelro</code>: Disables emitting the <code>PT_GNU_RELRO</code> segment, used to specify certain regions of the binary that should be marked as read-only after performing relocations. Good security measure, but we just care about trimming bytes in this post :p</li><li><code>-znognustack</code>: Disables emitting the <code>PT_GNU_STACK</code> segment, used to determine whether the stack should be executable or not, again, security</li></ul><p>On top of this, we can further strip more sections from the compiled binary with the <code>strip</code> command: <code>strip --strip-section-headers -R .comment -R .note -R .eh_frame main</code>. This removes the section headers (used by tools like <code>objdump</code> to locate sections), along with the <code>.comment</code> section (which contains toolchain-related info) and the <code>.eh_frame</code> section (used for stack unwinding, which we don&apos;t need here).</p><p>Finally, our binary is down to 6.44 KiB:</p><pre><code class="language-Plain">$ tinygo build -o main -no-debug -scheduler none -gc none -panic trap -target spec.json
$ strip --strip-section-headers -R .comment -R .note -R .eh_frame main
$ wc -c main
6600 main</code></pre><h2 id="ripping-out-the-standard-library"><strong>Ripping out the standard library</strong></h2><p>6.44 KiB is still too big for a program that basically just makes a few syscalls (technically, every program fits this definition, but you get the intent), and this part gets its own section as it is basically cheating in the context of this challenge :p</p><p>So, we&apos;re still pulling in quite a bit of code from the standard library, mainly around the startup code that sets up the program&apos;s execution environment before our <code>main</code> function is actually called; look at <a href="https://github.com/tinygo-org/tinygo/blob/731532cd2b6353b60b443343b51296ec0fafae09/src/runtime/runtime_unix.go#L70">runtime_unix.go</a> and <a href="https://github.com/tinygo-org/tinygo/blob/731532cd2b6353b60b443343b51296ec0fafae09/src/runtime/scheduler_none.go#L22">scheduler_none.go</a> for more clarity.</p><p>All we have to do is export our <code>main</code> function with a different name (e.g., <code>smol_main</code>), and tell the linker to treat that as the actual entry point, which would prevent the standard library startup code from making its way into our binary.</p><ul><li>In <code>spec.json</code>, we pass the <code>entry</code> flag to the linker, and drop libc completely, as it is only needed by TinyGo&apos;s standard library for certain functions.</li></ul><pre><code class="language-diff">--- a/spec.json
+++ b/spec.json
@@ -9,7 +9,6 @@
   ],
   &quot;linker&quot;: &quot;ld.lld&quot;,
   &quot;rtlib&quot;: &quot;compiler-rt&quot;,
-  &quot;libc&quot;: &quot;musl&quot;,
   &quot;defaultstacksize&quot;: 65536,
   &quot;ldflags&quot;: [
     &quot;--gc-sections&quot;,
@@ -17,6 +16,7 @@
     &quot;--strip-all&quot;,
     &quot;--no-rosegment&quot;,
     &quot;-znorelro&quot;,
-    &quot;-znognustack&quot;
+    &quot;-znognustack&quot;,
+    &quot;-entry=smol_main&quot;
   ]
 }</code></pre><ul><li>In <code>main.go</code>, we make our local variables global, allowing them to be placed in static storage rather than on the heap (remember the escape analysis mentioned earlier?), which further allows us to get rid of our dummy GC implementation. We annotate the <code>main</code> function with directives to export it as <code>smol_main</code>, and disable <a href="https://github.com/tinygo-org/tinygo/blob/731532cd2b6353b60b443343b51296ec0fafae09/compiler/asserts.go#L19" rel="noopener noreferrer">bounds checking</a>, as the <a href="https://github.com/tinygo-org/tinygo/blob/731532cd2b6353b60b443343b51296ec0fafae09/src/runtime/panic.go#L143" rel="noopener noreferrer">panic handler</a> for it indirectly pulls in <a href="https://github.com/tinygo-org/tinygo/blob/731532cd2b6353b60b443343b51296ec0fafae09/src/runtime/runtime_unix.go#L155" rel="noopener noreferrer">some libc symbols</a>.</li></ul><pre><code class="language-Diff">--- a/main.go
+++ b/main.go
@@ -5,29 +5,9 @@ import (
        &quot;unsafe&quot;
 )

-var buffer [1024]byte
-var used uintptr = 0
-
-// We disable the Go GC entirely and provide this stub for handling
-// allocations, giving out addresses from a fixed-size static buffer.
-// This saves many bytes over using the &quot;leaking&quot; GC; it is more or less
-// used exclusively by the runtime&apos;s startup code for tasks like setting up
-// the process&apos;s environment variables
-// If it crashes, run it with a clean environment (env -i ./main)
-
-//go:linkname alloc runtime.alloc
-func alloc(size uintptr, layoutPtr unsafe.Pointer) unsafe.Pointer {
-       var ptr = unsafe.Pointer(&amp;buffer[used])
-
-       // Align for x64
-       used += ((size + 15) &amp;^ 15)
-
-       return ptr
-}
-
-func main() {
-       httpInitMsg := []byte(&quot;GET / HTTP/1.1\r\nHost:dyte.io\r\nUpgrade:websocket\r\nConnection:Upgrade\r\nSec-WebSocket-Key:dGhlIHNhbXBsZSBub25jZQ==\r\nSec-WebSocket-Version:13\r\nConnection:Upgrade\r\n\r\n&quot;)
-       wsPayload := []byte{
+var (
+       httpInitMsg = []byte(&quot;GET / HTTP/1.1\r\nHost:dyte.io\r\nUpgrade:websocket\r\nConnection:Upgrade\r\nSec-WebSocket-Key:dGhlIHNhbXBsZSBub25jZQ==\r\nSec-WebSocket-Version:13\r\nConnection:Upgrade\r\n\r\n&quot;)
+       wsPayload   = []byte{
                // FIN Bit (Final fragment), OpCode (1 for text payload)
                0b10000001,
                // Mask Bit (Required), followed by 7 bits for length (0b0000101 == 5)
@@ -47,7 +27,7 @@ func main() {
                0b01101110, // &apos;o&apos; ^ 0b00000001
        }
        // Connects to an IPv4 server at 127.0.0.1 on port 8080
-       sockaddr := []byte{
+       sockaddr = []byte{
                // family - AF_INET (0x2), padded to 16 bits
                0b00000010,
                0b00000000,
@@ -65,8 +45,12 @@ func main() {
                0b00000000, 0b00000000, 0b00000000, 0b00000000,
        }
        // The response buffer for receiving server responses
-       var response [135]byte
+       response [135]byte
+)

+//export smol_main
+//go:nobounds
+func main() {
        // Create a IPv4 (AF_INET), TCP (SOCK_STREAM) socket FD
        // __NR_socket, AF_INET, SOCK_STREAM
        var sock, _, _ = syscall.Syscall(syscall.SYS_SOCKET, 0x2, 0x1, 0)
@@ -98,4 +82,11 @@ func main() {
        // Write the response string to standard output
        // __NR_write, STDOUT_FILENO
        syscall.Syscall(syscall.SYS_WRITE, 1, uintptr(unsafe.Pointer(&amp;response[0])), uintptr(len(response)))
+
+       // Cleanly exit the program with status code 0
+       // The libc does this for us in the usual flow, that goes like so:
+       //   __libc_start_main (libc) -&gt; main (runtime_unix.go) -&gt; main (main.go)
+       // But here, the entrypoint is in main.go itself
+       // __NR_exit, EXIT_SUCCESS
+       syscall.Syscall(syscall.SYS_EXIT, 0, 0, 0)
 }</code></pre><p>Now, we&apos;re down to just 810 bytes:</p><pre><code class="language-Shell">$ tinygo build -o main -scheduler none -gc none -panic trap -target spec.json \
    &amp;&amp; strip --strip-section-headers -R .comment -R .note -R .eh_frame main \
    &amp;&amp; wc -c main
810 main</code></pre><h2 id="compiling-for-32-bits"><strong>Compiling for 32 bits</strong></h2><p>One last trick up our sleeve is to compile the binary for 32 bits (<code>i386</code>) rather than <code>amd64</code>, as 32-bit binaries are significantly smaller in comparison. However, we&apos;ll still be able to run this binary on most 64-bit Linux systems (given that <code>CONFIG_IA32_EMULATION</code> is enabled in the kernel).</p><p>To do this, all we need to do is flip the target-related switches in <code>spec.json</code>. Note that we don&apos;t need to update syscalls to reflect <code>i386</code> as we&apos;re using constants like <code>syscall.SYS_SOCKET</code> rather than hardcoding the syscall numbers:</p><p><code>spec.json</code></p><pre><code class="language-Diff">--- a/spec.json
+++ b/spec.json
@@ -1,10 +1,10 @@
 {
-  &quot;llvm-target&quot;: &quot;x86_64-unknown-linux-musl&quot;,
-  &quot;cpu&quot;: &quot;x86-64&quot;,
+  &quot;llvm-target&quot;: &quot;i386-unknown-linux-musl&quot;,
+  &quot;cpu&quot;: &quot;i386&quot;,
   &quot;goos&quot;: &quot;linux&quot;,
-  &quot;goarch&quot;: &quot;amd64&quot;,
+  &quot;goarch&quot;: &quot;386&quot;,
   &quot;build-tags&quot;: [
-    &quot;amd64&quot;,
+    &quot;386&quot;,
     &quot;linux&quot;
   ],
   &quot;linker&quot;: &quot;ld.lld&quot;,</code></pre><p>Now, our binary is just 538 bytes, and it still works!</p><pre><code class="language-Shell">$ tinygo build -o main -scheduler none -gc none -panic trap -target spec.json \
    &amp;&amp; strip --strip-section-headers -R .comment -R .note -R .eh_frame main \
    &amp;&amp; wc -c main
538 main
$ file main
main: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, no section header
$ ./main
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

dyte</code></pre><h2 id="conclusion"><strong>Conclusion</strong></h2>
<!--kg-card-begin: html-->
<table id="942bd83b-72a9-4b89-a8e2-2e4dd4592907" class="simple-table"><tbody><tr id="5bf0cf9e-752c-4219-8d05-a5556219ed01"><td id="iWl{" class>Attempt</td><td id="Z{ut" class>Size (in bytes)</td><td id="~aIO" class>Compiler</td></tr><tr id="904f4830-175d-4fde-9e36-06ad158d53ad"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/net_websocket">Using&#xA0;</a><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/net_websocket"><code>x/net/websocket</code></a></td><td id="Z{ut" class>4128768 (stripped)</td><td id="~aIO" class>Go</td></tr><tr id="26c7fb85-5336-4866-ae1f-04a44fde5433"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/stdlib_only">Pure standard library</a></td><td id="Z{ut" class>727684 (1814528 without UPX)</td><td id="~aIO" class>Go</td></tr><tr id="50e73202-cfdd-4fe4-8568-d6f584413e2e"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/syscalls_only">Syscalls only</a></td><td id="Z{ut" class>360692 (847872 without UPX)</td><td id="~aIO" class>Go</td></tr><tr id="fd63a173-e0cb-4740-b3bb-1013791e82e3"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/syscalls_only">Syscalls only</a></td><td id="Z{ut" class>13056</td><td id="~aIO" class>TinyGo</td></tr><tr id="165bc2e5-b6e3-44a4-bd11-fe6288cd4f87"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/syscalls_no_gc">Syscalls with dummy GC</a></td><td id="Z{ut" class>12720</td><td id="~aIO" class>TinyGo</td></tr><tr id="d447cec1-3653-4030-b0d5-6884ae6a5abf"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/syscalls_no_gc_ldflags">Syscalls with dummy GC, custom ldflags</a></td><td id="Z{ut" class>6600</td><td id="~aIO" class>TinyGo</td></tr><tr id="3890fec7-b9df-4439-a88f-61e59678c4c6"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/syscalls_no_gc_custom_entry">Syscalls with no GC, custom ldflags, custom entrypoint</a></td><td id="Z{ut" class>810</td><td id="~aIO" class>TinyGo</td></tr><tr id="283c566c-1ef0-4fa4-a352-d62786f243bc"><td id="iWl{" class><a href="https://github.com/git-bruh/wscodegolf/tree/main/client/syscalls_no_gc_custom_entry_32bit">Syscalls with no GC, custom ldflags, custom entrypoint, 32-bit</a></td><td id="Z{ut" class>538</td><td id="~aIO" class>TinyGo</td></tr></tbody></table>
<!--kg-card-end: html-->
<p>As we did not cover each topic in a lot of depth in this post, here are some handy resources:</p><ul><li><a href="https://tinygo.org/docs">TinyGo docs</a></li><li><a href="https://man.archlinux.org/man/extra/lld/ld.lld.1.en">ld.lld man-page</a></li><li><a href="https://jameshfisher.com/2018/02/19/how-to-syscall-in-c">Using raw syscalls in C</a></li><li>Misc. linker-related blogs</li><li><a href="https://maskray.me/blog/2021-02-28-linker-garbage-collection">Linker garbage collection</a></li><li><a href="https://maskray.me/blog/2020-11-15-explain-gnu-linker-options">Explain GNU style linker options</a></li></ul><p>Check out the full solution at <a href="https://github.com/git-bruh/wscodegolf/">https://github.com/git-bruh/wscodegolf/</a>, and <strong>if you want to look at some of our other challenges, check out </strong><a href="https://hacktofinale.dyte.io/"><strong>https://hacktofinale.dyte.io/</strong></a>.</p>]]></content:encoded></item><item><title><![CDATA[Open Sourcing Dyte’s Device Emulator]]></title><description><![CDATA[Open sourcing Dyte's device emulator that helps you ship well tested products by writing media related integration tests with ease.]]></description><link>https://dyte.io/blog/open-sourcing-dytes-device-emulator/</link><guid isPermaLink="false">64e7266cd0c964000195204b</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Ravindra Singh Rathor]]></dc:creator><pubDate>Tue, 26 Nov 2024 10:45:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2023/08/Open-Sourcing-Dyte-s-Device-Emulator.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2023/08/Open-Sourcing-Dyte-s-Device-Emulator.png" alt="Open Sourcing Dyte&#x2019;s Device Emulator"><p>For a product, integration tests are a crucial part of improving quality &amp; stability. This is true for WebRTC applications as well. However, challenges arise when we try to create integration tests around user media in a browser.</p><p>For an end user, sharing a camera &amp; mic is straightforward. For this, browsers expose APIs such as <code>enumerateDevices</code> &amp; <code>getUserMedia</code> on the <code>MediaDevices</code> interface, on which user interfaces can be built easily.</p><pre><code class="language-jsx">navigator.mediaDevices.getUserMedia({audio: true, video: true})
</code></pre><p>Try pasting the above line in the Developer Console in Chrome &amp; you will see that it asks for your permission to share the camera &amp; mic. Once you permit the browser to access the camera &amp; mic, you will see your camera LED turning on, indicating that your camera is in use.</p><h4 id="media-device-unavailability-in-test-environments">Media Device Unavailability in Test Environments</h4><p>In a virtualised test environment, access to actual media devices is often unavailable. The <code>getUserMedia</code> interface is designed to interact with real hardware connected to a device. Therefore, when these environments try to run tests that invoke such interfaces, they fail to replicate real-world scenarios.</p><p>In such a scenario, one possible solution a developer can resort to is using the fake media stream interfaces provided by the browser.</p><p>For example, you could start Chromium with the following command-line arguments to add microphone and webcam devices, which would use the provided static video &amp; audio files as the media source:</p><pre><code>--use-fake-ui-for-media-stream
--use-fake-device-for-media-stream
--use-file-for-fake-video-capture=ABSOLUTE_PATH_TO_VIDEO_FILE
--use-file-for-fake-audio-capture=ABSOLUTE_PATH_TO_AUDIO_FILE
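# (Chromium expects the fake video capture file in Y4M/MJPEG format and the fake audio capture file in WAV format)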
</code></pre><p>This is better than having no media, but isn&apos;t really useful for testing.</p><h4 id="limited-testing-capabilities">Limited Testing Capabilities</h4><p>Even in the above situation, or in situations where physical hardware devices are available for testing, automating the test scenarios can be an uphill battle. Let&apos;s consider some instances where these challenges come into play:</p><ul><li><strong>Device Plug-in Scenario</strong>: Imagine a user plugs in a new microphone while a WebRTC application is running. An ideal application should seamlessly switch the audio input to the new device without requiring a manual switch or causing interruptions. Automating this test is challenging as it necessitates actual hardware manipulations, which are difficult to enact in a software-only test setup.</li><li><strong>Hardware Failure Handling</strong>: Another crucial test case is how the application responds to hardware failures. For instance, what happens if a camera is unable to provide media in the required constraints?</li></ul><h3 id="introducing-device-emulator">Introducing Device Emulator</h3><p>To solve all these pain points, we are open-sourcing our device emulator (<a href="https://github.com/dyte-io/device-emulator">https://github.com/dyte-io/device-emulator</a>), which can be used to mimic devices across browsers. By simulating various hardware states and events, the toolkit can provide a nuanced and comprehensive evaluation of a WebRTC application&apos;s robustness, reliability, and user experience.</p><p>Dyte&apos;s device emulator currently supports:</p><ul><li>Adding and removing virtual media devices</li><li>Simulating a failure by silencing a track</li><li>Simulating a faulty device (getUserMedia failure)</li></ul><p>How easy is it to integrate Dyte&apos;s device emulator, you might ask? In Playwright, the integration test solution that <a href="https://dyte.io">Dyte</a> uses, all you have to do is add the below 1-liner (or 3?) code snippet.</p><pre><code class="language-jsx">await page.addScriptTag({
      url: &apos;https://cdn.jsdelivr.net/npm/@dytesdk/device-emulator/dist/index.iife.js&apos;,
});
</code></pre><p><strong>Not using Playwright?</strong></p><p>No worries. Just figure out a way to add a script tag in your tool of choice.</p><pre><code class="language-jsx">&lt;script src=&quot;https://cdn.jsdelivr.net/npm/@dytesdk/device-emulator/dist/index.iife.js&quot;&gt;&lt;/script&gt;
</code></pre><h2 id="how-does-this-work">How does this work?</h2><p>The script tag loads the device emulator library into your page which patches the <code>navigator.MediaDevices</code> interface with the toolkit&apos;s modified version providing the same API  signatures. Internally it uses <code>Web Audio API</code> for generating virtual AudioTracks and <code>Canvas API</code> for generating VideoTracks and simulates different media behaviour and states</p><h3 id="adding-a-virtual-device">Adding a virtual device</h3><p>Once the device emulator is loaded, Use the below code snippet. <code>addEmulatedDevice</code> is the extra method exposed by Dyte&apos;s device emulator to help you with the addition of devices.</p><pre><code class="language-jsx">window.addEventListener(&apos;dyte.deviceEmulatorLoaded&apos;, () =&gt; {
	navigator.mediaDevices.addEmulatedDevice(&apos;videoinput&apos;);
	navigator.mediaDevices.addEmulatedDevice(&apos;audioinput&apos;);
});</code></pre><h3 id="removing-a-virtual-device">Removing a virtual device</h3><p>It is a two-step process. Figure out the emulated device id using the code snippet below:</p><pre><code class="language-jsx">navigator.mediaDevices.enumerateDevices()
</code></pre><p>Filter the device that you want to use and retrieve the device ID. Once you have the device ID, remove the device using the code snippet below.</p><pre><code class="language-jsx">navigator.mediaDevices.removeEmulatedDevice(&apos;PUT_EMULATED_DEVICE_ID_HERE&apos;);
</code></pre><p>That&apos;s it. Now, you can add as many devices as you want and play around with the addition/removal of devices.</p><h3 id="writing-a-test">Writing a Test</h3><p>Using the above example, writing automated tests becomes straightforward:</p><pre><code class="language-js">// Simulate a new camera plugging in
await this.page.evaluate(() =&gt; {
            navigator.mediaDevices.addEmulatedDevice(&apos;videoinput&apos;);
        });
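
// Simulate unplugging a device; the argument is a placeholder for a real
// emulated-device id retrieved via enumerateDevices()
await this.page.evaluate(() =&gt; {
            navigator.mediaDevices.removeEmulatedDevice(&apos;PUT_EMULATED_DEVICE_ID_HERE&apos;);
        });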

// Verify your application&apos;s expected behaviour
await expect(..)</code></pre><h2 id="quick-demo"><strong>Quick Demo</strong></h2><p>Go to any web-based video conferencing app like Google Meet, or go to <a href="https://demo.dyte.io">https://demo.dyte.io</a> and create a meeting.</p><p>Once inside the meeting, open the developer console by right-clicking on the webpage and selecting Inspect.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/08/device-emulator-image-1.png" class="kg-image" alt="Open Sourcing Dyte&#x2019;s Device Emulator" loading="lazy" width="1174" height="808" srcset="https://dyte.io/blog/content/images/size/w600/2023/08/device-emulator-image-1.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/08/device-emulator-image-1.png 1000w, https://dyte.io/blog/content/images/2023/08/device-emulator-image-1.png 1174w" sizes="(min-width: 720px) 720px"></figure><p>DevTools will open. Go to the Console tab, paste the code below, and hit the Enter/Return key.</p><pre><code class="language-jsx">window.addEventListener(&apos;dyte.deviceEmulatorLoaded&apos;, () =&gt; {
     navigator.mediaDevices.addEmulatedDevice(&apos;videoinput&apos;);   
});

var script = document.createElement(&apos;script&apos;);
script.type = &apos;text/javascript&apos;;
script.src = &apos;https://cdn.jsdelivr.net/npm/@dytesdk/device-emulator/dist/index.iife.js&apos;;
document.head.appendChild(script);
</code></pre><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/08/Open-Sourcing-Dyte-s-Device-Emulator-image-2.png" class="kg-image" alt="Open Sourcing Dyte&#x2019;s Device Emulator" loading="lazy" width="2000" height="1188" srcset="https://dyte.io/blog/content/images/size/w600/2023/08/Open-Sourcing-Dyte-s-Device-Emulator-image-2.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/08/Open-Sourcing-Dyte-s-Device-Emulator-image-2.png 1000w, https://dyte.io/blog/content/images/size/w1600/2023/08/Open-Sourcing-Dyte-s-Device-Emulator-image-2.png 1600w, https://dyte.io/blog/content/images/size/w2400/2023/08/Open-Sourcing-Dyte-s-Device-Emulator-image-2.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>It is that easy. Now, you can join a meeting with a fake video</p><h3 id="conclusion">Conclusion</h3><p>Checkout the complete guide and examples at  <a href="https://docs.dyte.io/community-packages/device-emulator">https://docs.dyte.io/community-packages/device-emulator</a> </p><p>Or try an full demo at  <a href="https://device-emulator.vercel.app/">https://device-emulator.vercel.app/</a></p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/08/Screenshot-2023-08-28-at-10.29.54-PM.png" class="kg-image" alt="Open Sourcing Dyte&#x2019;s Device Emulator" loading="lazy" width="2000" height="909" srcset="https://dyte.io/blog/content/images/size/w600/2023/08/Screenshot-2023-08-28-at-10.29.54-PM.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/08/Screenshot-2023-08-28-at-10.29.54-PM.png 1000w, https://dyte.io/blog/content/images/size/w1600/2023/08/Screenshot-2023-08-28-at-10.29.54-PM.png 1600w, https://dyte.io/blog/content/images/2023/08/Screenshot-2023-08-28-at-10.29.54-PM.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Shipping a product with proper testing is paramount for any company, and what&apos;s better than writing integration tests to test end-user scenarios?</p><p>We know how tricky writing media-related integration tests could be! We hope this toolkit will ease some pain while writing media-related integration tests for you, as it did for us.</p><p>Feel free to raise feature requests, pull requests, or fork the repo (<a href="https://github.com/dyte-io/device-emulator">https://github.com/dyte-io/device-emulator</a>) and customize it to your liking.</p><p>I hope you found this post informative and engaging. If you have any thoughts or feedback, please get in touch with me on <a href="https://twitter.com/rsr_thedarklord">Twitter</a> or <a href="https://www.linkedin.com/in/softwareprovider/">LinkedIn</a>. Stay tuned for more related blog posts in the future!</p><p><em>If you haven&apos;t heard about Dyte yet, head over to </em><a href="https://dyte.io/"><em>dyte.io</em></a><em> to learn how we are revolutionizing communication through our SDKs and libraries and how you can </em><a href="https://accounts.dyte.in/auth/register"><em>get started</em></a><em> quickly on your 10,000 free minutes, which renew every month. 
If you have any questions, you can reach us at </em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em> or ask our </em><a href="https://community.dyte.io/"><em>developer community</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[Building a Live Auction Platform With React and Dyte]]></title><description><![CDATA[Learn how to build a live auction platform using React and Dyte to actively engage in live auctions, interact with auctioneers, and place real-time bids.]]></description><link>https://dyte.io/blog/live-auction-platform/</link><guid isPermaLink="false">651ffda7d0c9640001952d04</guid><category><![CDATA[Demos]]></category><dc:creator><![CDATA[Ishita Kabra]]></dc:creator><pubDate>Wed, 20 Nov 2024 14:21:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2023/10/live-auction--header.png" medium="image"/><content:encoded><![CDATA[<h2 id></h2><img src="https://dyte.io/blog/content/images/2023/10/live-auction--header.png" alt="Building a Live Auction Platform With React and Dyte"><p>In today&#x2019;s digital landscape, the art of buying and selling has taken a remarkable leap forward with the advent of real-time online marketplaces and auctions. A live auction is an interactive bidding process in real-time, facilitated through digital platforms. Participants place live bids on goods or services within a defined timeframe, and the highest bid at the end of the auction wins the item or service being auctioned.<br><br>We want to help people create engaging live experiences with Dyte, and live auctions just happened to be on the market. By blending technology with real-time bidding, in this blog, we&#x2019;ll delve into the essentials and take you through the application flow of building and running your own live auction platform before the fall of the hammer.<br><br>Let&#x2019;s discover the products up for sale before the bidding starts!</p><h2 id="why-build-a-live-auction-platform">Why build a live auction platform?</h2><ul><li><strong>Convenient:</strong> Participate in live auctions from anywhere, at any time.</li><li><strong>Real-time bidding:</strong> Experience the thrill of bidding against other enthusiasts in real-time.</li><li><strong>Wide range of items:</strong> Discover diverse items, from collectibles to artwork and more.</li><li><strong>Seamless integration:</strong> The live auction app provides users with a smooth and uninterrupted auction experience with Dyte&apos;s reliable audio/video conferencing and customizable UI.</li></ul><h2 id="before-you-start">Before you start</h2><ul><li>Basic knowledge of <a href="https://react.dev/">React.js</a> is required to build this application.</li><li>Please ensure that <a href="https://nodejs.org/">Node.js</a> is installed on your machine. We will use it to run our application.</li><li>Lastly, you will need the API Key and organization ID from the <a href="https://dev.dyte.io/">developer portal</a> to authenticate yourself when you call Dyte&apos;s REST APIs.</li></ul><h2 id="building-the-live-bidding-platform">Building the live bidding platform</h2><h3 id="installation">Installation</h3><p>You need to install Dyte&apos;s React UI Kit and Core packages to get started. You can do so by using npm or Yarn.</p><pre><code class="language-jsx">npm install @dytesdk/react-ui-kit @dytesdk/react-web-core
</code></pre><h3 id="getting-started">Getting started</h3><p>We will first fetch the organization ID and API Key from the <a href="https://dev.dyte.io/">developer portal</a>. Then, we&apos;ll create an account and navigate to the API Keys page.</p><p><strong>Please ensure that you do not upload your API Key anywhere.</strong></p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/10/live-auction--asset-1.png" class="kg-image" alt="Building a Live Auction Platform With React and Dyte" loading="lazy" width="1433" height="821" srcset="https://dyte.io/blog/content/images/size/w600/2023/10/live-auction--asset-1.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/10/live-auction--asset-1.png 1000w, https://dyte.io/blog/content/images/2023/10/live-auction--asset-1.png 1433w" sizes="(min-width: 720px) 720px"></figure><p>Next, we will create a meeting using the following <a href="https://www.notion.so/demo-docs-dyte-io-6f55d86117c84c1b8bcba43f97917da4?pvs=21">Rest API</a>. Here is a sample response.</p><pre><code class="language-jsx">{
  &quot;success&quot;: true,
  &quot;data&quot;: {
    &quot;id&quot;: &quot;497f6eca-6276-4993-bfeb-53cbbbbaxxxx&quot;,
    &quot;name&quot;: &quot;string&quot;,
    &quot;picture&quot;: &quot;&lt;http://example.com&gt;&quot;,
    &quot;custom_participant_id&quot;: &quot;string&quot;,
    &quot;preset_name&quot;: &quot;string&quot;,
    &quot;created_at&quot;: &quot;2019-08-24T14:15:22Z&quot;,
    &quot;updated_at&quot;: &quot;2019-08-24T14:15:22Z&quot;,
    &quot;token&quot;: &quot;string&quot;
  }
}
</code></pre><p>We use the ID from the payload to generate an auth token using the <a href="https://docs.dyte.io/api#/operations/add_participant">Add Participants API</a>. This auth token is used to initialize a Dyte client. Let&apos;s start setting up the project.</p><h3 id="build-your-custom-ui">Build your custom UI</h3><p>Create a file <code>src/App.tsx</code>. We will use the <code>useDyteClient</code> hook to initialize a new Dyte meeting. The <code>DyteProvider</code> is used to pass the meeting object to all child components inside this application. We will also set up event listeners for joining and leaving the room.</p><pre><code class="language-tsx">import { useEffect, useState } from &apos;react&apos;;
import { DyteProvider, useDyteClient } from &apos;@dytesdk/react-web-core&apos;;
import { LoadingScreen, Meeting, SetupScreen } from &apos;./pages&apos;;

function App() {
  const [meeting, initMeeting] = useDyteClient();
  const [roomJoined, setRoomJoined] = useState&lt;boolean&gt;(false);

  useEffect(() =&gt; {
    const searchParams = new URL(window.location.href).searchParams;
    const authToken = searchParams.get(&apos;authToken&apos;);

    if (!authToken) {
      alert(
        &quot;An authToken wasn&apos;t passed, please pass an authToken in the URL query to join a meeting.&quot;
      );
      return;
    }

    initMeeting({
      authToken,
      defaults: {
        audio: false,
        video: false,
      },
    });
  }, []);

  useEffect(() =&gt; {
    if (!meeting) return;

    const roomJoinedListener = () =&gt; {
      setRoomJoined(true);
    };
    const roomLeftListener = () =&gt; {
      setRoomJoined(false);
    };
    meeting.self.on(&apos;roomJoined&apos;, roomJoinedListener);
    meeting.self.on(&apos;roomLeft&apos;, roomLeftListener);

    return () =&gt; {
      meeting.self.removeListener(&apos;roomJoined&apos;, roomJoinedListener);
      meeting.self.removeListener(&apos;roomLeft&apos;, roomLeftListener);
    }

  }, [meeting])

  return (
    &lt;DyteProvider value={meeting} fallback={&lt;LoadingScreen /&gt;}&gt;
      {
        !roomJoined ? &lt;SetupScreen /&gt; : &lt;Meeting /&gt;
      }
    &lt;/DyteProvider&gt;
  )
}

export default App;
</code></pre><p>Now, we will build a setup screen; this is the first page that the user will see.</p><p>Create a file <code>src/pages/setupScreen/setupScreen.tsx</code>. We will update the user&apos;s display name and join the Dyte meeting from this page.</p><pre><code class="language-tsx">import { useEffect, useState } from &apos;react&apos;
import &apos;./setupScreen.css&apos;
import { useDyteMeeting } from &apos;@dytesdk/react-web-core&apos;;
import {
  DyteAudioVisualizer,
  DyteAvatar,
  DyteCameraToggle,
  DyteMicToggle,
  DyteNameTag,
  DyteParticipantTile,
} from &apos;@dytesdk/react-ui-kit&apos;;

const SetupScreen = () =&gt; {
  const { meeting } = useDyteMeeting();
  const [isHost, setIsHost] = useState&lt;boolean&gt;(false);
  const [name, setName] = useState&lt;string&gt;(&apos;&apos;);

  useEffect(() =&gt; {
    if (!meeting) return;
    const preset = meeting.self.presetName;
    const name = meeting.self.name;
    setName(name);

    if (preset.includes(&apos;host&apos;)) {
      setIsHost(true);
    }
  }, [meeting])

  const joinMeeting = () =&gt; {
    meeting?.self.setName(name);
    meeting.joinRoom();
  }

  return (
    &lt;div className=&apos;setup-screen&apos;&gt;
      &lt;div className=&quot;setup-media&quot;&gt;
        &lt;div className=&quot;video-container&quot;&gt;
          &lt;DyteParticipantTile meeting={meeting} participant={meeting.self}&gt;
            &lt;DyteAvatar size=&quot;md&quot; participant={meeting.self}/&gt;
            &lt;DyteNameTag meeting={meeting} participant={meeting.self}&gt;
              &lt;DyteAudioVisualizer size=&apos;sm&apos; slot=&quot;start&quot; participant={meeting.self} /&gt;
            &lt;/DyteNameTag&gt;
            &lt;div className=&apos;setup-media-controls&apos;&gt;
              &lt;DyteMicToggle size=&quot;sm&quot; meeting={meeting}/&gt;
              &amp;ensp;
              &lt;DyteCameraToggle size=&quot;sm&quot; meeting={meeting}/&gt;
            &lt;/div&gt;
          &lt;/DyteParticipantTile&gt;
        &lt;/div&gt;
      &lt;/div&gt;
      &lt;div className=&quot;setup-information&quot;&gt;
        &lt;div className=&quot;setup-content&quot;&gt;
          &lt;h2&gt;Welcome! {name}&lt;/h2&gt;
          &lt;p&gt;{isHost ? &apos;You are joining as a Host&apos; : &apos;You are joining as a bidder&apos;}&lt;/p&gt;
          &lt;input disabled={!meeting.self.permissions.canEditDisplayName} className=&apos;setup-name&apos; value={name} onChange={(e) =&gt; {
            setName(e.target.value)
          }} /&gt;
          &lt;button className=&apos;setup-join&apos; onClick={joinMeeting}&gt;
            Join Meeting
          &lt;/button&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  )
}

export default SetupScreen
</code></pre><p>Now that we have the basic setup, let&apos;s build the live auction platform.</p><p>Create a file <code>src/pages/meeting/Meeting.tsx</code>.</p><p><strong>Our live auction app will implement the following functionality:</strong></p><ul><li>Give hosts an option to start/stop the auction.</li><li>Give hosts an option to navigate between different auction products.</li><li>Allow users to make real-time bids for each product.</li><li>Show the highest bid to all users.</li></ul><pre><code class="language-tsx">import { useEffect, useState } from &apos;react&apos;
import &apos;./meeting.css&apos;
import {
  DyteCameraToggle,
  DyteChatToggle,
  DyteGrid,
  DyteHeader,
  DyteLeaveButton,
  DyteMicToggle,
  DyteNotifications,
  DyteParticipantsAudio,
  DyteSidebar,
  sendNotification,
} from &apos;@dytesdk/react-ui-kit&apos;
import { useDyteMeeting } from &apos;@dytesdk/react-web-core&apos;;
import { AuctionControlBar, Icon } from &apos;../../components&apos;;
import { bidItems } from &apos;../../constants&apos;;

interface Bid {
  bid: number;
  user: string;
}

const Meeting = () =&gt; {
  const { meeting } = useDyteMeeting();

  const [item, setItem] = useState(0);
  const [isHost, setIsHost] = useState&lt;boolean&gt;(false);
  const [showPopup, setShowPopup] = useState&lt;boolean&gt;(true);
  const [auctionStarted, setAuctionStarted] = useState&lt;boolean&gt;(false);
  const [activeSidebar, setActiveSidebar] = useState&lt;boolean&gt;(false);
  const [highestBid, setHighestBid] = useState&lt;Bid&gt;({ bid: 100, user: &apos;default&apos; });

  const handlePrev = () =&gt; {
    if (item - 1 &lt; 0) return;
    setItem(item - 1)
    meeting.participants.broadcastMessage(&apos;item-changed&apos;, { item: item - 1 })
  }
  const handleNext = () =&gt; {
    if ( item + 1 &gt;= bidItems.length) return;
    setItem(item + 1)
    meeting.participants.broadcastMessage(&apos;item-changed&apos;, { item: item + 1 })
  }

  useEffect(() =&gt; {
    setHighestBid({
      bid: bidItems[item].startingBid,
      user: &apos;default&apos;
    })
  }, [item])

  useEffect(() =&gt; {
    if (!meeting) return;

    const preset = meeting.self.presetName;
    if (preset.includes(&apos;host&apos;)) {
      setIsHost(true);
    }

    const handleBroadcastedMessage = ({ type, payload }: { type: string, payload: any }) =&gt; {
      switch(type) {
        case &apos;auction-toggle&apos;: {
          setAuctionStarted(payload.started);
          break;
        }
        case &apos;item-changed&apos;: {
          setItem(payload.item);
          break;
        }
        case &apos;new-bid&apos;: {
          sendNotification({
            id: &apos;new-bid&apos;,
            message: `${payload.user} just made a bid of $ ${payload.bid}!`,
            duration: 2000,
          })
          // Functional update: avoids comparing against a stale highestBid
          // value captured when this effect last ran
          setHighestBid((prev) =&gt; (parseFloat(payload.bid) &gt; prev.bid ? payload : prev))
          break;
        }
        default:
          break;
      }
    }
    meeting.participants.on(&apos;broadcastedMessage&apos;, handleBroadcastedMessage);

    const handleDyteStateUpdate = ({detail}: any) =&gt; {
        if (detail.activeSidebar) {
         setActiveSidebar(true);
        } else {
          setActiveSidebar(false);
        }
    }

    document.body.addEventListener(&apos;dyteStateUpdate&apos;, handleDyteStateUpdate);

    return () =&gt; {
      document.body.removeEventListener(&apos;dyteStateUpdate&apos;, handleDyteStateUpdate);
      meeting.participants.removeListener(&apos;broadcastedMessage&apos;, handleBroadcastedMessage);
    }
  }, [meeting])

  useEffect(() =&gt; {
    const participantJoinedListener = () =&gt; {
      if (!auctionStarted) return;
      setTimeout(() =&gt; {
        meeting.participants.broadcastMessage(&apos;auction-toggle&apos;, {
          started: auctionStarted
        })
      }, 500)
    
    }
    meeting.participants.joined.on(&apos;participantJoined&apos;, participantJoinedListener);
    return () =&gt; {
      meeting.participants.joined.removeListener(&apos;participantJoined&apos;, participantJoinedListener);
    }
  }, [meeting, auctionStarted])

  const toggleAuction = () =&gt; {
    if (!isHost) return;
    meeting.participants.broadcastMessage(&apos;auction-toggle&apos;, {
      started: !auctionStarted
    })
    if (!auctionStarted) {
      meeting.self.pin();
    } else {
      meeting.self.unpin();
    }
    setAuctionStarted(!auctionStarted);
  }

  return (
    &lt;div className=&apos;meeting-container&apos;&gt;
      &lt;DyteParticipantsAudio meeting={meeting} /&gt;
      &lt;DyteNotifications meeting={meeting} /&gt;

      &lt;DyteHeader meeting={meeting} size=&apos;lg&apos;&gt;
        &lt;div className=&quot;meeting-header&quot;&gt;
          {
            auctionStarted &amp;&amp; (
              &lt;div className=&quot;show-auction-popup&quot; onClick={() =&gt; setShowPopup(() =&gt; !showPopup)}&gt;
                &lt;Icon size=&apos;sm&apos; icon={showPopup ? &apos;close&apos; : &apos;next&apos;} /&gt;
              &lt;/div&gt;
            )
          }
        &lt;/div&gt;
      &lt;/DyteHeader&gt;

      &lt;div className=&apos;meeting-grid&apos;&gt;
        {
          auctionStarted &amp;&amp; (
            &lt;div className={`auction-container ${!showPopup ? &apos;hide-auction-popup&apos; : &apos;&apos;}`}&gt;
              &lt;img className=&apos;auction-img&apos; src={bidItems[item].link} /&gt;
              &lt;div className=&apos;auction-desc&apos;&gt;
                {bidItems[item].description}
              &lt;/div&gt;
              &lt;AuctionControlBar
                item={item}
                highestBid={highestBid}
                handleNext={handleNext}
                handlePrev={handlePrev}
                isHost={isHost}
              /&gt;
          &lt;/div&gt;
          )
        }
        &lt;DyteGrid layout=&apos;column&apos; meeting={meeting} style={{ height: &apos;100%&apos; }}/&gt;
        {activeSidebar &amp;&amp; &lt;DyteSidebar meeting={meeting} /&gt;}
      &lt;/div&gt;

      &lt;div className=&apos;meeting-controlbar&apos;&gt;
        &lt;DyteMicToggle size=&apos;md&apos; meeting={meeting} /&gt;
        &lt;DyteCameraToggle size=&apos;md&apos;  meeting={meeting} /&gt;
        &lt;DyteLeaveButton size=&apos;md&apos; /&gt;
        &lt;DyteChatToggle size=&apos;md&apos; meeting={meeting} /&gt;
        {
          isHost &amp;&amp; (
            &lt;button className=&apos;auction-toggle-button&apos; onClick={toggleAuction}&gt;
              &lt;Icon size=&apos;lg&apos; icon=&apos;auction&apos; /&gt;
              {auctionStarted ? &apos;Stop&apos; : &apos;Start&apos;} Auction
            &lt;/button&gt;
          )
        }
      &lt;/div&gt;
    &lt;/div&gt;
  )
}

export default Meeting
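
// Note: AuctionControlBar (imported above) is where bidders place their bids.
// That component is not listed in this post; a minimal sketch of the call it
// would make so that the &apos;new-bid&apos; listener above fires (prop names assumed):
//
//   meeting.participants.broadcastMessage(&apos;new-bid&apos;, {
//     user: meeting.self.name,
//     bid,
//   });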
</code></pre><p>Et voil&#xE0;! Our live auction app is ready. <br><br>You can play around with this live bidding app to get you going with some real-time bidding at <a href="https://dyte-live-bidding.vercel.app/">https://dyte-live-bidding.vercel.app/</a>! Dyte going once, going twice...</p><p>You can check out the complete code for the project <a href="https://github.com/dyte-io/react-samples/tree/main/samples/live-auction">here</a>.</p><p>Are you <em>sold</em> yet, or do we need to say more?</p><h2 id="conclusion">Conclusion</h2><p>With this, we have built our live auction platform. All you have to do is pull out your gavel and shout at the top of your lungs. We look forward to you outbidding us!</p><p>If you have any thoughts or feedback, please reach out to us on <a href="https://www.linkedin.com/company/dyteio/mycompany/">LinkedIn</a> and <a href="https://twitter.com/dyte_io">Twitter</a>. Stay tuned for more related blog posts in the future!</p><p><em>Get better insights on leveraging Dyte&apos;s technology and discover how it can revolutionize your app&apos;s communication capabilities with its </em><a href="https://dyte.io/video-sdk"><em>SDKs</em></a><em>. Head over to </em><a href="https://dyte.io/"><em>dyte.io</em></a><em> to learn how to </em><a href="https://accounts.dyte.in/auth/register"><em>start</em></a><em> quickly on your 10,000 free minutes, which renew every month. You can reach us at </em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em> or ask our </em><a href="https://community.dyte.io/"><em>developer community</em></a><em> if you have any questions.</em></p>]]></content:encoded></item><item><title><![CDATA[AI Generated Background Images for Video Calls]]></title><description><![CDATA[Revolutionise your virtual meetings with this comprehensive guide on building a video-calling app using Dyte SDK. Learn how to integrate Stability AI's model to generate stunning AI images as custom backgrounds.]]></description><link>https://dyte.io/blog/ai-generated-background/</link><guid isPermaLink="false">6594fd533df14600014b3a89</guid><category><![CDATA[AI]]></category><category><![CDATA[Demos]]></category><dc:creator><![CDATA[Vaibhav Shinde]]></dc:creator><pubDate>Tue, 19 Nov 2024 10:30:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/01/AI-generated-background-image--1-.png" medium="image"/><content:encoded><![CDATA[<h2 id="tldr">TL;DR</h2><img src="https://dyte.io/blog/content/images/2024/01/AI-generated-background-image--1-.png" alt="AI Generated Background Images for Video Calls"><p>In this tutorial, we will create a video-calling app using the Dyte SDK. This app will enable users to input prompts, generate AI images, and set them as the background during video calls. &#x2728;</p><h2 id="introduction">Introduction</h2><p>With the advent of text-to-image models like DALL&#xB7;E 3 and Midjourney, there has been a rise in different use cases for them.</p><p>In this tutorial, we&apos;re going to create a video calling app that uses a similar model from <a href="https://stability.ai/" rel="nofollow">Stability AI</a>. Users can input prompts and have AI-generated images as their backgrounds during video calls. &#x2728;</p><p>So, let&apos;s get going! 
&#x1F680;</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/chart.png" class="kg-image" alt="AI Generated Background Images for Video Calls" loading="lazy" width="2000" height="1180" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/chart.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/chart.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/chart.png 1600w, https://dyte.io/blog/content/images/2024/01/chart.png 2000w" sizes="(min-width: 720px) 720px"></figure><h2 id="high-level-design-of-the-application">High-Level Design of the Application</h2><p>Our aim is to create a seamless and engaging video-calling experience that goes beyond the ordinary. When users click the &quot;Create Meeting&quot; button in our app, they see a staging area with the option to input prompts for AI image generation. The generated image is then set as the background of their video.</p><ul><li>In this project, we will use React with <a href="https://dyte.io/blog/custom-ui-kit-sdk/" rel="nofollow">Dyte UI kit</a> and <a href="https://www.npmjs.com/package/@dytesdk/react-web-core" rel="nofollow">Dyte React Web Core</a> packages for the frontend.</li><li>For the backend, we will use Node.js with Express.</li><li>For image generation, we will use <a href="https://stability.ai/" rel="nofollow">Stability AI</a>.</li><li>Lastly, we will use <a href="https://apidocs.imgur.com/" rel="nofollow">Imgur</a> for hosting the generated images.</li></ul><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/aiImage.png" class="kg-image" alt="AI Generated Background Images for Video Calls" loading="lazy" width="1650" height="662" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/aiImage.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/aiImage.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/aiImage.png 1600w, https://dyte.io/blog/content/images/2024/01/aiImage.png 1650w" sizes="(min-width: 720px) 720px"></figure><h2 id="folder-structure">Folder Structure</h2><p>We will keep our client code in the <code>frontend</code> folder, while our backend code will reside in the root folder itself. After completing the tutorial, the folder structure will look like this. &#x1F447;</p><pre><code class="language-bash">&#x251C;&#x2500;&#x2500; frontend
&#x2502;   &#x251C;&#x2500;&#x2500; public
&#x2502;   &#x2514;&#x2500;&#x2500; src
&#x2502;       &#x251C;&#x2500;&#x2500; components
&#x2502;       &#x2502;   &#x251C;&#x2500;&#x2500; Home.js
&#x2502;       &#x2502;   &#x251C;&#x2500;&#x2500; Meet.js
&#x2502;       &#x2502;   &#x2514;&#x2500;&#x2500; Stage.js
&#x2502;       &#x251C;&#x2500;&#x2500; App.css
&#x2502;       &#x251C;&#x2500;&#x2500; App.js
&#x2502;       &#x251C;&#x2500;&#x2500; App.test.js
&#x2502;       &#x251C;&#x2500;&#x2500; index.css
&#x2502;       &#x251C;&#x2500;&#x2500; index.js
&#x2502;       &#x251C;&#x2500;&#x2500; logo.svg
&#x2502;       &#x251C;&#x2500;&#x2500; reportWebVitals.js
&#x2502;       &#x2514;&#x2500;&#x2500; setupTests.js
&#x251C;&#x2500;&#x2500; package.json
&#x2514;&#x2500;&#x2500; src
    &#x251C;&#x2500;&#x2500; api
    &#x2502;   &#x251C;&#x2500;&#x2500; dyte.js
    &#x2502;   &#x2514;&#x2500;&#x2500; stability.js
    &#x2514;&#x2500;&#x2500; index.js

</code></pre>
<h2 id="step-0-configurations-and-setup">Step 0: Configurations and Setup</h2><p>&#x200D;&#x1F4BB; Before building our application, we must set up a Dyte account.</p><p>We can create a free account by clicking the &quot;Start Building&quot; button on <a href="https://dyte.io/" rel="noreferrer">Dyte.io</a> and signing up using Google or GitHub .</p><p>Once signed up, we can access our <a href="https://dev.dyte.io/apikeys" rel="nofollow">Dyte API keys</a> from the &quot;API Keys&quot; tab in the left sidebar. We will keep these keys secure as we will use them later. &#x1F92B;</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/gifAI.gif" class="kg-image" alt="AI Generated Background Images for Video Calls" loading="lazy" width="864" height="460" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/gifAI.gif 600w, https://dyte.io/blog/content/images/2024/01/gifAI.gif 864w" sizes="(min-width: 720px) 720px"></figure><p>We will begin by creating a new directory for our project, and navigating into it using the following commands:</p><pre><code class="language-bash">mkdir dyte
cd dyte
</code></pre>
<p><strong>Please note:</strong></p><p>We will also require accounts on the following platforms:</p><ul><li>Imgur: Create an account on Imgur and generate an API key. Here is a <a href="https://apidocs.imgur.com/" rel="nofollow">step-by-step guide</a></li><li>Stability AI: Here is a <a href="https://platform.stability.ai/docs/getting-started" rel="nofollow">step-by-step guide</a></li></ul><p>Now back to the tutorial.</p><h2 id="step-1-setting-up-the-frontend">Step 1: Setting up the frontend</h2><p>Let&apos;s start setting up our front-end project using React and Dyte! &#x2728;</p><p>We will create a boilerplate React app using <code>create-react-app</code>. We can do this with the following command:</p><pre><code class="language-bash">npx create-react-app frontend
</code></pre>
<p>This will initialize a new React app in the <code>frontend</code> directory. &#x1F4C1;</p><p>Then, we will go ahead and install the <code>@dytesdk/react-web-core</code>, <code>@dytesdk/react-ui-kit</code>, and <code>react-router</code> packages in this project using the following command &#x1F447;</p><pre><code class="language-bash">cd frontend
npm install @dytesdk/react-web-core @dytesdk/react-ui-kit react-router react-router-dom @dytesdk/video-background-transformer dotenv
</code></pre>
<figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://dyte.io/blog/content/media/2024/03/terminal_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://dyte.io/blog/content/media/2024/03/terminal.mp4" poster="https://img.spacergif.org/v1/1236x810/0a/spacer.png" width="1236" height="810" loop autoplay muted playsinline preload="metadata" style="background: transparent url(&apos;https://dyte.io/blog/content/media/2024/03/terminal_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container kg-video-hide">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">0:07</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><h2 id="step-2-setting-up-the-backend">Step 2: Setting up the backend</h2><p>Let&apos;s get started with setting up our NodeJs with express backend now. &#x1F64C;</p><p>We will go back to the root directory of our project and initiate our backend here itself for the ease of hosting:</p><pre><code class="language-bash">npm init -y
</code></pre>
<p>Now let&apos;s install our dependencies</p><pre><code class="language-bash">npm install express cors axios dotenv
npm install -g nodemon
</code></pre>
<figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/gif2ai.gif" class="kg-image" alt="AI Generated Background Images for Video Calls" loading="lazy" width="1237" height="810" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/gif2ai.gif 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/gif2ai.gif 1000w, https://dyte.io/blog/content/images/2024/01/gif2ai.gif 1237w" sizes="(min-width: 720px) 720px"></figure><h2 id="step-3-setting-up-our-backend-application">Step 3: Setting up our backend application</h2><p></p><p>First let us start by defining our <code>.env</code> file to store our 3rd party API keys. &#x1F511;</p><pre><code class="language-yaml">DYTE_ORG_ID=&lt;ORG_ID&gt;
IMGUR_CLIENT_ID=&lt;IMGUR_KEY&gt;
DYTE_API_KEY=&lt;DYTE_KEY&gt;
STABILITY_API_KEY=&lt;STABILITY_KEY&gt;
</code></pre>
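<p>Since a missing key otherwise surfaces only when a request fails, one optional addition (ours, not part of the original project) is to fail fast at startup, for example near the top of <code>src/index.js</code>:</p><pre><code class="language-javascript">// Optional guard (our addition): fail fast if a key is missing from .env
require(&quot;dotenv&quot;).config();

const required = [&quot;DYTE_ORG_ID&quot;, &quot;DYTE_API_KEY&quot;, &quot;IMGUR_CLIENT_ID&quot;, &quot;STABILITY_API_KEY&quot;];
const missing = required.filter((key) =&gt; !process.env[key]);
if (missing.length &gt; 0) {
	throw new Error(`Missing environment variables: ${missing.join(&quot;, &quot;)}`);
}
</code></pre>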
<p>Now we need to write the code that talks to the Dyte and Stability APIs.</p><p>First, we will write a module that exposes a function to call the Stability API and generate an image from our text prompt.</p><p><code>src/api/stability.js</code></p><pre><code class="language-javascript">// Description: This file contains the functions that interact with the Stability API
const axios = require(&quot;axios&quot;);
const dotenv = require(&quot;dotenv&quot;);
const path = require(&quot;path&quot;);

// Create an absolute path to the .env file located one directory above
const dotenvPath = path.join(__dirname, &quot;../..&quot;, &quot;.env&quot;);

// Load the environment variables from the .env file
dotenv.config({ path: dotenvPath });

const STABILITY_API_KEY = process.env.STABILITY_API_KEY;
const textToImage = async (prompt) =&gt; {
	const apiUrl =
		&quot;https://api.stability.ai/v1/generation/stable-diffusion-512-v2-1/text-to-image&quot;;

	const headers = {
		Accept: &quot;application/json&quot;,
		Authorization: STABILITY_API_KEY, // loaded from the .env file via dotenv
	};

	const body = {
		steps: 10,
		width: 512,
		height: 512,
		seed: 0,
		cfg_scale: 5,
		samples: 1,
		text_prompts: [
			{
				text: prompt,
				weight: 1,
			},
			{
				text: &quot;blurry, bad&quot;,
				weight: -1,
			},
		],
	};

	try {
		const response = await axios.post(apiUrl, body, {
			headers,
		});

		if (response.status !== 200) {
			throw new Error(`Non-200 response: ${response.status}`);
		}

		const responseJSON = response.data;
		// The API returns the generated image(s) as base64-encoded artifacts
		const base64Images = responseJSON.artifacts.map((image) =&gt; image.base64);

		return base64Images[0];
	} catch (error) {
		throw new Error(`Error generating image: ${error.message}`);
	}
};

module.exports = { textToImage };
</code></pre>
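<p>Before wiring this into the server, you can sanity-check the helper from a one-off scratch file run from the project root (hypothetical, not part of the project):</p><pre><code class="language-javascript">// try-stability.js: hypothetical one-off script to verify the Stability key works
const { textToImage } = require(&quot;./src/api/stability&quot;);

textToImage(&quot;a misty mountain lake at sunrise&quot;)
	.then((base64) =&gt; console.log(`Received ${base64.length} base64 characters`))
	.catch((err) =&gt; console.error(err.message));
</code></pre>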
<p>Now let&apos;s write the code for using the Dyte API.</p><p><code>src/api/dyte.js</code></p><pre><code class="language-javascript">const axios = require(&quot;axios&quot;);
const path = require(&quot;path&quot;);
const dotenv = require(&quot;dotenv&quot;);

// Create an absolute path to the .env file located one directory above
const dotenvPath = path.join(__dirname, &quot;../..&quot;, &quot;.env&quot;);

// Load the environment variables from the .env file
dotenv.config({ path: dotenvPath });

const DYTE_API_KEY = process.env.DYTE_API_KEY;
const DYTE_ORG_ID = process.env.DYTE_ORG_ID;

const API_HASH = Buffer.from(
  `${DYTE_ORG_ID}:${DYTE_API_KEY}`,
  &quot;utf-8&quot;
).toString(&quot;base64&quot;);

const DyteAPI = axios.create({
  baseURL: &quot;https://api.dyte.io/v2&quot;,
  headers: {
    Authorization: `Basic ${API_HASH}`,
  },
});

module.exports = DyteAPI;
</code></pre>
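<p>With this client in place, creating a meeting and adding a participant become one-liners against Dyte&apos;s v2 endpoints; these are the same calls the routes below will make. A quick illustrative sketch (hypothetical scratch file, run from the project root):</p><pre><code class="language-javascript">// try-dyte.js: hypothetical script exercising the DyteAPI client directly
const DyteAPI = require(&quot;./src/api/dyte&quot;);

async function demo() {
	const meeting = await DyteAPI.post(&quot;/meetings&quot;, { title: &quot;Test meeting&quot; });
	const meetingId = meeting.data.data.id;

	const participant = await DyteAPI.post(`/meetings/${meetingId}/participants`, {
		name: &quot;Alice&quot;,
		preset_name: &quot;group_call_host&quot;,
		client_specific_id: &quot;demo-alice&quot;,
	});

	// This token is what the frontend passes to initMeeting()
	console.log(participant.data.data.token);
}

demo().catch((err) =&gt; console.error(err.message));
</code></pre>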
<p>Next, we will move on to our <code>index.js</code> file, where we need to create the following routes:</p><p><code>POST /meetings</code> - Create a new meeting</p><p><code>POST /meetings/{meetingId}/participants</code> - This route is responsible for adding a participant to a specific meeting identified by <code>meetingId</code></p><p><code>POST /upload</code> - Responsible for generating the AI image from the prompt, uploading it to Imgur, and returning the Imgur link</p><hr><p>So let&apos;s get started &#x1F447;</p><p><code>src/index.js</code></p><pre><code class="language-javascript">const express = require(&quot;express&quot;);
const cors = require(&quot;cors&quot;);
const DyteAPI = require(&quot;./api/dyte&quot;);
const axios = require(&quot;axios&quot;);
const { textToImage } = require(&quot;./api/stability&quot;); // Update the path accordingly

const PORT = process.env.PORT || 3000;
const app = express();

app.use(cors());
// express.json() replaces body-parser (which isn&apos;t in our install list above)
app.use(express.json({ limit: &quot;10mb&quot; }));

app.post(&quot;/meetings&quot;, async (req, res) =&gt; {
	const { title } = req.body;
	const response = await DyteAPI.post(&quot;/meetings&quot;, {
		title,
	});
	return res.status(response.status).json(response.data);
});

app.post(&quot;/meetings/:meetingId/participants&quot;, async (req, res) =&gt; {
	const meetingId = req.params.meetingId;
	const { name, preset_name } = req.body;
	const client_specific_id = `react-samples::${name.replaceAll(
		&quot; &quot;,
		&quot;-&quot;
	)}-${Math.random().toString(36).substring(2, 7)}`;
	const response = await DyteAPI.post(`/meetings/${meetingId}/participants`, {
		name,
		preset_name,
		client_specific_id,
	});

	return res.status(response.status).json(response.data);
});

app.post(&quot;/upload&quot;, async (req, res) =&gt; {
	try {
		const { prompt } = req.body;
		console.log(prompt);

		const generatedImageBase64 = await textToImage(prompt);

		// Upload the generated image to Imgur
		const imgurClientId = process.env.IMGUR_CLIENT_ID;

		const response = await axios.post(
			&quot;https://api.imgur.com/3/image&quot;,
			{
				image: generatedImageBase64,
			},
			{
				headers: {
					Authorization: `Client-ID ${imgurClientId}`,
					&quot;Content-Type&quot;: &quot;application/json&quot;,
				},
			}
		);

		const imgurLink = response.data.data.link;
		return res.status(200).json({ imgurLink });
	} catch (error) {
		console.error(&quot;Error uploading image:&quot;, error.message);
		if (error.response) {
			console.error(&quot;Imgur API response:&quot;, error.response.data);
		}
		return res.status(500).json({ error: &quot;Could not upload image.&quot; });
	}
});

app.listen(PORT, () =&gt; {
	console.log(`Started listening on ${PORT}...`);
});
</code></pre>
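<p>Once the server is running, you can smoke-test the <code>/upload</code> route without the frontend (assuming Node 18+ for the global <code>fetch</code>):</p><pre><code class="language-javascript">// Hypothetical smoke test for the /upload route
(async () =&gt; {
	const res = await fetch(&quot;http://localhost:3000/upload&quot;, {
		method: &quot;POST&quot;,
		headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },
		body: JSON.stringify({ prompt: &quot;a neon cyberpunk skyline&quot; }),
	});
	console.log(await res.json()); // expected shape: { imgurLink: &quot;https://i.imgur.com/...&quot; }
})();
</code></pre>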
<p></p><h2 id="step-4-creating-the-frontend">Step 4: Creating the frontend</h2><p>This React application provides functionality to automatically create a meeting when the main route (/) is accessed and doesn&apos;t contain a meeting ID in the URL.</p><p>The application displays different views/components based on the route: a Home view when at the root and a Stage view when accessing a specific meeting.</p><p><code>App.js</code></p><pre><code class="language-javascript">import { useEffect, useState } from &quot;react&quot;;
import Home from &quot;./components/Home&quot;;
import { BrowserRouter, Routes, Route } from &quot;react-router-dom&quot;;
import &quot;./App.css&quot;;
import Stage from &quot;./components/Stage&quot;;

// Create React App only exposes env vars prefixed with REACT_APP_
const SERVER_URL = process.env.REACT_APP_SERVER_URL || &quot;http://localhost:3000&quot;;

function App() {
	const [meetingId, setMeetingId] = useState();

	const createMeeting = async () =&gt; {
		try {
			const res = await fetch(`${SERVER_URL}/meetings`, {
				method: &quot;POST&quot;,
				body: JSON.stringify({ title: &quot;AI generated image background&quot; }),
				headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },
			});

			if (!res.ok) {
				throw new Error(&quot;Failed to create meeting&quot;); // You can customize the error message
			}

			const resJson = await res.json();
			setMeetingId(resJson.data.id);
		} catch (error) {
			console.error(&quot;Error creating meeting:&quot;, error);
		}
	};

	useEffect(() =&gt; {
		const id = window.location.pathname.split(&quot;/&quot;)[2];
		if (!id) {
			createMeeting();
		}
	}, []);

	return (
		&lt;BrowserRouter&gt;
			&lt;Routes&gt;
				&lt;Route path=&quot;/&quot; element={&lt;Home meetingId={meetingId} /&gt;}&gt;&lt;/Route&gt;
				&lt;Route path=&quot;/meeting/:meetingId&quot; element={&lt;Stage /&gt;}&gt;&lt;/Route&gt;
			&lt;/Routes&gt;
		&lt;/BrowserRouter&gt;
	);
}

export default App;
</code></pre>
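<p>Both the <code>Home</code> and <code>Meet</code> components below import a <code>useAIImage</code> hook from <code>src/SharedDataContext.js</code>, a file the post doesn&apos;t list. Here is a minimal sketch of what it could look like, with the names inferred from those imports (treat it as an assumption, not the project&apos;s exact code):</p><pre><code class="language-javascript">// src/SharedDataContext.js: minimal sketch of the shared context module
import { createContext, useContext, useState } from &quot;react&quot;;

const SharedDataContext = createContext(null);

export function SharedDataProvider({ children }) {
	const [AIImageUrl, setAIImageUrl] = useState(&quot;&quot;);

	// Exposed under the names the components below expect
	const value = {
		AIImageUrl,
		updateAIImageUrl: (url) =&gt; setAIImageUrl(url),
	};

	return (
		&lt;SharedDataContext.Provider value={value}&gt;
			{children}
		&lt;/SharedDataContext.Provider&gt;
	);
}

export function useAIImage() {
	return useContext(SharedDataContext);
}
</code></pre><p>For the hook to work across routes, the provider would need to wrap <code>&lt;App /&gt;</code> (or at least the router) in <code>src/index.js</code>.</p>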
<p>Now let&apos;s come to our Home component.</p><p>&#x1F680; The Home component is the heart of the user interface, serving as a minimal entry point for the application. Here&apos;s what it does:</p><p><strong>Prompt Handling:</strong> Users can input a prompt in a text field. This is managed using <code>useState</code>, making the input reactive and interactive.</p><p><strong>Meeting Creation:</strong> The core feature here is the ability to create a meeting. Upon clicking the <code>Create and join meeting</code> button, it triggers the <code>handleCreateMeeting</code> function, which in turn calls <code>handleUpload</code>. This sends the prompt to the server and navigates the user to the meeting page.</p><p><code>src/components/Home.js</code></p><pre><code class="language-javascript">import { useNavigate } from &quot;react-router-dom&quot;;
import { useState } from &quot;react&quot;;
import { useAIImage } from &quot;../SharedDataContext&quot;;

function Home({ meetingId }) {
	const [prompt, setPrompt] = useState(&quot;&quot;);
	const [loading, setLoading] = useState(false);
	const { updateAIImageUrl } = useAIImage();
	const navigate = useNavigate();

	const REACT_APP_SERVER_URL =
		process.env.REACT_APP_SERVER_URL || &quot;http://localhost:3000&quot;;

	const handleUpload = async () =&gt; {
		try {
			const response = await fetch(REACT_APP_SERVER_URL + &quot;/upload&quot;, {
				method: &quot;POST&quot;,
				headers: {
					&quot;Content-Type&quot;: &quot;application/json&quot;,
				},
				body: JSON.stringify({ prompt: prompt }),
			});

			if (response.ok) {
				const data = await response.json();
				console.log(data);
				updateAIImageUrl(data?.imgurLink);
				setLoading(false);
				navigate(`/meeting/${meetingId}`);
			} else {
				console.log(&quot;error&quot; + response);
			}
		} catch (error) {
			console.log(error);
		}
	};

	const handleCreateMeeting = async () =&gt; {
		setLoading(true);
		try {
			updateAIImageUrl(prompt);
			handleUpload();
		} catch (error) {
			console.error(&quot;Error generating image:&quot;, error);
		}
	};

	return (
		&lt;div
			style={{
				height: &quot;100vh&quot;,
				width: &quot;100vw&quot;,
				fontSize: &quot;x-large&quot;,
				display: &quot;flex&quot;,
				flexDirection: &quot;column&quot;,
				justifyContent: &quot;center&quot;,
				alignItems: &quot;center&quot;,
			}}
		&gt;
			&lt;h2
				style={{
					color: &quot;#000000&quot;,
					fontWeight: &quot;bold&quot;,
					fontSize: &quot;1.5rem&quot;,
					marginBottom: &quot;20px&quot;,
				}}
			&gt;
				Enter AI prompt
			&lt;/h2&gt;
			&lt;input
				type=&quot;text&quot;
				value={prompt}
				onChange={(e) =&gt; setPrompt(e.target.value)}
				style={{
					paddingTop: &quot;8px&quot;,
					paddingBottom: &quot;8px&quot;,
					paddingLeft: &quot;4px&quot;,
					paddingRight: &quot;4px&quot;,
					border: &quot;2px #2260FD solid&quot;,
					borderRadius: &quot;4px&quot;,
					width: &quot;300px&quot;,
					marginBottom: &quot;20px&quot;,
				}}
			/&gt;
			&lt;button
				onClick={handleCreateMeeting}
				style={{
					backgroundColor: &quot;#2260FD&quot;,
					color: &quot;white&quot;,
					padding: &quot;10px 20px&quot;,
					borderRadius: &quot;4px&quot;,
					fontWeight: &quot;bold&quot;,
					alignItems: &quot;center&quot;,
					border: &quot;none&quot;,
					cursor: &quot;pointer&quot;,
					width: &quot;310px&quot;,
					display: &quot;flex&quot;,
					justifyContent: &quot;center&quot;,
				}}
				disabled={loading}
			&gt;
				{loading ? (
					&lt;div className=&quot;spinner-border text-light&quot; role=&quot;status&quot;&gt;
						&lt;span className=&quot;visually-hidden&quot;&gt;Loading...&lt;/span&gt;
					&lt;/div&gt;
				) : (
					&quot;Create and join meeting&quot;
				)}
			&lt;/button&gt;
		&lt;/div&gt;
	);
}

export default Home;
</code></pre>
<p>&#x1F4F8; Here&apos;s how our root page looks after adding the <code>Home</code> component:</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/01/gif3ai.png" class="kg-image" alt="AI Generated Background Images for Video Calls" loading="lazy" width="2000" height="1030" srcset="https://dyte.io/blog/content/images/size/w600/2024/01/gif3ai.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/01/gif3ai.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/01/gif3ai.png 1600w, https://dyte.io/blog/content/images/2024/01/gif3ai.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Now, let&apos;s delve into the <code>Stage</code> component, which renders on the route <code>/meeting/:meetingId</code> and acts as a container component, orchestrating the meeting stage of the application.</p><p>When the admin clicks the link provided on the <code>/</code> route, they get redirected to the Stage page.</p><p><code>src/components/Stage.js</code></p><pre><code class="language-javascript">import Meet from &quot;./Meet&quot;;

const Stage = () =&gt; {
	return (
		&lt;div
			style={{
				height: &quot;100vh&quot;,
				width: &quot;100vw&quot;,
				display: &quot;flex&quot;,
				justifyContent: &quot;center&quot;,
				alignItems: &quot;center&quot;,
				color: &quot;white&quot;,
			}}
		&gt;
			&lt;&gt;
				&lt;Meet /&gt;
			&lt;/&gt;
		&lt;/div&gt;
	);
};

export default Stage;
</code></pre>
<p>And the last one is the <code>Meet</code> component, where we utilize the <code>DyteMeeting</code> component from the Dyte SDK to set up and manage the meeting environment.</p><p><code>src/components/Meet.js</code></p><pre><code class="language-javascript">
import { useState, useEffect, useRef } from &quot;react&quot;;
import { DyteMeeting, provideDyteDesignSystem } from &quot;@dytesdk/react-ui-kit&quot;;
import { useDyteClient } from &quot;@dytesdk/react-web-core&quot;;
import DyteVideoBackgroundTransformer from &quot;@dytesdk/video-background-transformer&quot;;
import { useAIImage } from &quot;../SharedDataContext&quot;;

// Constants

const REACT_APP_SERVER_URL =
	process.env.REACT_APP_SERVER_URL || &quot;http://localhost:3000&quot;;

const Meet = () =&gt; {
	const meetingEl = useRef();
	const [meeting, initMeeting] = useDyteClient();
	const [userToken, setUserToken] = useState();
	const [hasInitializedBackground, setHasInitializedBackground] =
		useState(false);

	const { AIImageUrl } = useAIImage();

	const meetingId = window.location.pathname.split(&quot;/&quot;)[2];

	const initializeVideoBackground = async () =&gt; {
		try {
			if (!meeting) {
				return; // No need to proceed if the meeting is not available
			}

			const videoBackgroundTransformer =
				await DyteVideoBackgroundTransformer.init();
			const videoMiddleware =
				await videoBackgroundTransformer.createStaticBackgroundVideoMiddleware(
					AIImageUrl
				);

			meeting.self.addVideoMiddleware(videoMiddleware);
			console.log(&quot;Video background initialized&quot;);
		} catch (error) {
			console.error(&quot;Error initializing video background:&quot;, error);
		}
	};

	const joinMeeting = async (id) =&gt; {
		try {
			const res = await fetch(
				`${REACT_APP_SERVER_URL}/meetings/${id}/participants`,
				{
					method: &quot;POST&quot;,
					body: JSON.stringify({
						name: &quot;new user&quot;,
						preset_name: &quot;group_call_host&quot;,
						meeting_id: meetingId,
					}),
					headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },
				}
			);

			if (!res.ok) {
				throw new Error(&quot;Failed to join meeting&quot;); // Customize the error message
			}

			const resJson = await res.json();
			return resJson.data.token;
		} catch (error) {
			console.error(&quot;Error joining meeting:&quot;, error.message);
		}
	};

	const joinMeetingId = async () =&gt; {
		if (meetingId) {
			const authToken = await joinMeeting(meetingId);
			await initMeeting({
				authToken,
			});
			setUserToken(authToken);
		}
	};

	useEffect(() =&gt; {
		if (meetingId &amp;&amp; !userToken) joinMeetingId();
	}, []);

	useEffect(() =&gt; {
		if (meeting &amp;&amp; !hasInitializedBackground) {
			initializeVideoBackground();
			setHasInitializedBackground(true);
		}
	}, [meeting, hasInitializedBackground]);

	useEffect(() =&gt; {
		if (userToken) {
			provideDyteDesignSystem(meetingEl.current, {
				theme: &quot;light&quot;,
			});
		}
	}, [userToken]);

	return (
		&lt;div style={{ height: &quot;100vh&quot;, width: &quot;100vw&quot;, display: &quot;flex&quot; }}&gt;
			{userToken &amp;&amp; (
				&lt;&gt;
					&lt;div style={{ width: &quot;100vw&quot;, height: &quot;100vh&quot; }}&gt;
						&lt;DyteMeeting mode=&quot;fill&quot; meeting={meeting} ref={meetingEl} /&gt;
					&lt;/div&gt;
				&lt;/&gt;
			)}
		&lt;/div&gt;
	);
};

export default Meet;
</code></pre>
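<p>If you later want to clear the background (say, from a &quot;remove background&quot; button), keep a reference to the middleware you added and detach it. A hedged sketch, assuming <code>removeVideoMiddleware</code> mirrors <code>addVideoMiddleware</code> (check the web-core docs for your SDK version):</p><pre><code class="language-javascript">// Hypothetical cleanup: detach the middleware to restore the raw camera feed
meeting.self.removeVideoMiddleware(videoMiddleware);
</code></pre>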
<p>This component handles the meeting setup. It takes the user&apos;s prompt and applies an AI-generated image from that prompt as the background.</p><h2 id="step-5-trying-out-our-application">Step 5: Trying out our application</h2><p>Ta-da! &#x2728; It&apos;s time to put our application to the test and see it in action!</p><ul><li>First, click on create meeting &#x1F9D1;&#x200D;&#x1F4BB;</li><li>Then, we give a prompt for the AI to generate an image from. Voila! Your background is set.</li><li>We then join the meeting with our new customised AI-generated background!</li></ul><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://dyte.io/blog/content/media/2024/03/ai_bg_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://dyte.io/blog/content/media/2024/03/ai_bg.mp4" poster="https://img.spacergif.org/v1/1440x810/0a/spacer.png" width="1440" height="810" loop autoplay muted playsinline preload="metadata" style="background: transparent url(&apos;https://dyte.io/blog/content/media/2024/03/ai_bg_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container kg-video-hide">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">0:06</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p>You can try the live demo here: <a href="https://main--background-image-gen.netlify.app/" rel="nofollow">Live Demo Link</a></p><h2 id="to-run-locally">To run locally:</h2><p>In the <code>/</code> folder:</p><pre><code class="language-bash">npm i
nodemon src/index.js
</code></pre>
<p>In <code>frontend/</code></p><pre><code class="language-bash">npm i
npm run start
</code></pre>
<p><a href="https://github.com/vishal-sarg/dyte-background-generator">Link to GitHub Repo</a></p><h2 id="conclusion">Conclusion</h2><p>In conclusion, we&apos;ve harnessed recent image generation AI&apos;s power to create our video call backgrounds, opening up exciting possibilities. Now, your virtual meetings, classes, or gatherings can be infused with vibrant visuals, from scenic landscapes to dynamic artwork.</p><p>You may go ahead and start creating your video calling applications with <a href="https://dyte.io/" rel="nofollow">Dyte</a>. &#x1F31F;</p>]]></content:encoded></item><item><title><![CDATA[Packaging Libraries in iOS: A Comprehensive Guide]]></title><description><![CDATA[In this blog, learn the nuances of the creation and distribution of iOS SDKs to enhance the efficiency and reliability of your projects.]]></description><link>https://dyte.io/blog/packaging-libraries-ios/</link><guid isPermaLink="false">650b1d46d0c96400019525cc</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Shaunak Jagtap]]></dc:creator><pubDate>Mon, 18 Nov 2024 17:37:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2023/09/Packaging-Libraries-in-iOS.png" medium="image"/><content:encoded><![CDATA[<h2 id="understanding-the-premise"><strong>Understanding the premise</strong></h2><img src="https://dyte.io/blog/content/images/2023/09/Packaging-Libraries-in-iOS.png" alt="Packaging Libraries in iOS: A Comprehensive Guide"><p>Software Development Kits (<a href="https://dyte.io/ios-video-sdk">SDKs</a>) are the lifeblood of development, offering a treasure of pre-packaged libraries, tools, and resources that empower developers to craft rich and feature-packed applications. However, for those new to distributing SDKs in iOS, the path can be filled with uncertainty.</p><p>We have shipped 100s of <a href="https://dyte.io/ios-video-sdk">iOS SDK</a> builds in the form of our WebRTC clients for customers looking to integrate real time communications within their applications. With that experience, this technical blog post is tailored for iOS developers and aims to illuminate the nuances of SDK distribution, including key concepts, best practices, and tools. With this knowledge, developers can unlock the full potential of their SDKs and ensure seamless integration experiences.</p><h3 id="static-vs-dynamic-libraries-and-frameworks-in-ios"><strong>Static vs. dynamic libraries and frameworks in iOS</strong></h3><p>Seasoned iOS developers often grapple with crucial decisions regarding system frameworks, packaging their code, and integrating third-party components. Among these decisions, choosing between static and dynamic libraries or frameworks is pivotal, making profound implications for application performance and resource management.</p><h3 id="implications-on-app-size-and-launch-time"><strong>Implications on app size and launch time</strong></h3><p>The choice between static and dynamic libraries or frameworks carries significant weight regarding your app&apos;s binary size and launch time.</p><h3 id="summary-of-static-vs-dynamic-linking">Summary of static Vs. dynamic linking</h3><p>Here&apos;s a concise summary of how static and dynamic linking impacts various facets of your application:</p>
<!--kg-card-begin: html-->
<style>
    tr td:first-child {
      min-width: 8em;
      width: 8em;
    }
</style>
<table class="first-col-highlight">
<thead>
<tr>
<th>Facets</th>
<th>Static Linking</th>
<th>Dynamic Linking</th>
</tr>
</thead>
<tbody>
<tr>
<td>App Size</td>
<td>Large app bundle size</td>
<td>Smaller app bundle size</td>
</tr>
<tr>
<td>App Launch Time</td>
<td>Faster load time</td>
<td>Slower load time</td>
</tr>
<tr>
<td>Safety</td>
<td>Scrutinised and copied at build time</td>
<td>Risk of runtime glitches, potential for runtime crashes</td>
</tr>
<tr>
<td>Deployment</td>
<td>In static linking, the SDK or library is bundled within the single app binary, and the entire library&apos;s code becomes an integral part of the app&apos;s binary executable.</td>
<td>App references the library at runtime, and the operating system is responsible for loading the required library when the app is launched or when the specific library functions are first called</td>
</tr>
<tr>
<td>Debugging</td>
<td>Easier to debug as all code is available</td>
<td>Harder to debug as code may not be available at runtime</td>
</tr>
<tr>
<td>Memory Usage</td>
<td>Apps that are statically linked tend to have less memory usage during runtime.</td>
<td>Dynamically linked apps may have slightly more memory usage during runtime.</td>
</tr>
<tr>
<td>Flexibility</td>
<td>Updates to the SDK or library require app recompilation and release because the entire library is bundled with the app&apos;s binary during compilation, preventing separate updates.</td>
<td>Library updates can be made independently of the app, offering flexibility and allowing users to benefit from updates without app recompilation.</td>
</tr>
</tbody>
</table>
<!--kg-card-end: html-->
<h3 id="when-to-use-dynamic-linking"><strong>When to use dynamic linking</strong></h3><p>While statically linked modules proffer a smaller app size and accelerated loading times, dynamic linking has its own set of compelling use cases:</p><ul><li><strong>Multiple static modules depending on the same module:</strong> When your app comprises multiple static modules that lean on a common module, you might encounter warnings about duplicate symbols at runtime. Transitioning the shared module to a dynamic one can effectively mitigate this issue.</li><li><strong>iOS increased app launch time with many dynamic libraries/frameworks:</strong> Loading numerous third-party dynamic libraries or frameworks on <a href="https://dyte.io/ios-video-sdk">iOS</a> can lead to prolonged app launch times. Vigilant monitoring and optimisation efforts are essential to upholding app launch performance.</li></ul><h3 id="staticdynamic-with-different-integration-techniques"><strong>Static/dynamic with different integration techniques</strong></h3><p>To make well-informed decisions about linking, a profound understanding of how various dependency managers handle linking behaviour is imperative:</p><ul><li><strong>Own targets or projects linked directly:</strong> Exert precise control over the linking behaviour of targets within your repository or external repositories by fine-tuning Build Settings. A simple adjustment to the <strong><code>MACH_O_TYPE</code></strong> Build Setting allows you to toggle between static library and dynamic library.</li><li><strong>CocoaPods:</strong> By default, CocoaPods constructs and links dependencies as static libraries. The introduction of <strong><code>use_frameworks!</code></strong> in your Podfile enables the construction of dynamic frameworks. Moreover, <strong><code>:linkage =&gt; :static</code></strong> can be employed to shape dependencies as static frameworks.</li><li><strong>Swift Package Manager (SPM):</strong> SPM, by default, fabricates dependencies as static libraries, offering minimal control over linking behaviour. However, if you are the custodian of a package, you can specify <strong><code>type: .dynamic</code></strong> within your <strong><code>Package.swift</code></strong> file to fashion a dynamic package.</li><li><strong>Carthage:</strong> In its default configuration, Carthage leans towards using dynamic frameworks for dependencies. Nevertheless, you retain the flexibility to configure it to construct and link them statically when circumstances dictate such an arrangement.</li></ul><h2 id="understanding-different-formats"><strong>Understanding different formats</strong></h2><p>In the realm of iOS development, many formats for packaging libraries and resources exist. Each format serves distinct purposes, and as an expert iOS developer, comprehending their nuances is indispensable. Here&apos;s an exploration of some essential formats:</p><h3 id="xcframework"><strong>xcframework</strong></h3><p><strong>XCFramework</strong> is a relatively recent addition to Apple&apos;s arsenal of formats. It is a versatile container meticulously designed for packaging frameworks for diverse platforms and architectures into a single, harmonious bundle. Embracing XCFrameworks streamlines the distribution of binary frameworks, ensuring harmonious coexistence across various Apple devices and processor architectures. 
Adopting XCFrameworks bestows developers with the gift of streamlined development, reduced integration complexities, and enhanced application performance.</p><h3 id="framework"><strong>Framework</strong></h3><p>The classic <strong>framework</strong> format remains a stalwart choice for packaging code and resources in<a href="https://dyte.io/ios-video-sdk"> iOS development</a>. Frameworks offer a structured habitat for your codebase, gracefully accommodating header files, binaries, and resources. They advocate the virtues of encapsulation and modularity, simplifying the process of integration and maintenance of your code. Frameworks wear the dual hats of either static or dynamic, contingent upon whether the code binds at compile time (static) or runtime (dynamic).</p><h3 id="a-static-library-and-o-object-file"><strong>.a (Static Library) and .o (Object File)</strong></h3><p><strong>.a files</strong>, heralded as static libraries, house compiled code snugly woven into the fabric of your application at compile time. These libraries become part of your app&apos;s binary, trading a larger executable for accelerated startup times. <strong>.o files</strong>, or object files, inhabit the realm of intermediate compilation units, wielding the potential to unite and give birth to static libraries or dynamically linked frameworks. Distinguishing between the use cases of static libraries and object files is pivotal for optimising your app&apos;s performance and memory footprint.</p><h3 id="dylib-dynamic-library"><strong>.dylib (Dynamic Library)</strong></h3><p>The world of <strong>.dylib files</strong> beckons dynamic libraries, extending an invitation for their ad hoc arrival at runtime as your app springs into existence. Dynamic libraries usher in a measure of flexibility in code sharing while keeping your app&apos;s main binary slimmer. Dynamic libraries traditionally find their calling in system frameworks and shared system components. Handling them with care and vigilance is a requisite, for any slip-up in configuration or inclusion can usher in the spectre of runtime crashes.</p><h3 id="universal-framework"><strong>Universal Framework</strong></h3><p>Universal frameworks, the &quot;Jack of all trades&quot; among formats, don the cloak of versatility. They are a specialised framework format engineered to accommodate multiple architectures and platforms under one sprawling roof. This format performs an invaluable service, simplifying the distribution of cross-platform libraries. <a href="https://dyte.io/ios-video-sdk">Developers</a> can present a single binary, an embodiment of unity, capable of functioning seamlessly across an array of iOS devices and processor architectures.</p><h2 id="swift-package-manager"><strong>Swift Package Manager</strong></h2><p>As a seasoned <a href="https://dyte.io/ios-video-sdk">iOS</a> developer, you&apos;re no stranger to the capabilities wielded by Swift Package Manager (SPM) when it comes to dependency management. Within the labyrinthine corridors of SPM, three pivotal concepts demand your attention: <strong><code>.binaryTarget</code></strong>, <strong><code>.target</code></strong>, and the strategic utilization of <strong><code>linkerSettings</code></strong>.</p><h3 id="binarytarget"><strong><code>.binaryTarget</code></strong></h3><p>Introduced as a breath of fresh air, <strong><code>.binaryTarget</code></strong> assumes the mantle of a feature within Swift Package Manager designed explicitly to streamline the integration of binary dependencies.</p>
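<p>Before we go further, here is a minimal sketch of what declaring one looks like in a <code>Package.swift</code> (the names, URL, and checksum below are illustrative, not a real artifact):</p><pre><code class="language-swift">// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: &quot;MySDK&quot;,
    platforms: [.iOS(.v13)],
    products: [
        .library(name: &quot;MySDK&quot;, targets: [&quot;MySDKWrapper&quot;])
    ],
    targets: [
        // A pre-compiled XCFramework fetched from a URL. The checksum is
        // produced with `swift package compute-checksum &lt;zip&gt;` and lets
        // SPM verify the downloaded artifact.
        .binaryTarget(
            name: &quot;MySDK&quot;,
            url: &quot;https://example.com/MySDK-1.0.0.xcframework.zip&quot;,
            checksum: &quot;c2b934...&quot; // illustrative value
        ),
        // A thin source target that depends on the binary target.
        .target(name: &quot;MySDKWrapper&quot;, dependencies: [&quot;MySDK&quot;])
    ]
)
</code></pre><p>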
Binary dependencies are pre-compiled libraries or frameworks presented by third-party sources, emerging as swift and efficient companions for integration. Here&apos;s the lowdown:</p><ul><li><strong>Efficient integration</strong>: With the declaration of a <strong><code>.binaryTarget</code></strong>, Swift Package Manager orchestrates retrieving a pre-compiled binary from a designated source, whether a Git repository or a URL. This streamlined approach expedites integration, ushering in convenience.</li><li><strong>Platform-agnostic brilliance</strong>: Binary targets shine as platform-agnostic stars, casting their glow across diverse platforms and architectures, including iOS, macOS, and beyond. This trait is especially beneficial for those engaged in cross-platform development endeavors.</li><li><strong>Version control vigilance</strong>: Binary targets, operating in the realm of pre-compiled artifacts, dwell outside the confines of direct version control within your package. Instead, the version or tag of the binary dependency takes center stage within your <strong><code>Package.swift</code></strong> file.</li><li><strong>Swift harmony</strong>: Ensure the selected binary target aligns with your project&apos;s Swift version for harmony and compatibility. A mismatch in Swift versions can sow the seeds of discord.</li></ul><h3 id="target"><strong><code>.target</code></strong></h3><p>In stark contrast to the pragmatic elegance of <strong><code>.binaryTarget</code></strong>, <strong><code>.target</code></strong> emerges as the go-to directive for source-based dependencies. Source-based packages carry their source code, ready to undergo the rites of compilation upon integration into your project. Vital insights into this concept include:</p><ul><li><strong>Source code integration</strong>: Invocation of a <strong><code>.target</code></strong> commands Swift Package Manager to embark on a journey of source code retrieval, cloning the package&apos;s source code and forging it into your project. This process offers the boon of customization, enabling you to mold or modify the package as per your requirements.</li><li><strong>Version control ascendancy</strong>: Source-based packages ascend to prominence as the champions of direct version control within your project&apos;s repository. This translates into the power to wield influence over the package&apos;s code, affording the liberty to make modifications as needed.</li><li><strong>Swift compatibility</strong>: As with binary targets, extending a cordial handshake of compatibility between your project&apos;s Swift version and the chosen source-based package is paramount. Avoidance of mismatched Swift versions can be your shield against compatibility conundrums.</li><li><strong>The web of dependency</strong>: Source-based packages often weave a web of dependencies, crafting a sprawling tapestry of interconnected packages. Swift Package Manager rises to the challenge, orchestrating the management of this intricate dependency graph, ensuring that all required packages partake in the grand symphony of integration.</li></ul><h3 id="linkersettings"><strong><code>linkerSettings</code></strong></h3><p><strong><code>linkerSettings</code></strong>, an entity of significance within the grand configuration of Swift Package Manager, entrusts you with the reins of control over linker flags and settings that govern the orchestration of your targets.</p>
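<p>As a hedged illustration of that control (the framework names, flags, and product layout here are examples, not a prescription), a manifest can both opt a product into dynamic linking and declare per-target linker requirements:</p><pre><code class="language-swift">// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: &quot;MediaKit&quot;,
    products: [
        // `type: .dynamic` asks SPM to build this library as a dynamic
        // framework instead of its default static linking.
        .library(name: &quot;MediaKit&quot;, type: .dynamic, targets: [&quot;MediaKit&quot;])
    ],
    targets: [
        .target(
            name: &quot;MediaKit&quot;,
            linkerSettings: [
                // Link against a system framework and a system library.
                .linkedFramework(&quot;AVFoundation&quot;),
                .linkedLibrary(&quot;z&quot;),
                // Pass a raw search path through to the linker, only on macOS.
                .unsafeFlags([&quot;-L/usr/local/lib&quot;], .when(platforms: [.macOS]))
            ]
        )
    ]
)
</code></pre><p>One caveat worth knowing: packages that use <code>.unsafeFlags</code> cannot be consumed as versioned dependencies by other packages, so such flags are best confined to local development setups.</p><p>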
Its importance is not to be underestimated:</p><ul><li><strong>Fine-grained command</strong>: <strong><code>linkerSettings</code></strong> stands as your herald, bearing the mandate of fine-grained control over the linking process. It paves the way for the specification of linker flags, search paths, and sundry settings, exerting a profound influence on how the package melds with your project.</li><li><strong>Integration tailoring</strong>: Instances may arise where a package exhibits a penchant for specific linker flags or settings. Enter <strong><code>linkerSettings</code></strong>; this savior allows you to fashion an integration that gracefully accommodates the package&apos;s unique requirements.</li><li><strong>Mitigating linking conflicts</strong>: In integrations where multiple packages jostle for space, conflicts on the frontiers of linking can rear their heads. <strong><code>linkerSettings</code></strong> steps in as the peacemaker, offering the means to navigate these conflicts with poise and elegance, ensuring a harmonious integration experience.</li><li><strong>Compatibility crusade</strong>: To preserve the sanctity of your project, heed the call of compatibility. Scrutinize the compatibility of linker flags and settings with your project&apos;s Swift version and platform. Mismatched configurations have the power to disrupt the tranquil flow of runtime.</li></ul><h2 id="the-role-of-checksums-in-spm"><strong>The role of checksums in SPM</strong></h2><p>Checksums, those guardians of integrity and security, play a pivotal role within the domain of Swift Package Manager (SPM), specifically in the realm of package management. They serve as sentinels tasked with verifying the authenticity and integrity of external <a href="https://dyte.io/ios-video-sdk">dependencies</a> before ushering them into your project.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/03/iOS.webp" class="kg-image" alt="Packaging Libraries in iOS: A Comprehensive Guide" loading="lazy" width="2000" height="1125" srcset="https://dyte.io/blog/content/images/size/w600/2024/03/iOS.webp 600w, https://dyte.io/blog/content/images/size/w1000/2024/03/iOS.webp 1000w, https://dyte.io/blog/content/images/size/w1600/2024/03/iOS.webp 1600w, https://dyte.io/blog/content/images/2024/03/iOS.webp 2000w" sizes="(min-width: 720px) 720px"></figure><ol><li><strong>Package resolution</strong>: Your project&apos;s <strong><code>Package.swift</code></strong> file hosts declarations of dependencies. When summoned, the Swift Package Manager embarks on a mission to retrieve the package manifest, an essential dossier housing information about the package and its version.</li><li><strong>Download and check</strong>: The saga continues with downloading the package&apos;s source code or binary artifact. Simultaneously, Swift Package Manager, armed with diligence, fetches the checksum linked to the package&apos;s version from a trustworthy source, often the package repository.</li><li><strong>Verification</strong>: As the download completes, Swift Package Manager commences an expedition into checksum calculation. The calculated checksum stands face-to-face with its repository-forged counterpart, and only harmony, represented by a perfect match, rings true. 
A matching checksum signifies that the downloaded package mirrors the precise form and content expected by the package repository.</li></ol><h2 id="mergeable-libraries-new-in-town"><strong>Mergeable libraries (New in Town!)</strong></h2><p>Apple announced mergeable libraries at WWDC23. Sometimes cloaked in the monikers of &quot;umbrella frameworks&quot; or &quot;universals,&quot; they fuse multiple frameworks or libraries into a singular, cohesive framework. This gives a leaner, meaner, and more efficient application binary. Let&apos;s shine a light on the critical facets of mergeable libraries:</p><h3 id="1-reduction-in-binary-size"><strong>1. Reduction in binary size</strong></h3><p>The most prominent jewel in the crown of mergeable libraries is the substantial reduction in the size of your application binary. By amalgamating multiple frameworks into a singular entity, developers surgically excise redundant code and resources, rendering the binary trim and sleek. This trimness finds its true value in mobile applications, where app size directly influences download times and device storage.</p><h3 id="2-streamlined-maintenance"><strong>2. Streamlined maintenance</strong></h3><p>Mergeable libraries don the mantle of the custodian of dependencies, simplifying the labyrinthine maintenance process. Instead of juggling several individual libraries, each with its own versioning and update cycle, developers are entrusted with the guardianship of a solitary, consolidated library. This harmonisation streamlines the update process and curtails the risk of version conflicts and compatibility conundrums.</p><h3 id="3-improved-loadlaunch-times"><strong>3. Improved load/launch times</strong></h3><p>Diminished binary sizes usher in improved app launch times. With fewer resources to load into memory, the app leaps to life more swiftly, embellishing the user experience with the gift of alacrity. Reduced load times are particularly invaluable in scenarios where instant access to the application is not just a luxury but an expectation.</p><h3 id="4-cross-platform-compatibility"><strong>4. Cross-platform compatibility</strong></h3><p>Mergeable libraries possess the unique capability of accommodating multiple platforms and architectures, transforming them into the darlings of cross-platform development. In the hands of adept developers, a single library can extend its benevolent embrace across iOS, macOS, watchOS, tvOS, and many other platforms, fostering an ecosystem of harmonious coexistence.</p><h2 id="dyld-vs-dyld3-dynamic-linker-in-ios"><strong>Dyld vs. Dyld3: dynamic linker in iOS</strong></h2><p>In the vast landscape of <a href="https://dyte.io/ios-video-sdk">iOS development</a>, two linkers shine brightly: Dyld and its evolutionary offspring, Dyld3. These dynamic linkers perform the critical role of managing the loading and linking of libraries during an app&apos;s launch. For expert iOS developers, a deep understanding of their inner workings is essential.</p><h3 id="dyld"><strong>Dyld</strong></h3><ol><li><strong>Startup performance</strong>: Dyld, a stalwart of <a href="https://dyte.io/ios-video-sdk">iOS</a>, has been meticulously designed to deliver efficient startup performance. It employs an arsenal of optimisations to minimise the time required for loading and linking libraries when an app takes its first breath. 
This efficiency is paramount in ensuring a seamless user experience.</li><li><strong>Lazy binding</strong>: Dyld employs the ingenious strategy of lazy binding, which defers symbol resolution until the precise moment when a symbol is first utilized. This mechanism trims the startup overhead by avoiding unnecessary work during the initial stages of the app launch.</li><li><strong>Shared caches</strong>: In its quest to enhance startup performance, Dyld harnesses the power of shared caches. These caches store pre-processed libraries, allowing multiple applications to share them. This shared resource optimises resource utilisation and further expedites the launch process.</li></ol><h3 id="dyld3-the-evolutionary-leap"><strong>Dyld3: The evolutionary leap</strong></h3><p>As <a href="https://dyte.io/ios-video-sdk">iOS</a> and macOS continued to evolve, the demands on dynamic linking also grew. This prompted the emergence of Dyld3, representing a substantial leap forward in dynamic linking technology. Dyld3 introduced several key advancements aimed at optimising app performance and resource management.</p><h3 id="key-advancements-in-dyld3"><strong>Key advancements in Dyld3:</strong></h3><ol><li><strong>Reduced memory overhead</strong>: Dyld3 was engineered with a focus on minimising memory overhead. It employs a more efficient data structure for managing loaded libraries, which is precious in resource-constrained environments such as <a href="https://dyte.io/ios-video-sdk">mobile devices</a>.</li><li><strong>Parallel loading</strong>: Dyld3 introduces the paradigm of parallel loading, enabling it to load multiple libraries concurrently. This parallelism takes full advantage of multi-core processors, resulting in faster app launch times.</li><li><strong>On-demand loading</strong>: Dyld3 adopts the strategy of on-demand loading, loading only the portions of libraries that are required at runtime. This &quot;just-in-time&quot; approach conserves memory and accelerates startup times.</li><li><strong>Improved symbol binding</strong>: Dyld3 enhances symbol binding performance, ensuring that symbols are resolved efficiently as an app runs. This is crucial for maintaining smooth app performance during execution.</li></ol><p>Creating and distributing an <a href="https://dyte.io/ios-video-sdk">iOS SDK</a> demands careful consideration of various elements. To embark on this journey, you must make critical decisions regarding the type of library (static or dynamic) that best suits your needs. </p><p>Dependency management tools like CocoaPods and Swift Package Manager (SPM) are crucial in linking and integrating your <a href="https://dyte.io/ios-video-sdk">SDK</a> into other projects. Understanding library formats, such as frameworks and xcframeworks, is essential for packaging your code effectively. Don&apos;t forget the importance of checksums in ensuring the security of your SDK. </p><p>Additionally, exploring the advantages of mergeable libraries can help reduce app size and simplify maintenance. Lastly, delve into the world of dynamic linkers like Dyld and Dyld3 to optimise app startup performance and memory usage. 
By mastering these components, you&apos;ll be well-prepared to create and distribute <a href="https://dyte.io/ios-video-sdk" rel="noreferrer">iOS SDKs</a> that enhance the efficiency and reliability of your development projects.</p><p><em>If you haven&apos;t heard about Dyte yet, head over to </em><a href="https://dyte.io/"><em>dyte.io</em></a><em> to learn how we are revolutionizing communication through our </em><a href="https://dyte.io/video-sdk"><em>SDKs</em></a><em> and libraries and how you can </em><a href="https://accounts.dyte.in/auth/register" rel="noreferrer noopener"><em>get started</em></a><em> quickly on your 10,000 free minutes, which renew every month. You can reach us at </em><a href="mailto:support@dyte.io" rel="noreferrer noopener"><em>support@dyte.io</em></a><em> or ask our </em><a href="https://community.dyte.io/" rel="noreferrer noopener"><em>developer community</em></a><em> if you have any questions.</em></p>]]></content:encoded></item><item><title><![CDATA[Render Video Tracks From WebRTC Using Flutter PlatformViews]]></title><description><![CDATA[The blog explores how to render WebRTC video tracks in Flutter using PlatformViews and solves an interesting challenge.]]></description><link>https://dyte.io/blog/render-video-tracks-webrtc-flutter-platformviews/</link><guid isPermaLink="false">64f094a9d0c964000195239b</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Aman Kumar]]></dc:creator><pubDate>Mon, 18 Nov 2024 16:22:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2023/08/Render-Video-Tracks-From-WebRTC-Using-Flutter-PlatformViews--1-.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2023/08/Render-Video-Tracks-From-WebRTC-Using-Flutter-PlatformViews--1-.png" alt="Render Video Tracks From WebRTC Using Flutter PlatformViews"><p>Flutter, unlike native Android, iOS, or even React Native apps, does not use system drawing primitives for rendering your application. This blog will give you the how, what, and why of using Flutter PlatformViews, and show how WebRTC (libwebrtc) VideoTracks can be rendered in your Flutter applications using PlatformViews.</p><h1 id="background">Background</h1><p>Before we start our discussion on how <code>PlatformViews</code> are rendered, let&apos;s discuss how Flutter generally draws its UI.</p><p>Flutter paints its UI from scratch every time using its graphics engine, <code>Impeller</code>. Flutter draws every pixel on the screen, giving developers a high degree of control over the UI.<br><br>Flutter uses 3 threads to render its UIs.</p><ul><li>UI thread</li><li>Platform thread</li><li>Raster thread</li></ul><p>The UI thread is where your Dart code runs; it is responsible for building widgets, calculating the layout, and other layout-related tasks. The Platform thread is the host platform&apos;s main thread, where plugin and platform channel code runs. Once the layout is done, the layout tree is delegated to the Raster thread, which converts the tree into actual pixels on the screen.</p><pre><code class="language-dart">// Dart code runs on the UI thread
void main() {
  runApp(MyApp());
}
</code></pre><p>Let&#x2019;s see how this fits with our use case of making video available from the native side to Flutter.</p><h1 id="webrtc-connection"><strong>WebRTC connection</strong></h1><p>There are multiple ways through which you can render your <code>libwebrtc</code> video tracks on Flutter. For Android, you can:</p><ol><li>Use <code>SurfaceTextureRenderer</code> to render the VideoTrack, make the Texture available to Flutter using the TextureRegistry API, and <a href="https://api.flutter.dev/flutter/widgets/Texture-class.html">render the Texture in Flutter</a>.</li><li>Use <code>SurfaceViewRenderer</code> to render the VideoTrack as an AndroidView, and use PlatformViews to use the native view in Flutter.</li></ol><p>In this blog, we are going to talk about how the 2nd approach works under the hood. Specifically, the what and why of <code>PlatformViews</code>.</p><h1 id="what-are-flutter-platformviews">What are Flutter PlatformViews?</h1><p><code>PlatformViews</code> are used when we want to use native views as a Flutter widget. There are two ways we can create a PlatformView in Flutter.</p><h2 id="1-hybrid-composition">1. <strong>Hybrid composition</strong></h2><p>In Hybrid composition, Flutter creates a special type of view, asks Android/iOS to create a corresponding native view, and embeds the native view into its own widget tree.</p><p>Below is how you can implement <code>PlatformViews</code> using the Hybrid composition method:</p><pre><code class="language-dart">Widget build(BuildContext context) {
  // This is used in the platform side to register the view.
  const String viewType = &apos;&lt;platform-view-type&gt;&apos;;
  // Pass parameters to the platform side.
  const Map&lt;String, dynamic&gt; creationParams = &lt;String, dynamic&gt;{};

  return PlatformViewLink(
    viewType: viewType,
    surfaceFactory:
        (context, controller) {
      return AndroidViewSurface(
        controller: controller as AndroidViewController,
        gestureRecognizers: const &lt;Factory&lt;OneSequenceGestureRecognizer&gt;&gt;{},
        hitTestBehavior: PlatformViewHitTestBehavior.opaque,
      );
    },
    onCreatePlatformView: (params) {
      return PlatformViewsService.initSurfaceAndroidView(
        id: params.id,
        viewType: viewType,
        layoutDirection: TextDirection.ltr,
        creationParams: creationParams,
        creationParamsCodec: const StandardMessageCodec(),
        onFocus: () {
          params.onFocusChanged(true);
        },
      )
        ..addOnPlatformViewCreatedListener(params.onPlatformViewCreated)
        ..create();
    },
  );
}
</code></pre><p>Here&apos;s a breakdown of what each part does:</p><p><strong>1. Variables initialization</strong>:</p><ul><li><code>viewType</code> is a unique identifier for the native view. This should match the identifier used in the native Android code.</li><li><code>creationParams</code> is a map that can hold any parameters you want to pass to the native view for its initialization.</li></ul><p><strong>2. PlatformViewLink widget</strong>:</p><ul><li>This widget serves as a bridge between the Flutter framework and the native view. It takes in the <code>viewType</code> and two factory functions: <code>surfaceFactory</code> and <code>onCreatePlatformView</code>.</li></ul><p><strong>3. surfaceFactory function</strong>:</p><ul><li>This function returns an <code>AndroidViewSurface</code> widget, which is responsible for displaying the native Android view.</li><li>It takes a <code>controller</code> argument, an instance of <code>AndroidViewController</code>, which is used to control the native Android view.</li><li><code>gestureRecognizers</code> specifies which gestures the native view should consume. In this example, it&apos;s set to an empty set, meaning the native view won&apos;t consume any gestures.</li><li><code>hitTestBehavior</code> is set to <code>PlatformViewHitTestBehavior.opaque</code>, which means the native view will block touches to underlying Flutter widgets.</li></ul><p><strong>4. onCreatePlatformView function</strong>:</p><ul><li>This function initializes the native Android view and returns an instance of it.</li><li>It uses <code>PlatformViewsService.initSurfaceAndroidView</code> to initialize the native view, passing in various parameters like <code>id</code>, <code>viewType</code>, and <code>creationParams</code>.</li><li>An <code>onFocus</code> callback is also defined, which is triggered when the native view gains focus.</li><li>A listener is added to notify when the Android view is created successfully.</li></ul><p>By using this <code>build</code> method in your Flutter app, you can seamlessly integrate a native Android view into your Flutter widget tree.</p><h2 id="2-virtual-display-mode">2. <strong>Virtual display mode</strong></h2><p>In virtual display mode, Flutter creates an offscreen <code>android.view.Surface</code>, which is a drawing surface to render graphics. It&#x2019;s like a blank canvas. Then the native view renders its content onto this surface. Flutter then takes the content of that surface and uses it as a texture within its rendering pipeline. This texture is drawn where the platform view should appear.</p><p><code>iOS</code> only supports Hybrid composition.</p><p>Below is how you can implement virtual display mode.</p><pre><code class="language-dart">Widget build(BuildContext context) {
  // This is used in the platform side to register the view.
  const String viewType = &apos;&lt;platform-view-type&gt;&apos;;
  // Pass parameters to the platform side.
  final Map&lt;String, dynamic&gt; creationParams = &lt;String, dynamic&gt;{};

  return AndroidView(
    viewType: viewType,
    layoutDirection: TextDirection.ltr,
    creationParams: creationParams,
    creationParamsCodec: const StandardMessageCodec(),
  );
}
</code></pre><h3 id="androidview-widget"><strong>AndroidView widget</strong>:</h3><ul><li>This widget is used to display the native Android view within the Flutter app.</li><li>It takes several parameters:</li><li><code>viewType</code> specifies the type of Android view to create.</li><li><code>layoutDirection</code> sets the text direction, which is left-to-right (<code>TextDirection.ltr</code>) in this example.</li><li><code>creationParams</code> are the initial parameters to pass to the Android view.</li><li><code>creationParamsCodec</code> specifies how to encode <code>creationParams</code>. The <code>StandardMessageCodec</code> is used for encoding basic types like strings, numbers, and collections.</li></ul><h3 id="native-implementation">Native implementation</h3><p>On the native side, we need to create the view that we need to serve, create a factory, and register it so that the native platform can create views whenever it is asked by Flutter.</p><p>To do this in Android we do -</p><pre><code class="language-kotlin">class NativeViewFactory : PlatformViewFactory(StandardMessageCodec.INSTANCE) {
    override fun create(context: Context, viewId: Int, args: Any?): PlatformView {
        val creationParams = args as Map&lt;String?, Any?&gt;?
        return NativeView(context, viewId, creationParams)
    }
}
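
// A minimal sketch of the NativeView served above (hypothetical; this post
// does not show its body). A PlatformView only needs to expose an Android
// View and clean up after itself:
class NativeView(context: Context, id: Int, creationParams: Map&lt;String?, Any?&gt;?) : PlatformView {
    private val textView = TextView(context).apply {
        text = &quot;Rendered by Android, embedded in Flutter (view #$id)&quot;
    }

    override fun getView(): View = textView

    override fun dispose() {
        // Release any native resources held by the view here.
    }
}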
</code></pre><p>Next, we have to register this factory in the Flutter engine:</p><pre><code class="language-kotlin">binding.platformViewRegistry
                .registerViewFactory(&quot;&lt;platform-view-type&gt;&quot;, NativeViewFactory())
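// `binding` above is the FlutterPluginBinding handed to the plugin in
// FlutterPlugin#onAttachedToEngine; the registered string must match the
// viewType used on the Dart side.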
</code></pre><h3 id="performance-impact">Performance Impact</h3><p>Before Android 10, hybrid composition required a lot of to and fro between main memory and GPU to compose native views with flutter widgets, after Android 10, copying is done only once, which makes it very performant.</p><p><strong>Before Android 10</strong></p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/08/Flutter-Platviews-I.png" class="kg-image" alt="Render Video Tracks From WebRTC Using Flutter PlatformViews" loading="lazy" width="1920" height="1047" srcset="https://dyte.io/blog/content/images/size/w600/2023/08/Flutter-Platviews-I.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/08/Flutter-Platviews-I.png 1000w, https://dyte.io/blog/content/images/size/w1600/2023/08/Flutter-Platviews-I.png 1600w, https://dyte.io/blog/content/images/2023/08/Flutter-Platviews-I.png 1920w" sizes="(min-width: 720px) 720px"></figure><p><strong>After Android 10:</strong></p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/09/Flutter-Platfoem-Views-II.png" class="kg-image" alt="Render Video Tracks From WebRTC Using Flutter PlatformViews" loading="lazy" width="1920" height="1047" srcset="https://dyte.io/blog/content/images/size/w600/2023/09/Flutter-Platfoem-Views-II.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/09/Flutter-Platfoem-Views-II.png 1000w, https://dyte.io/blog/content/images/size/w1600/2023/09/Flutter-Platfoem-Views-II.png 1600w, https://dyte.io/blog/content/images/2023/09/Flutter-Platfoem-Views-II.png 1920w" sizes="(min-width: 720px) 720px"></figure><p>Here is a small table that can help you decide which method to choose:</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/08/Screenshot-2023-08-31-at-9.50.13-PM.png" class="kg-image" alt="Render Video Tracks From WebRTC Using Flutter PlatformViews" loading="lazy" width="1496" height="924" srcset="https://dyte.io/blog/content/images/size/w600/2023/08/Screenshot-2023-08-31-at-9.50.13-PM.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/08/Screenshot-2023-08-31-at-9.50.13-PM.png 1000w, https://dyte.io/blog/content/images/2023/08/Screenshot-2023-08-31-at-9.50.13-PM.png 1496w" sizes="(min-width: 720px) 720px"></figure><h1 id="what-did-we-choose-at-dyte"><strong>What did we choose at Dyte?</strong></h1><p>At Dyte, our Flutter SDK is a wrapper over the Android and iOS SDKs, and we don&apos;t load any WebRTC video/audio streams directly. We use <a href="https://docs.dyte.io/android-core/local-user/introduction#get-local-user-video-view"><code>VideoView</code></a> from our Android SDK to display the video, which is then served as a PlatformView. We prefer a hybrid composition due to its performance benefits.</p><h2 id="challenges">Challenges</h2><p>We have two Flutter SDKs: dyte core and dyte uikit. Dyte core has no knowledge of where the <code>VideoView</code> (which is a <code>PlatformView</code>) is going to be used. It might be possible that the <code>VideoView</code> for a single participant is used on multiple screens. Let&#x2019;s call this <code>VideoView</code> as <code>FlutterVideoView</code> since <code>VideoView</code> is also present in our <a href="https://docs.dyte.io/android-core">Android SDK</a> and is a crucial part of this discussion.</p><p>We wrote a <code>PlatformView</code>, which serves <code>VideoView</code>. Let&#x2019;s call it <code>AndroidVideoView</code>. 
Dyte core is dependent on our core mobile SDKs, and <code>VideoView</code> is a <code>View</code> in Android.</p><blockquote>We know that one <code>View</code> can be part of only one <code>ViewGroup</code> at any moment of time.</blockquote><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2023/08/Flutter-Platform-III.png" class="kg-image" alt="Render Video Tracks From WebRTC Using Flutter PlatformViews" loading="lazy" width="1920" height="1047" srcset="https://dyte.io/blog/content/images/size/w600/2023/08/Flutter-Platform-III.png 600w, https://dyte.io/blog/content/images/size/w1000/2023/08/Flutter-Platform-III.png 1000w, https://dyte.io/blog/content/images/size/w1600/2023/08/Flutter-Platform-III.png 1600w, https://dyte.io/blog/content/images/2023/08/Flutter-Platform-III.png 1920w" sizes="(min-width: 720px) 720px"></figure><blockquote>Let&#x2019;s keep in the back of our minds that Flutter has no direct support for detecting the visibility of a widget.</blockquote><p>The problem we were facing was when we used <code>FlutterVideoView</code> for the same participant on two different screens. For the sake of simplicity, let&#x2019;s call them screen A and screen B. When the user navigated from screen A to screen B, everything worked smoothly, as screen B created a fresh <code>FlutterVideoView</code>, which internally created a new instance of <code>AndroidVideoView</code>, so the <code>VideoView</code> was attached to a new <code>ViewGroup</code>.</p><p>The problem was when the user navigated back from screen B to screen A: there was no way to detect that the <code>FlutterVideoView</code> widget had gone from background to foreground so we could call <code>render()</code> on the <code>AndroidVideoView</code>, which is a PlatformView in our Flutter SDK. The <code>render()</code> method calls the <code>render()</code> of <code>VideoView</code>, which internally takes care of removing the view from the old <code>ViewGroup</code> and renders the video track.</p><p>On the Dyte UI kit, we can get the lifecycle methods if the screen has been changed, but that would not solve the actual problem of the view not getting refreshed on its own. To solve this, we used a community plugin, <a href="https://pub.dev/packages/visibility_detector"><code>visibility_detector</code></a>, in the Dyte core Flutter SDK. It gave us a widget with callbacks that get triggered when its child widget&#x2019;s visibility changes.</p><p>The second part of the problem was to find the <code>PlatformView</code> associated with the widget and call <code>render()</code> on it. To tackle this, we had to retrieve the <code>View</code> from the Flutter engine since all the <code>PlatformView</code>s are created by the Flutter engine itself.</p><p>For that, we cached the <code>FlutterEngine</code> using <a href="https://api.flutter.dev/javadoc/io/flutter/embedding/engine/FlutterEngineCache.html"><code>FlutterEngineCache</code></a>. This allowed us to access the <code>PlatformView</code> by its <code>viewId</code>.</p><pre><code class="language-kotlin">FlutterEngineCache.getInstance().put(&quot;DyteFlutterEngine&quot;, flutterPluginBinding.flutterEngine)
</code></pre><p>Through the engine, we can access the <code>PlatformView</code> with the help of <code>viewId</code>, which is assigned to every <code>PlatformView</code> when it is created.</p><pre><code class="language-dart">onCreatePlatformView: (params) {
        return PlatformViewsService.initSurfaceAndroidView(
         id: params.id,
         viewType: viewType,
         layoutDirection: TextDirection.ltr,
         creationParams: creationParams,
         creationParamsCodec: const StandardMessageCodec(),
         onFocus: () {
		     params.onFocusChanged(true);
          },
        )
				// Here we assign the view id.
          ..addOnPlatformViewCreatedListener(setNativeViewId)
          ..create();
      },
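// `setNativeViewId` above is our created-listener (a hypothetical sketch;
// the real helper is not part of this post). It simply records the id
// Flutter assigned to the PlatformView so the native side can look the
// view up later through the cached engine:
//
//   void setNativeViewId(int id) => lastCreatedViewId = id;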
</code></pre><p>To access the native view, we did</p><pre><code class="language-kotlin">val targetView = FlutterEngineCache.getInstance()[&quot;DyteFlutterEngine&quot;]!!
	            .platformViewsController.getPlatformViewById(viewId) as AndroidVideoView?
</code></pre><p>Finally, we call the <code>render()</code> method on this view.</p><pre><code class="language-kotlin">targetView?.render()
</code></pre><p>After all these steps, we clear the cached <code>FlutterEngine</code> to free up the memory. This helped us solve a critical issue in our Flutter video SDK.</p><h1 id="conclusion">Conclusion</h1><p>This concludes our take on Flutter PlatformViews and how we use them at <code>Dyte</code>. Video is the most essential part of our <a href="https://docs.dyte.io/flutter">Flutter SDK</a>, and the Flutter ecosystem has provided us with great utilities to deal with it.</p><p>On top of it, we have made it seamless for you to have feature-rich <a href="https://dyte.io/video-sdk">audio video conferencing</a> in your app with the least hassle. Stay tuned for more engineering blogs, where we talk about behind the scenes, WebRTC, and cool things that can be built upon Dyte. Check out our blog <a href="https://dyte.io/blog">here</a>.</p><p>If you have any thoughts or feedback, feel free to connect with me on <a href="https://www.linkedin.com/in/thisisamank/">LinkedIn</a> or <a href="https://www.linkedin.com/in/thisisamank/">X(Twitter)</a>.</p>]]></content:encoded></item><item><title><![CDATA[Kotlin MPP: Basics]]></title><description><![CDATA[Learn about Kotlin Multiplatform&apos;s vast ecosystem, enabling you to build cross-platform solutions from a single, shared codebase.]]></description><link>https://dyte.io/blog/kotlin-mpp-basics/</link><guid isPermaLink="false">65ee072dfc3f7000017e9485</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Harsh Shandilya]]></dc:creator><pubDate>Fri, 15 Nov 2024 09:54:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/03/Kotlin-Multiplatform-2.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/03/Kotlin-Multiplatform-2.png" alt="Kotlin MPP: Basics"><p>When we set out to build our mobile SDKs at Dyte, we decided early on that we wanted to reuse as much code between platforms as possible. It would save us a lot of time and effort as a small team not to have to implement everything twice &#x2014; once for iOS and then again for Android.</p><h2 id="taking-cues-from-dropbox">Taking cues from Dropbox</h2><p>There simply weren&apos;t many options available to achieve what we had set our minds to. We could write C++ (or Rust) and then use FFI bindings from them to iOS and Android, but the overhead of learning a completely new and unfamiliar language made it quite a risky option. Dropbox had been a <a href="https://oleb.net/blog/2014/05/how-dropbox-uses-cplusplus-cross-platform-development/">big champion</a> of this strategy, and it worked for them for a long time.</p><p>Still, ultimately, the friction of forcing mobile developers to write C++ ended up with them <a href="https://dropbox.tech/mobile/the-not-so-hidden-cost-of-sharing-code-between-ios-and-android">ditching the idea</a> and just opting to build stuff twice in the platform&apos;s respective native languages.</p><h2 id="the-blended-alternative-of-kotlin-mpp">The blended alternative of Kotlin MPP</h2><p>Kotlin offered us an optimally blended alternative way to do this. 
It was a language already familiar to our Android developers and was relatively easy for iOS engineers to pick up due to its syntactic familiarity with Swift.</p><p>It is easier to write than C++, and the excellent Kotlin Multiplatform toolchain ensured we were not taking on avoidable FFI overhead by being able to compile our code down to platform-native formats.</p><p>Kotlin Multiplatform was announced at KotlinConf 2017 under the name Kotlin Multiplatform Projects (MPP) and initially supported JVM, Native, and JavaScript. It enabled the Kotlin toolchain to compile executables and libraries for different platforms from a single, shared Kotlin codebase. The powerful <code>expect</code>/<code>actual</code> system of bridging platform-specific APIs into standard code allows for common business logic to be written once but still be able to lean into the target platform for specific things.</p><h2 id="leveraging-bidirectional-interoperability">Leveraging bidirectional interoperability</h2><p>Unlike other cross-platform solutions, which tend to invent everything themselves and treat native interoperability as an afterthought, Kotlin strongly focuses on bidirectional interoperability. Kotlin Multiplatform can consume dependencies written in the platform&apos;s target language, expose them within Kotlin, and compile them to the same binary format as the target language.</p><p>For example, Kotlin code for Apple platforms can consume <a href="https://cocoapods.org/">Cocoapods</a> libraries through Kotlin&apos;s first-party Gradle integration, and developers can use the libraries like any other Kotlin code. This also goes in the other direction &#x2014; the Kotlin Gradle plugin enables building Kotlin libraries into <a href="https://developer.apple.com/documentation/xcode/creating-a-multi-platform-binary-framework-bundle">XCFramework bundles</a> that can be imported into Xcode projects.</p><p>At <a href="https://dyte.io/">Dyte</a>, we leverage Kotlin Multiplatform&apos;s vast ecosystem to build all our mobile SDKs from a single shared codebase written entirely in Kotlin. It enables us to deliver a consistent API and feature set across iOS and Android without compromising quality for either platform.</p><p>Its powerful publishing tooling ensures that our usage of Kotlin Multiplatform remains transparent to our clients. Clients using our SDK on Android see a regular <a href="https://developer.android.com/studio/projects/android-library#aar-contents">AAR file</a>; on iOS, they get an <a href="https://developer.apple.com/documentation/xcode/creating-a-multi-platform-binary-framework-bundle">XCFramework</a>. Thanks to this, our SDKs integrate painlessly into existing mobile codebases like any other dependency.</p><h2 id="final-thoughts">Final thoughts</h2><p>In conclusion, Kotlin MPP is an excellent tool for efficient cross-platform development. Allowing for shared code between mobile platforms and prioritizing bidirectional interoperability makes delivering a consistent experience across platforms easier without compromising quality. At Dyte, we&apos;ve seen firsthand the benefits of Kotlin Multiplatform, and we&apos;re excited to see where it goes in the future.</p><p>We hope you found this post informative and engaging. If you have any thoughts or feedback, please reach out to us on&#xA0;<a href="https://www.linkedin.com/company/dyteio/mycompany/" rel="noreferrer noopener">LinkedIn</a>&#xA0;and&#xA0;<a href="https://twitter.com/dyte_io" rel="noreferrer noopener">Twitter</a>. 
Stay tuned for more related blog posts in the future!</p><p><em>If you haven&apos;t heard about Dyte yet, head over to&#xA0;</em><a href="https://dyte.io/"><em>dyte.io</em></a><em>&#xA0;to learn how we are revolutionizing communication through our SDKs and libraries and how you can&#xA0;</em><a href="https://accounts.dyte.in/auth/register" rel="noreferrer noopener"><em>get started</em></a><em>&#xA0;quickly on your 10,000 free minutes, which renew every month. You can reach us at&#xA0;</em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em>&#xA0;or ask our&#xA0;</em><a href="https://community.dyte.io/"><em>developer community</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[Announcing End-to-End Encryption in Dyte]]></title><description><![CDATA[Explore the mechanisms behind end-to-end encryption for audio/video calls, focusing on its standards, implementation, and impact on performance.]]></description><link>https://dyte.io/blog/end-to-end-encryption/</link><guid isPermaLink="false">660c35c8fc3f7000017e96c0</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Palash Golecha]]></dc:creator><pubDate>Fri, 15 Nov 2024 08:07:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/04/end-to-end-encryption--header.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/04/end-to-end-encryption--header.png" alt="Announcing End-to-End Encryption in Dyte"><p><strong>True</strong> end-to-end encryption for audio and video calls is in beta in Dyte SDKs!</p><p>In this blog, we&apos;ll explore the mechanisms behind end-to-end encryption (E2EE) in Web Real-Time Communication (WebRTC), focusing on its standards, implementation strategies, impact on performance, and how it works within our SDK. </p><h2 id="inbuilt-security">Inbuilt security</h2><p><a href="https://dyte.io/blog/webrtc/" rel="noreferrer">WebRTC</a> communication is already encrypted. It uses DTLS combined with SRTP to secure both data and media communications.</p><p>SRTP is an extension of the Real-time Transport Protocol (RTP) to deliver audio and video over the Internet. While RTP handles the delivery, it lacks built-in security features, which is where SRTP comes into play.</p><h3 id="key-features-of-srtp"><strong>Key Features of SRTP:</strong></h3><ul><li><strong>Encryption:</strong> SRTP encrypts the payload of RTP packets, which contains the actual media data (voice, video, etc.), using symmetric encryption algorithms (AES). This ensures that the content of the communication cannot be easily eavesdropped or intercepted by unauthorized parties.</li><li><strong>Message Authentication:</strong> SRTP provides a mechanism to verify the authenticity of messages, ensuring that the data has not been tampered with in transit. This is typically achieved using a Message Authentication Code (MAC), a small piece of information, or a tag derived from the packet content and a secret key.</li><li><strong>Replay Protection:</strong> SRTP implements replay protection to ensure that attackers cannot capture and re-send packets in an attempt to disrupt the communication. 
This is done by keeping track of the sequence numbers of packets and rejecting any that are out of order or duplicated.</li><li><strong>Integrity Protection:</strong> SRTP also ensures the integrity of the data transmitted using MACs, confirming that it has not been altered from its original form during transit.</li></ul><h2 id="the-man-in-the-middle-sfu">The Man-in-the-middle: SFU</h2><p>While WebRTC was designed as a peer-to-peer protocol, most common implementations involve a centralized server that routes media to different parties. WebRTC connections are made from Client &#x2192; SFU and then SFU &#x2192; Client, which means all the inbuilt encryption stops at the server. Data is decrypted on the server and re-encrypted.</p><p>While being secure in transit is acceptable in most security threat models, there are use cases where you want mathematical guarantees against tampering or eavesdropping.</p><h3 id="implementing-end-to-end-encryption">Implementing End-to-End Encryption</h3><p>As video encoding is lossy, encrypting raw frames before encoding would destroy the ciphertext; ideally, you apply encryption to the encoded frames just before they are packetized for transport (at the RTP packetizer).</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/04/end-to-end-encryption--asset-1.png" class="kg-image" alt="Announcing End-to-End Encryption in Dyte" loading="lazy" width="1920" height="430" srcset="https://dyte.io/blog/content/images/size/w600/2024/04/end-to-end-encryption--asset-1.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/04/end-to-end-encryption--asset-1.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/04/end-to-end-encryption--asset-1.png 1600w, https://dyte.io/blog/content/images/2024/04/end-to-end-encryption--asset-1.png 1920w" sizes="(min-width: 720px) 720px"></figure><p>Now, there are two competing standards on how to do this on web browsers:</p><ul><li><strong>Insertable Streams API</strong> - Introduced in 2020, supported only on Chromium browsers. This is a more general-purpose API that not only allows modification post-encoding and pre-RTP packetization but also allows you to modify frames pre-encoding.</li><li><strong>RTCRtpScriptTransform</strong> - The current standard, not supported by Chromium but supported by Firefox and Safari, is much more limited and designed for cases like end-to-end encryption.</li></ul><p>The good news is that since they both come at the same stage of the pipeline, they are interoperable, i.e., media encrypted using Insertable Streams can be decrypted using RTCRtpScriptTransform.</p><p>Also, since these encryption steps would be computationally heavy, we don&apos;t want to do them on the main thread. 
We will use Web Workers to offload the encryption/decryption to a different thread.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/04/end-to-end-encryption--asset-2.png" class="kg-image" alt="Announcing End-to-End Encryption in Dyte" loading="lazy" width="1920" height="724" srcset="https://dyte.io/blog/content/images/size/w600/2024/04/end-to-end-encryption--asset-2.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/04/end-to-end-encryption--asset-2.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/04/end-to-end-encryption--asset-2.png 1600w, https://dyte.io/blog/content/images/2024/04/end-to-end-encryption--asset-2.png 1920w" sizes="(min-width: 720px) 720px"></figure><h2 id="under-the-hood"><strong>Under the hood</strong></h2><p>Now, we have a place where we can encrypt/decrypt media, but what about the actual encryption process? Technically, you could encrypt it by XORing with a static key or something similarly naive, but that wouldn&apos;t be secure.</p><p>We chose AES-GCM to encrypt the media frames/samples. Our implementation of the encryption algorithm is inspired by LiveKit&apos;s implementation of the same feature.</p><h3 id="iv-generation"><strong>IV Generation</strong></h3><p>The IV is used in AES-GCM encryption to provide uniqueness to the encryption process. This ensures that the same payload (e.g., a video frame) encrypted multiple times with the same key will result in different ciphertexts. For the IV, you just need to make sure that an adversary cannot predict it in advance, and for that, we use a combination of time and WebRTC metadata around the stream, which guarantees it to be unique.</p><h3 id="key-derivation-and-key-ratcheting"><strong>Key Derivation and Key Ratcheting</strong></h3><p>If your app users set the encryption key as &quot;12345678,&quot; you don&apos;t want AES to use this weak key directly. PBKDF2 puts the password and the salt through a&#xA0;<a href="https://open.oregonstate.education/cryptographyOEfirst/chapter/chapter-6-pseudorandom-functions/">pseudo-random function</a>&#xA0;a set number of times, according to the value for iteration count. The final output is a strong key. Therefore, we use PBKDF2 to derive strong keys from weak keys.</p><p>The same PBKDF2 mechanism can support key ratcheting, which involves periodically updating the encryption keys used in a communication session. This ensures that the compromise of one key does not compromise past or future communications.</p><ul><li>The current encryption key is used to derive a new key at regular intervals or based on specific conditions (e.g., the number of messages sent).</li><li>The new key replaces the old key for subsequent encryptions, effectively &quot;ratcheting&quot; forward the key material.</li><li>Participants in the communication must synchronize the ratcheting process to ensure they can decrypt received messages with the correct key.</li></ul><p><strong>Encrypting the frame</strong></p><p>The media frame payload sometimes carries metadata that the SFU requires to function, such as keyframe information; therefore, part of the RTP payload must be kept unencrypted.</p><p>This differs for each codec (VP8/VP9/OPUS) and each frame type.</p>
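<p>To make these pieces concrete, here is a hedged, minimal sketch of what the worker-side steps could look like with the Web Crypto API. This is not Dyte&apos;s actual implementation: the salt handling, iteration count, header length, and helper names are illustrative assumptions.</p><pre><code class="language-tsx">// Derive a strong AES-GCM key from a weak passphrase using PBKDF2.
async function deriveKey(passphrase: string, salt: Uint8Array): Promise&lt;CryptoKey&gt; {
  const material = await crypto.subtle.importKey(
    &quot;raw&quot;, new TextEncoder().encode(passphrase), &quot;PBKDF2&quot;, false, [&quot;deriveKey&quot;]);
  return crypto.subtle.deriveKey(
    { name: &quot;PBKDF2&quot;, salt, iterations: 100_000, hash: &quot;SHA-256&quot; }, // illustrative params
    material, { name: &quot;AES-GCM&quot;, length: 128 }, false, [&quot;encrypt&quot;, &quot;decrypt&quot;]);
}

// Encrypt one encoded frame, leaving the codec-specific header bytes in the
// clear so the SFU can still route frames and switch layers.
async function encryptFrame(frame: RTCEncodedVideoFrame, key: CryptoKey, iv: Uint8Array) {
  const data = new Uint8Array(frame.data);
  const headerLength = 10; // illustrative; the real value depends on codec and frame type
  const cipher = new Uint8Array(await crypto.subtle.encrypt(
    { name: &quot;AES-GCM&quot;, iv }, key, data.subarray(headerLength)));
  const out = new Uint8Array(headerLength + cipher.length + iv.byteLength);
  out.set(data.subarray(0, headerLength), 0);  // clear header
  out.set(cipher, headerLength);               // ciphertext + GCM auth tag
  out.set(iv, headerLength + cipher.length);   // IV, so the receiver can decrypt
  frame.data = out.buffer;
}
</code></pre><p>The receiver performs the mirror image of this: split off the IV, decrypt the payload with the same derived key, and stitch the clear header bytes back on.</p><p>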
Dyte SDK provides end-to-end encryption support for all the codecs we support &#x2014; VP8 and VP9 for video and OPUS for audio.</p><h3 id="enable-end-to-end-encryption-in-your-dyte-setups"><strong>Enable end-to-end encryption in your Dyte setups</strong></h3><p>We are rolling this out gradually, and therefore, you will need to contact <a href="mailto:support@dyte.io">support@dyte.io</a> to have this enabled.</p><p>However, once this is enabled, the integration is relatively straightforward.</p><p>Let&apos;s first see how a typical Dyte SDK initialization works.</p><pre><code class="language-jsx">import DyteClient from &apos;@dytesdk/web-core&apos;;

const meeting = await DyteClient.init({
      authToken,
});

// use meeting object
</code></pre><p>To implement end-to-end encryption,</p><pre><code class="language-tsx">import DyteClient from &quot;@dytesdk/web-core&quot;;
import DyteE2EEManager from &quot;@dytesdk/web-core/modules/e2ee&quot;;

const sharedKeyProvider = new DyteE2EEManager.SharedKeyProvider();
sharedKeyProvider.setKey(&quot;meeting-password&quot;);

const e2eeManager = new DyteE2EEManager({ keyProvider: sharedKeyProvider });

const meeting = await DyteClient.init({
  authToken,
  modules: {
	  e2ee: {
		  enabled: true,
		  manager: e2eeManager
	  }
  }
});
</code></pre><p>The above example uses a shared key provider, which, in simple words, is a single key used to encrypt all participants&apos; media. You can also set a different key per participant using <code>DyteE2EEManager.ParticipantKeyProvider()</code>, but you will have to coordinate passing the correct key on every participant join.</p><p>The <em>key</em> takeaway is that you handle the movement of keys, ensuring all participants use the correct key. This key should ideally be transported outside of Dyte-provided communication channels, over your own trusted communication channels. Dyte will handle the encryption and media delivery.</p><p><strong>Can I use the X feature while end-to-end encryption is enabled?</strong></p><p>Generally, all features should be available when end-to-end encryption is enabled, except Cloud Recording, AI, and Transcription features (since we can&apos;t decrypt media on our servers).</p><p><strong>Are chat, data track, and plugins also end-to-end encrypted?</strong></p><p>Not right now, but this should be available in the (very) near future.</p><h2 id="final-thoughts">Final thoughts</h2><p>As we adopt end-to-end encryption, we take a big step towards better privacy and security in our digital lives. It&apos;s about ensuring our conversations stay private, reinforcing that our digital spaces should be safe for everyone. Moving forward with this technology means we&apos;re all playing a part in protecting our online communications.</p><p>Reach out to <a href="mailto:support@dyte.io">support@dyte.io</a> to experiment and use this feature.</p><p>If you have any thoughts or feedback, please reach out to us on <a href="https://www.linkedin.com/company/dyteio/mycompany/" rel="noreferrer noopener">LinkedIn</a> and <a href="https://twitter.com/dyte_io" rel="noreferrer noopener">Twitter</a>. Stay tuned for more related blog posts in the future!</p><p><em>Get better insights on leveraging Dyte&#x2019;s technology and discover how it can revolutionize your app&#x2019;s communication capabilities with its&#xA0;</em><a href="https://dyte.io/video-sdk" rel="noopener noreferrer"><em>SDKs</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[Launching New Media Regions]]></title><description><![CDATA[Dyte is launching additional media regions that users can connect to and achieve lower latency, improving the experience for everyone connected.]]></description><link>https://dyte.io/blog/new-media-regions/</link><guid isPermaLink="false">66228323fc3f7000017e98e9</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Ninad Pundalik]]></dc:creator><pubDate>Thu, 14 Nov 2024 15:13:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/04/New_Media_Regions--Header.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/04/New_Media_Regions--Header.png" alt="Launching New Media Regions"><p>Dyte provides tools to developers and organizations for integrating meetings, webinars, and live streams on mobile and web apps that are used globally. 
Live audio and video experiences are affected significantly by the time it takes to transmit data from one participant to another, i.e., latency.</p><p>To support our globally increasing base of customers, Dyte is launching additional regions that users can connect to and achieve lower latency, drastically improving the experience of everyone connected.</p><p>Previously, Dyte&#x2019;s calls were serviced globally from four regions: North America - East, India, Singapore, and Europe - Frankfurt.</p><p>Starting today, we are adding four more media regions,</p><ul><li>North America - West</li><li>South America</li><li>Africa</li><li>Europe - London</li></ul><p>bringing Dyte&#x2019;s presence to eight regions across various parts of the globe.</p><p>Meeting sessions are automatically routed to the closest region when the first user joins, and starting today, these new media regions will automatically be included in the routing process. Dyte also allows developers to choose a region for a particular meeting explicitly, and Dyte&#x2019;s APIs and associated documentation will be updated shortly to support this additional set of regions.</p><h3 id="data-residency">Data Residency</h3><p>Dyte provides data residency options in India, the US, and Europe. The addition of new regions will not change this list; the new regions will only route media, and the data processing regions will remain the same as before.</p><h3 id="custom-regions">Custom Regions</h3><p>We remain open to adding more media and data processing regions. Our customers can contact us for dedicated infrastructure in any specific AWS or Google Cloud region for a small fee.</p><p>We&#x2019;re excited to help our customers better serve their users&apos; needs and to continue to grow and enhance our services in various ways.</p><p><em>If you haven&apos;t heard about Dyte yet, head over to&#xA0;</em><a href="https://dyte.io/"><em>dyte.io</em></a><em>&#xA0;to learn how we are revolutionizing communication through our SDKs and libraries and how you can&#xA0;</em><a href="https://accounts.dyte.in/auth/register" rel="noreferrer noopener"><em>get started</em></a><em>&#xA0;quickly on your 10,000 free minutes, which renew every month. If you have any questions, you can reach us at&#xA0;</em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em>&#xA0;or ask our&#xA0;</em><a href="https://community.dyte.io/"><em>developer community</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[LL-HLS in Depth]]></title><description><![CDATA[Learn how to optimize HLS for latency by understanding Apple's low-latency HLS and another independent low-latency solution, Community HLS.]]></description><link>https://dyte.io/blog/ll-hls-in-depth/</link><guid isPermaLink="false">6557349ad0c9640001953ec0</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Fenil Jain]]></dc:creator><pubDate>Thu, 14 Nov 2024 12:32:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2023/11/HLS--3---LL-HLS-in-depth.png" medium="image"/><content:encoded><![CDATA[<h2 id="introduction">Introduction</h2><img src="https://dyte.io/blog/content/images/2023/11/HLS--3---LL-HLS-in-depth.png" alt="LL-HLS in Depth"><p>In the <a href="https://dyte.io/blog/hls-in-depth/">HLS in depth</a> blog, we explored the core HLS protocol, how it works, its benefits, and its limitations. While discussing limitations, one notable point was higher latency. Reducing it is a tricky problem, and so efforts began in this direction. 
Today, we will explore how those efforts led to the final evolution of LL-HLS and CL-HLS.</p><h2 id="terminology">Terminology</h2><p>Before we start, let&apos;s get our terminology straight. In this article, CL-HLS refers to Community HLS, also known as L-HLS. LL-HLS refers to Apple&apos;s low-latency HLS or AL-HLS. I have seen a lot of confusion around these (I had much more). We will use CL-HLS and LL-HLS for these two independent low-latency solutions.</p><h2 id="hls">HLS</h2><p>With traditional HLS (HTTP Live Streaming), we wait for a segment to be produced. Then, we copy it to some web server/edge location, which the client then polls to fetch it. For a 6-second segment, this would mean a 6-second input wait + encoder costs + CDN fetch costs + the client buffer of 3 segments, so 4 segments before playback starts, i.e., 24 seconds. A notable point is how this latency begins to add up in traditional HLS.</p><p>Before even starting to optimize, let&apos;s get a mental model around this:</p><h2 id="what-we-dont-focus-to-optimize">What we don&apos;t focus on optimizing</h2><p>One might say we can start optimizing from source-to-encoder delivery. The catch is that the HLS spec does not discuss this part of the delivery, so it can be done in any way the implementor chooses. The most famous route is RTMP, but we have seen WebRTC emerging as an excellent alternative. With OBS getting first-class support for it using WHIP, it is exciting to see how the ingest side of things evolves.</p><p>But for optimizations in HLS, we don&apos;t focus on ingest from a spec perspective. So, the next part is the encoder. Again, one can get the best encoder, and optimizing encoders is a complete field of its own, which is also why the spec doesn&apos;t talk about them explicitly. So that is out of the picture.</p><p>But everything after the encoder emits segments is HLS territory (including the client) and is open for optimizations. So our problem statement is getting output from the encoder to glass (the viewer&apos;s screen) as fast as possible.</p><h2 id="segment-size-optimization">Segment size optimization</h2><p>If we revisit the segment flow mentioned above, we could just make the segment size smaller, less than a second maybe, and then keep sending them. But there&apos;s a catch &#x2014; we always want each segment to be independently decodable. What do I mean by independently decodable? I will give a small, just-enough primer on how the encoder works; everything beyond that is homework for you!</p><p>When talking about still pictures, we know one way they can be represented in memory &#x2014; RGB. So, we store the RGB value for each pixel, and a nested array forms a picture. That&apos;s a lot of data points! Well, now let&apos;s say we are dealing with a stream of pictures and storing this representation for each of them. This would require a lot of memory. So, what we do is choose a base representation. We can call it an I-Frame (keyframe). These are not direct RGB representations; they apply some smart compression. Now, whatever changes in the next frame, we take a smart difference from the I-Frame and store it as a P-Frame. P-Frames, hence, are dependent frames, and I-Frames are independent frames as they don&apos;t depend on anyone else. In <a href="https://en.wikipedia.org/wiki/Video_compression_picture_types">Wikipedia&apos;s</a> words:</p><pre><code>I&#x2011;frames are the least compressible but don&apos;t require other video frames to decode.
P&#x2011;frames can use data from previous frames to decompress and are more compressible than I&#x2011;frames.

</code></pre><p>Back to our topic, each segment needs at least one I-Frame, the simple reason being that a viewer can join the stream at any time and starts receiving segments from that point in time; hence, each segment should be independent of the start of the stream. We discussed that I-Frames are heavy; thus, shrinking segments would increase keyframe generation and grow the overall data size. Segment size directly corresponds to bandwidth usage, which would start choking playback on lower-bandwidth devices.</p><p>We now understand why we can&apos;t just reduce the segment size mindlessly. But we do need to deliver smaller chunks of data to the client. What if we break the segment into smaller &quot;parts,&quot; where each part is not required to have a keyframe, but if it does, we can mark it as &quot;INDEPENDENT&quot;?</p><p>This way, the client only looks for parts that are &quot;INDEPENDENT&quot; and starts decoding from that point. It does not need to wait for the complete segment to begin playing. This also became possible due to Chunked CMAF and Partial TS. We mentioned these in our <a href="https://dyte.io/blog/hls-in-depth/">previous article</a>. They are container-type formats, a way to store video/audio data. Support for chunks/smaller parts allowed us to split normal containers into <em>n</em> smaller chunks. The ideal size of these parts is 300-400ms.</p><p>This is the same method CL-HLS and LL-HLS use to reduce latency. The tag used to identify a partial segment is EXT-X-PART. If we check the spec, we find another interesting attribute, BYTERANGE, which is a way to indicate that the delivered part will be in this byte range of the complete segment.</p><h2 id="delivery">Delivery</h2><p>Okay, this seems reasonable, but how does the client get these parts? The client has to fetch the manifest to understand how to find them. However, as we keep generating more parts, we keep changing the manifest file really fast, so the client would also have to fetch new changes quickly. To optimize this step, CL-HLS and LL-HLS took two different routes.</p><p>CL-HLS optimized it by pre-announcing segments, and when they were fetched, it used HTTP Chunked Transfer Encoding (CTE) to deliver parts continuously.</p><p>CTE is a method of sending data in chunks: instead of letting the client know the total size of the payload, we keep sending chunks with their sizes mentioned, and when we are done, we send an empty chunk. This is usually employed in cases where the actual payload size is unknown.</p><p>One can understand why this is a neat technique. We are announcing yet-to-be-formed segments and leveraging the network round trip time to make parts and continuously deliver them using CTE. One caveat of CTE is that it makes bandwidth estimation harder.</p><p>LL-HLS took a different route. In their initial 2019 announcement, it seemed their strategy was to write segments to the manifest file if and only if they were generated. To still achieve low latency, they started pushing segments using HTTP/2 Push when the client requested a newer manifest. This saves the round trip of the client reading the manifest, understanding it, and then requesting segments/parts.</p><p>Later, after taking feedback from the community, they modified the spec in 2020 to include a prefetch tag, which, like the CL-HLS version, allowed pre-announcing segments.
So, HTTP/2 Push isn&apos;t a mandatory requirement anymore.</p><p>But the client still has to fetch these after getting the manifest; no CTE is involved. This does make a few things more bandwidth-consuming. A protocol with all of Apple&apos;s changes plus CTE for continuous delivery would be a perfect middle ground, and maybe that was the community&apos;s ultimate plan for low-latency HLS.</p><p>The tag used to pre-announce a segment is EXT-X-PRELOAD-HINT.</p><h2 id="playlist-delta-updates">Playlist delta updates</h2><p>In HLS, another problem we discussed was around big playlists. Let&apos;s say we have a long livestream going on; fetching a manifest that lists every segment from the very start to the end would make each round trip super heavy. Instead, we could get only the updates and deltas to the playlist from a given point onwards. Playlist delta updates enable this. It is an exciting feature, and hence it was also ported back to the core HLS specification.</p><p>The EXT-X-SKIP tag is used as a marker to skip a manifest section. To request a delta update from the server, the client uses the _HLS_skip=YES|v2 query param.</p><h2 id="blocking-playlist-updates">Blocking playlist updates</h2><p>For CDNs, let&apos;s say we set the cache TTL to 2 seconds. This would mean the CDN will not serve the newer segments, produced every 300-400 ms, for up to 2 seconds. We may need some kind of cache-busting and a precise segment/part fetch mechanism. LL-HLS provides this feature in the form of blocking playlist updates. We can tell the server not to break the connection and not to send a response until a particular segment/part is ready to be served.</p><p>This also lets clients block until their set buffer size is fulfilled; at that point, they can start decoding and displaying output instantly.</p><p>Blocking is achieved by using two query params together:</p><ul><li>_HLS_msn=&lt;M&gt;: Do not resolve the request until the M-numbered media sequence number is ready.</li><li>_HLS_part=&lt;N&gt;: Do not resolve the request until part N of segment M is ready. This param requires the msn param to be present.</li></ul><h2 id="rendition-reports">Rendition reports</h2><p>Another great feature added by LL-HLS is a faster way to do ABR using rendition reports. These reports contain the last media sequence number and the latest part. A rendition report is needed for each of the defined bitrates individually. EXT-X-RENDITION-REPORT is used to identify these reports.</p><h2 id="ll-hls-example-playlists">LL-HLS example playlists</h2><p><a href="https://developer.apple.com/documentation/http-live-streaming/enabling-low-latency-http-live-streaming-hls#Utilize-New-Media-Playlist-Tags-for-Low-Latency-HLS">Apple documentation</a> around LL-HLS includes examples of different playlist requests and their responses.</p><p>The response to a general low-latency playlist request of the style:</p><pre><code>GET &lt;https://example.com/2M/waitForMSN.php?_HLS_msn=273&amp;_HLS_part=2&gt;
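# The _HLS_msn / _HLS_part params above make the server block until part 2 of media sequence 273 is ready.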

</code></pre><p>looks like this:</p><pre><code>#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-VERSION:6
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:266
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:13:36.106Z
#EXT-X-MAP:URI=&quot;init.mp4&quot;
#EXTINF:4.00008,
fileSequence266.mp4
#EXTINF:4.00008,
fileSequence267.mp4
#EXTINF:4.00008,
fileSequence268.mp4
#EXTINF:4.00008,
fileSequence269.mp4
#EXTINF:4.00008,
fileSequence270.mp4
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.0.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.1.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.2.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.3.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.4.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.5.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.6.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.7.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.8.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.9.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.10.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.11.mp4&quot;
#EXTINF:4.00008,
fileSequence271.mp4
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:14:00.106Z
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.a.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.b.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.c.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.d.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.e.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.f.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.g.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.h.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.i.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.j.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.k.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.l.mp4&quot;
#EXTINF:4.00008,
fileSequence272.mp4
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.0.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.1.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.2.mp4&quot;
#EXT-X-PRELOAD-HINT:TYPE=PART,URI=&quot;filePart273.3.mp4&quot;

#EXT-X-RENDITION-REPORT:URI=&quot;../1M/waitForMSN.php&quot;,LAST-MSN=273,LAST-PART=2
#EXT-X-RENDITION-REPORT:URI=&quot;../4M/waitForMSN.php&quot;,LAST-MSN=273,LAST-PART=1
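# The rendition reports above advertise the latest MSN/part of the other bitrate tiers, letting the client switch renditions (ABR) without an extra playlist fetch.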

</code></pre><p>We can see some familiar tags from the HLS post and some new ones we learned in this article. Part duration is defined using the tag #EXT-X-PART-INF:PART-TARGET=0.33334. For server control params, we see #EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0. This has a couple of instructions. Let&apos;s understand them one by one:</p><ul><li>CAN-BLOCK-RELOAD: the server can hold (block) a playlist reload until media sequence number 273 and its 2nd part are ready.</li><li>Then PART-HOLD-BACK: do not play closer than &apos;n&apos; seconds (1.0 here) from the newest available part.</li><li>And finally, CAN-SKIP-UNTIL: the server can skip a part of the playlist if the client requests it. The value for this last attribute is in seconds and must be at least six times the target duration; hence, we have it as 12 here.</li></ul><p>Actual parts are listed using #EXT-X-PART. It has DURATION and URI fields, which are pretty self-explanatory. There&apos;s also an INDEPENDENT tag at the end of some parts. These represent the I-Frames we discussed earlier and hence can be used as a point from which the client can start decoding.</p><p>Near the bottom, we can see EXT-X-PRELOAD-HINT. It mentions it&apos;s hinting for a part using TYPE=PART and then the exact URI to fetch it.</p><p>And lastly, we have rendition reports mentioning the last MSN and part index along with the URI.</p><p>There&apos;s also an example of a playlist delta update, which is requested using a URL like:</p><pre><code>GET &lt;https://example.com/2M/waitForMSN.php?_HLS_msn=273&amp;_HLS_part=3&amp;_HLS_skip=YES&gt;

</code></pre><pre><code>#EXTM3U
# Following the example above, this Playlist is a response to: GET &lt;https://example.com/2M/waitForMSN.php?_HLS_msn=273&amp;_HLS_part=3&amp;_HLS_skip=YES&gt;
#EXT-X-TARGETDURATION:4
#EXT-X-VERSION:9
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:266
#EXT-X-SKIP:SKIPPED-SEGMENTS=3
#EXTINF:4.00008,
fileSequence269.mp4
#EXTINF:4.00008,
fileSequence270.mp4
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.0.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.1.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.2.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.3.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.4.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.5.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.6.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.7.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.8.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.9.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.10.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart271.11.mp4&quot;
#EXTINF:4.00008,
fileSequence271.mp4
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:14:00.106Z
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.a.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.b.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.c.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.d.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.e.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.f.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.g.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.h.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.i.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.j.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.k.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart272.l.mp4&quot;
#EXTINF:4.00008,
fileSequence272.mp4
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.0.mp4&quot;,INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.1.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.2.mp4&quot;
#EXT-X-PART:DURATION=0.33334,URI=&quot;filePart273.3.mp4&quot;
#EXT-X-PRELOAD-HINT:TYPE=PART,URI=&quot;filePart273.4.mp4&quot;

#EXT-X-RENDITION-REPORT:URI=&quot;../1M/waitForMSN.php&quot;,LAST-MSN=273,LAST-PART=3
#EXT-X-RENDITION-REPORT:URI=&quot;../4M/waitForMSN.php&quot;,LAST-MSN=273,LAST-PART=3

</code></pre><p>We have a new query param, _HLS_skip, to indicate that part of the playlist can be skipped. Then we have #EXT-X-SKIP:SKIPPED-SEGMENTS=3 to indicate how many segments were skipped in this playlist update.</p><p>They also have an example of a playlist that contains byterange-addressed parts:</p><pre><code># In these examples only the end of the Playlist is shown.
# This is Playlist update 1
#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;20000@0&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;23000@20000&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;18000@43000&quot;
#EXT-X-PRELOAD-HINT:TYPE=PART,URI=&quot;fs271.mp4&quot;,BYTERANGE-START=61000

# This is Playlist update 2
#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;20000@0&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;23000@20000&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;18000@43000&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;19000@61000&quot;
#EXTINF:4.08,
fs271.mp4
#EXT-X-PRELOAD-HINT:TYPE=PART,URI=&quot;fs272.mp4&quot;,BYTERANGE-START=0

# This is Playlist update 3
#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;20000@0&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;23000@20000&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;18000@43000&quot;
#EXT-X-PART:DURATION=1.02,URI=&quot;fs271.mp4&quot;,BYTERANGE=&quot;19000@61000&quot;
#EXTINF:4.08,
fs271.mp4
#EXT-X-PART:DURATION=1.02,URI=&quot;fs272.mp4&quot;,BYTERANGE=&quot;21000@0&quot;
#EXT-X-PRELOAD-HINT:TYPE=PART,URI=&quot;fs272.mp4&quot;,BYTERANGE-START=21000
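# BYTERANGE is &quot;length@offset&quot; within the shared part URI; BYTERANGE-START hints at the offset where the next part&apos;s bytes will begin.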

</code></pre><p>Notice the use of the BYTERANGE attribute at the end of each EXT-X-PART tag. Another interesting attribute is BYTERANGE-START in the EXT-X-PRELOAD-HINT tag, which marks where the hinted part&apos;s byte range starts within the segment.</p><h2 id="final-thoughts">Final thoughts</h2><p>The need for low-latency solutions arose soon after the release of HLS, and we went from no solutions to a couple of competing solutions in no time. Watching the different approaches and thought processes behind optimizing the same problem is undoubtedly fascinating. Today, we have LL-HLS as the leading standard for low-latency live streaming. Amazon/Twitch evolved CL-HLS independently and used a proprietary implementation that seems to be performing really well!</p><p>Scaling low-latency solutions is definitely challenging, and that&apos;s why we at Dyte handle that for you. Feel at home with that sweet DX and leave all the complexity to us; try our <a href="https://dyte.io/live-streaming-sdk">Livestreaming SDK</a> today!</p><p><em>Get better insights on leveraging Dyte&#x2019;s technology and discover how it can revolutionize your app&#x2019;s communication capabilities with its </em><a href="https://dyte.io/video-sdk" rel="noopener noreferrer"><em>SDKs</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[Building a Fast IP Location Service]]></title><description><![CDATA[Learn to create your own location service with extremely low latency and lower costs with zero downtimes to bypass the chaos of third-party services.]]></description><link>https://dyte.io/blog/location-service/</link><guid isPermaLink="false">65e45f10fc3f7000017e90c0</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Ravindra Singh Rathor]]></dc:creator><pubDate>Mon, 11 Nov 2024 06:30:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Header-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Header-1.png" alt="Building a Fast IP Location Service"><p>At Dyte, we provide white-labeled audio/video SDKs to facilitate meetings, webinars, and live streams for our clients to integrate into their mobile and web apps to be used globally.</p><p>When participants join a meeting, Dyte allocates a media server to the meeting in the region nearest to these participants to ensure low latency and a smoother meeting experience. To figure out the closest region to use, we need to determine the location of these participants. The location need not be point precise; even if we have a rough idea of state/county/province/district, we can make a good enough decision.</p><p>To figure out the location, aka latitude and longitude, we earlier took the caller&apos;s IP from the client&apos;s network request and passed this IP to third-party providers that keep track of IPs against latitude and longitude. Since these IPs keep changing, we went ahead with one such third-party location service provider to focus on our core offering of audio/video SDK rather than maintaining one more side project.</p><p><strong>Pros of using a third-party service:</strong></p><ol><li><strong>In-depth details of IP:</strong> These location services gave not just the position but also the city, country, company name, autonomous system number (ASN), and much more.</li><li><strong>Near-zero maintenance solution:</strong> We wouldn&apos;t have to maintain the database, scale it, or handle frequent DB updates.
The API was uncomplicated and simple to upgrade if needed.</li></ol><p><strong>Cons of using third-party services:</strong></p><ol><li><strong>Latency:</strong> Most of the time, this extra information causes the latency to reach beyond 500ms.</li><li><strong>Third-party downtimes:</strong> Since fetching location was crucial, any downtime in the location service would mean downtime at Dyte, which was unacceptable.</li><li><strong>Cost:</strong> We didn&apos;t need most of the data, but we were still getting it, resulting in extra cost.</li><li><strong>Poor performance for some edge locations:</strong> Some edge locations were more prone to slow responses than others.</li></ol><p>So, after months of facing these issues, we decided to take matters into our own hands and build our own location service &#x2014; one with extremely low latency, lower costs, and zero downtime.</p><p><strong>We had two approaches to create a new location service:</strong></p><ol><li>Purchase or download free IP details, store them in a DB, and expose an endpoint to query.</li><li>Think outside the box.</li></ol><p>Since purchasing IP details and storing and maintaining them was going to be a lot of pain, we returned to our drawing board to see what else could be done if we were to think outside the box.</p><h3 id="introducing-cloudfront-headers"><strong>Introducing CloudFront headers</strong></h3><p>After looking for alternate solutions everywhere, we came across <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/adding-cloudfront-headers.html" rel="noreferrer">CloudFront&apos;s request headers</a>.</p><p>Since CloudFront already has the database and uses it to populate request headers, we wondered: why not somehow use these headers to retrieve city, country, and location, among other things? Whatever we needed was there virtually free of cost, and latency was under 50ms compared to 500ms - 1 second.</p><p>The next step was to make the code that returns these headers work at edge locations, without hitting any origin server. Since CloudFront, by its nature, supports edge locations and CloudFront Functions, this step was already solved. All we needed was a small piece of code in a CloudFront function.</p><h2 id="steps-to-create-location-service-cloudfront-distribution"><strong>Steps to create a location service CloudFront distribution</strong></h2><p>Here are the detailed steps to create a location service distribution yourself.</p><h3 id="create-a-fake-origin">Create a Fake origin</h3><p>We need to create a fake AWS S3 bucket as an origin to keep CloudFront happy. Since CloudFront must have an origin, we had to give it one. This AWS S3 bucket will never be hit, so don&apos;t worry about S3 hits and the associated costs. You can go to AWS S3 and create a new S3 bucket manually.</p><h3 id="create-a-cloudfront-function">Create a CloudFront function</h3><p>Create a CloudFront function, put this simple code in it, and publish it.</p><pre><code class="language-jsx">function handler(event) {
    var headers = event &amp;&amp; event.request &amp;&amp; event.request.headers || {};
    function getHeaderValue(headerName){
        return headers &amp;&amp; headers[headerName] &amp;&amp; headers[headerName].value;
    }
    var response = {
        statusCode: 200,
        statusDescription: &apos;OK&apos;,
        headers: {
            &apos;cloudfront-functions&apos;: { value: &apos;generated-by-CloudFront-Functions&apos; },
            &apos;access-control-allow-origin&apos;: { value: &apos;*&apos;},
            &apos;access-control-allow-headers&apos;: { value: &apos;*&apos; },
            &apos;access-control-allow-methods&apos;: { value: &apos;GET,OPTIONS&apos;},
            &apos;timing-allow-origin&apos;: { value: &apos;*&apos;}
        },
        body: JSON.stringify({
              city: getHeaderValue(&apos;cloudfront-viewer-city&apos;),
              country: getHeaderValue(&apos;cloudfront-viewer-country&apos;),
              region: getHeaderValue(&apos;cloudfront-viewer-country-region-name&apos;),
              loc: (getHeaderValue(&apos;cloudfront-viewer-latitude&apos;) || &apos;&apos;) + &quot;,&quot; + (getHeaderValue(&apos;cloudfront-viewer-longitude&apos;) || &apos;&apos;),
              timezone: getHeaderValue(&apos;cloudfront-viewer-time-zone&apos;),
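              // cloudfront-viewer-address is &quot;ip:port&quot;, so strip the trailing port (works for IPv6 addresses too)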
              ip: (getHeaderValue(&apos;cloudfront-viewer-address&apos;) || &apos;&apos;).split(&quot;:&quot;).slice(0, -1).join(&quot;:&quot;),
              postal: getHeaderValue(&apos;cloudfront-viewer-postal-code&apos;),
        })
    };
    return response;
}
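
// Note: the CORS headers above let browsers call this endpoint directly, and since
// the function answers at the edge, the fake S3 origin is never actually hit.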
</code></pre><h3 id="create-a-distribution">Create a Distribution</h3><p>While creating a CloudFront distribution, link the S3 bucket to it as the origin. Select the path pattern of your liking or leave it as default (*), and select desired HTTP methods. In Caching policies, set the following,</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_5.png" class="kg-image" alt="Building a Fast IP Location Service" loading="lazy" width="1642" height="1112" srcset="https://dyte.io/blog/content/images/size/w600/2024/03/Dyte_Location_Service_Asset_5.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/03/Dyte_Location_Service_Asset_5.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/03/Dyte_Location_Service_Asset_5.png 1600w, https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_5.png 1642w" sizes="(min-width: 720px) 720px"></figure><p>Map the previously created CloudFront functions as Viewer Request in function associations.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_2.png" class="kg-image" alt="Building a Fast IP Location Service" loading="lazy" width="1642" height="684" srcset="https://dyte.io/blog/content/images/size/w600/2024/03/Dyte_Location_Service_Asset_2.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/03/Dyte_Location_Service_Asset_2.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/03/Dyte_Location_Service_Asset_2.png 1600w, https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_2.png 1642w" sizes="(min-width: 720px) 720px"></figure><p>Once you are done making this distribution, test it out. You would get a URL similar to <code>somerandomcharacters.cloudfront.net</code>. Opening this link will show your IP details as follows.</p><pre><code class="language-jsx">{
  &quot;city&quot;: &quot;Gunzenhausen&quot;,
  &quot;country&quot;: &quot;DE&quot;,
  &quot;region&quot;: &quot;Bavaria&quot;,
  &quot;loc&quot;: &quot;49.11560, 10.75110&quot;,
  &quot;timezone&quot;: &quot;Europe/Berlin&quot;,
  &quot;ip&quot;: &quot;2a01:4f8:c0c:c129::1&quot;,
  &quot;postal&quot;: &quot;91710&quot;
}
</code></pre><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_3.png" class="kg-image" alt="Building a Fast IP Location Service" loading="lazy" width="2000" height="77" srcset="https://dyte.io/blog/content/images/size/w600/2024/03/Dyte_Location_Service_Asset_3.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/03/Dyte_Location_Service_Asset_3.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/03/Dyte_Location_Service_Asset_3.png 1600w, https://dyte.io/blog/content/images/size/w2400/2024/03/Dyte_Location_Service_Asset_3.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>If you have come so far, the next step is to link Route 53 to this CloudFront distribution to have a sane-looking domain name such as <code>ipdetails.yourwebsite.com</code>. If you are not using Route 53, use the tool of your choice. It is not a must.</p><h2 id="user-flow">User Flow</h2><p>Overall, the user flow will look like the following.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_4-1.png" class="kg-image" alt="Building a Fast IP Location Service" loading="lazy" width="1140" height="1164" srcset="https://dyte.io/blog/content/images/size/w600/2024/03/Dyte_Location_Service_Asset_4-1.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/03/Dyte_Location_Service_Asset_4-1.png 1000w, https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_4-1.png 1140w" sizes="(min-width: 720px) 720px"></figure><p>Voila, you now have the IP info solution that gives results under 10ms. Below is an instance of it delivering IP details in 6ms!</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_1.png" class="kg-image" alt="Building a Fast IP Location Service" loading="lazy" width="2000" height="235" srcset="https://dyte.io/blog/content/images/size/w600/2024/03/Dyte_Location_Service_Asset_1.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/03/Dyte_Location_Service_Asset_1.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/03/Dyte_Location_Service_Asset_1.png 1600w, https://dyte.io/blog/content/images/2024/03/Dyte_Location_Service_Asset_1.png 2094w" sizes="(min-width: 720px) 720px"></figure><p>There is no need to maintain and scale DBs or purchase anything. You now have a simple near-zero maintenance location service distribution system that is extremely fast. It is not going to go down easily; AWS can attest to that.</p><p>Though these are minimal improvements in terms of cost for many organizations to even take up, they are huge in reducing latency. Collectively, they make Dyte&apos;s customer experience better with time.</p><p>I hope you found this post informative and engaging. If you have any thoughts or feedback, please contact me on&#xA0;<a href="https://twitter.com/rsr_thedarklord">Twitter</a>&#xA0;or&#xA0;<a href="https://www.linkedin.com/in/softwareprovider/">LinkedIn</a>. Stay tuned for more related blog posts in the future!</p><p><em>If you haven&apos;t heard about Dyte yet, head over to&#xA0;</em><a href="https://dyte.io/"><em>dyte.io</em></a><em>&#xA0;to learn how we are revolutionizing communication through our SDKs and libraries and how you can&#xA0;</em><a href="https://accounts.dyte.in/auth/register"><em>get started</em></a><em>&#xA0;quickly on your 10,000 free minutes, which renew every month. 
You can reach us at&#xA0;</em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em>&#xA0;or ask our&#xA0;</em><a href="https://community.dyte.io/"><em>developer community</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[AI-Powered Audio Transcriptions]]></title><description><![CDATA[Explore Dyte's beta AI-powered audio transcription feature, converting spoken words into text with real-time accuracy. Benefit from speaker identification and searchable transcripts.]]></description><link>https://dyte.io/blog/ai-powered-audio-transcriptions/</link><guid isPermaLink="false">6576de673df14600014b366e</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Rohan Mukherjee]]></dc:creator><pubDate>Fri, 08 Nov 2024 10:28:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2023/12/aiblog--1--3.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2023/12/aiblog--1--3.png" alt="AI-Powered Audio Transcriptions"><p><strong>Update: This feature is out of beta and in general availability.</strong></p><p>At Dyte, we are committed to revolutionizing real-time communication, and we&apos;re thrilled to announce an exciting addition to our platform: AI-powered audio transcriptions. This feature is designed to enhance your communication experience by effortlessly converting spoken words into written text.</p><p>We&apos;re introducing an efficient and accurate way to transcribe your conversations in meetings. We provide transcriptions in 2 forms:</p><ol><li><strong>Live transcriptions</strong>: The transcripts can be consumed on the client side using the Dyte SDK that&apos;s suitable for your platform. These transcripts are generated on the server in real-time.</li><li><strong>Post-meeting webhooks</strong>: The meeting transcript can be consumed via a <a href="https://docs.dyte.io/guides/capabilities/ai/meeting-transcription#consume-transcript-via-a-post-meeting-webhook">webhook after the meeting ends</a>.</li></ol><p>This release marks a step in our journey toward providing you with cutting-edge AI capabilities. We&#x2019;re also in the process of developing other AI features, like meeting agenda generation and meeting summarization.</p><h2 id="usage">Usage</h2><p>Dyte&apos;s transcription APIs offer a programmatic way for developers to integrate transcription capabilities into their applications or services. To know more about the transcription APIs in detail, check out <a href="https://docs.dyte.io/guides/capabilities/ai/meeting-transcription">this guide</a>.</p><p>As always, making sure that our features are developer-friendly is our top priority. Thus, we provide a very simple-to-use API in our client SDKs for you to be able to consume real-time audio transcriptions. Here&#x2019;s an example of how to use it in our web core SDK.</p><pre><code class="language-jsx">meeting.ai.on(&apos;transcript&apos;, (transcriptionData) =&gt; {
    console.log(transcriptionData);
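    // e.g., render transcriptionData.transcript alongside the speaker&apos;s name in your captions UI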
});</code></pre><p>The <code>transcriptionData</code> object consists of the following information:</p><ul><li>An ID to uniquely identify the transcript</li><li>The name of the speaker</li><li>The ID of the speaker</li><li>The transcribed speech</li><li>A timestamp of when the speaker had spoken</li></ul><p>The <code>transcriptionData</code> object can be represented with the help of the following interface.</p>
<pre><code class="language-jsx">export interface Transcript {
  id: string;
  name: string;
  peerId: string;
  transcript: string;
  date: Date;
}</code></pre><p>The <code>meeting.ai</code> object emits transcripts only when it&#x2019;s enabled in the preset of the participant who is speaking. To learn more about how to enable this feature for a participant, check out <a href="https://docs.dyte.io/guides/capabilities/ai/meeting-transcription#control-transcriptions-for-participants-using-presets"> the transcription guide</a>.</p>
<h2 id="key-features">Key Features</h2><p>Dyte&apos;s transcriptions offer a robust suite of features, as mentioned below.</p><ul><li><strong>Real-Time Accuracy:</strong> Our AI engine provides instant and accurate audio transcriptions, ensuring you stay in sync with the conversation.</li><li><strong>Speaker Identification:</strong> Easily identify speakers with our speaker attribution feature, making it clearer who said what.</li><li><strong>Searchable Transcripts:</strong> Search through the transcript to quickly locate specific points in the conversation, streamlining post-meeting analysis.</li></ul><h2 id="pricing">Pricing</h2><p>Our AI-driven audio transcription service is priced at $0.015 per minute, offering precise and rapid transcription for any volume of content. This straightforward rate ensures transparent billing for all your transcription needs.</p>]]></content:encoded></item><item><title><![CDATA[Kotlin MPP: Concurrency]]></title><description><![CDATA[Learn how to manage concurrency in Kotlin Multiplatform (KMP) projects. This guide covers best practices and techniques for cross-platform development.]]></description><link>https://dyte.io/blog/kotlin-kmp-concurrency/</link><guid isPermaLink="false">664b3be31e73890001d4a985</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Yash Garg]]></dc:creator><pubDate>Thu, 07 Nov 2024 08:56:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/05/kotlinmpp-concurrency--1-.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/05/kotlinmpp-concurrency--1-.png" alt="Kotlin MPP: Concurrency"><p>At <a href="https://dyte.io/">Dyte</a>, we leverage Kotlin Multiplatform across our products to keep our codebase consistent and maintainable. It allows us to share code across platforms, including Android, iOS, and the web, while still providing the flexibility to write platform-specific code when needed.</p><h2 id="the-need-for-structured-concurrency">The need for structured concurrency</h2><p>Structured concurrency allows doing multiple computations outside the UI-thread to keep the app as responsive as possible. It differs from concurrency in the sense that a task can only run within the scope of its parent, which cannot end before all of its children. This ensures that all tasks are properly managed and cleaned up, preventing memory leaks and other issues.</p><p><a href="https://kotlinlang.org/docs/multiplatform.html">Kotlin Multiplatform</a> provides an easier way to handle concurrency using it&apos;s <code>kotlinx.coroutines</code> library. It allows developers to write asynchronous code in a more readable and maintainable way, making it easier to handle complex scenarios.</p><h2 id="the-problem">The problem</h2><p>Let us launch two-hundred coroutines all doing the same action two-thousand times. The task is to increment a shared counter. The counter is a simple integer variable.</p><p>The code is as follows:</p><pre><code class="language-kotlin">suspend fun hugeRun(task: suspend () -&gt; Unit) {
    val i = 200  // number of coroutines to launch
    val k = 2000 // times an action is repeated by each coroutine

    coroutineScope { // scope for coroutines
        repeat(i) {
            launch {
                repeat(k) { task() }
            }
        }
    }

    println(&quot;Completed ${i * k} actions&quot;)
}
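
// A minimal driver for the above (sketch; assumes kotlinx.coroutines is imported).
// Dispatchers.Default is multi-threaded, which is what makes the increments race.
var counter = 0

fun main() = runBlocking {
    withContext(Dispatchers.Default) {
        hugeRun { counter++ }
    }
    println(&quot;Counter = $counter&quot;)
}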
</code></pre><p>What does it print at the end? It is highly unlikely to ever print &quot;Counter = 400000&quot;, because two hundred coroutines increment the counter concurrently from multiple threads without any synchronization.</p><h2 id="approaches-to-handle-concurrency">Approaches to handle concurrency</h2><p>There is a common misconception that making a variable <code>volatile</code> solves the concurrency problem. However, that only guarantees visibility of changes to other threads; it does not provide <strong>atomicity</strong>.</p><p>This means that if two threads read the value of a volatile variable at the same time, they may both see the same value, and both increment it, leading to a lost update.</p><h2 id="so-how-do-we-solve-this-problem">So, how do we solve this problem?</h2><p>We will explore three different approaches to handle this:</p><h3 id="using-atomic-primitives-for-shared-state-concurrency">Using Atomic Primitives for Shared State Concurrency</h3><p>The general solution that works both for threads and for coroutines is to use a thread-safe (aka synchronized, linearizable, or atomic) data structure that provides all the necessary synchronization for the corresponding operations that need to be performed on a shared state. In the case of a simple counter, we can use an atomic integer (here via <code>kotlinx.atomicfu</code>), which has an atomic <code>incrementAndGet</code> operation:</p><pre><code class="language-kotlin">import kotlinx.atomicfu.*
import kotlinx.coroutines.*

fun main() {
    // Create an atomic integer
    val counter = atomic(0)

    // Launch multiple coroutines to increment the counter concurrently
    repeat(100) {
        GlobalScope.launch {
            // Atomically increment the counter
            counter.incrementAndGet()
        }
    }

    // Crude wait for the coroutines to finish (joining the jobs would be more robust)
    Thread.sleep(100)

    // Print the final value of the counter
    println(&quot;Counter value: ${counter.value}&quot;)
}
</code></pre><h3 id="using-threads-for-shared-state-concurrency">Using Threads for Shared State Concurrency</h3><p>To handle concurrency using threads, you can create multiple threads and synchronize access to shared state using locks or other synchronization mechanisms. Here&apos;s an example of how to increment a shared counter using threads:</p><pre><code class="language-kotlin">fun main() {
    // Create a shared counter and a lock to guard it
    var counter = 0
    val lock = Any()

    // Launch multiple threads to increment the counter concurrently
    val threads = (1..100).map {
        Thread {
            // Synchronize access so that no increments are lost
            synchronized(lock) { counter++ }
        }.apply { start() }
    }

    // Wait for all threads to complete
    threads.forEach { it.join() }

    // Print the final value of the counter
    println(&quot;Counter value: $counter&quot;)
}
</code></pre><h3 id="using-coroutines-for-asynchronous-operations">Using Coroutines for Asynchronous Operations</h3><p>We can also use <code>CoroutineScope</code> to launch multiple coroutines to perform asynchronous operations concurrently. Here&apos;s an example of how to increment a shared counter using coroutines:</p><pre><code class="language-kotlin">import kotlinx.coroutines.*

fun main() {
    // Define a coroutine scope
    runBlocking {
        // Launch a coroutine to perform a background task
        val job = launch {
            val result = async {
                // Simulate a long-running operation
                delay(1000)
                &quot;Hello, KMP!&quot;
            }
            // Wait for the result and print it
            println(result.await())
        }
        // Do other work concurrently
        println(&quot;Loading...&quot;)
        // Wait for the coroutine to complete
        job.join()
    }
}
</code></pre><h2 id="platform-specific-threading-models">Platform-Specific Threading Models</h2><p>Kotlin Multiplatform allows you to write platform-specific code using the <code>expect</code> and <code>actual</code> keywords. This enables you to leverage platform-specific threading models to handle concurrency in a platform-agnostic way.</p><p>Here&apos;s an example of how to use platform-specific threading models to perform tasks on the main UI thread in Android and iOS:</p><blockquote>Note: We can also use <code>kotlinx.coroutines</code> library to handle concurrency in a platform-agnostic way. It is not necessary to use platform-specific threading models, but they can be useful in certain scenarios.</blockquote><h3 id="platform-specific-threading-models-android">Platform-Specific Threading Models (Android)</h3><pre><code class="language-kotlin">import android.os.Handler
import android.os.Looper

fun main() {
    // Create a handler associated with the main UI thread
    val mainHandler = Handler(Looper.getMainLooper())

    // Post a task to the main UI thread
    mainHandler.post {
        // Update UI or perform other operations on the main thread
        println(&quot;Task executed on the main UI thread (Android)&quot;)
    }
}
</code></pre><h3 id="platform-specific-threading-models-ios">Platform-Specific Threading Models (iOS)</h3><pre><code class="language-swift">import Foundation

func main() {
    // Perform a task on the main UI thread (iOS)
    DispatchQueue.main.async {
        // Update UI or perform other operations on the main thread
        print(&quot;Task executed on the main UI thread (iOS)&quot;)
    }
}
</code></pre><h2 id="thread-safe-lists">Thread Safe Lists</h2><p>Apart from the above approaches, we at Dyte use custom implementations of lists with thread safety to handle concurrency in our applications.  We use two types of lists: <code>WriteHeavyMutableList&lt;T&gt;</code> and <code>ReadHeavyMutableList&lt;T&gt;</code> with the help of <a href="https://github.com/Kotlin/kotlinx-atomicfu?tab=readme-ov-file#locks">Locks</a> provided by <code>kotlinx.atomicfu</code> like <code>ReentrantLock</code>.</p><pre><code class="language-kotlin">// Short implementation of WriteHeavyMutableList
import kotlinx.atomicfu.*

internal class WriteHeavyMutableList&lt;T&gt; {
  val lock = reentrantLock()
  val items = mutableListOf&lt;T&gt;()

  ...

  // The lock is used to synchronize access to the list
  // ensuring that only one thread can read or write to the list at a time.
  fun get(index: Int): T = lock.withLock { items[index] }

  ...
}
</code></pre><p>This allows us to safely access and modify the list from multiple threads without the risk of data corruption or issues like concurrent modification exceptions.</p><h2 id="final-thoughts">Final Thoughts</h2><p>Concurrency is a complex topic that requires careful consideration when designing and implementing multiplatform applications. By understanding the strengths and weaknesses of each approach, you can choose the best solution for your specific use case, as we at Dyte do.</p><p>We hope you found this post informative and engaging. If you have any thoughts or feedback, please reach out to us on <a href="https://www.linkedin.com/company/dyteio/mycompany/">LinkedIn</a> and <a href="https://twitter.com/dyte_io">Twitter</a>. Stay tuned for more related blog posts in the future!</p><p><em>If you haven&apos;t heard about Dyte yet, head over to </em><a href="http://dyte.io/"><em>dyte.io</em></a><em> to learn how we are revolutionizing communication through our SDKs and libraries and how you can </em><a href="https://accounts.dyte.in/auth/register"><em>get started</em></a><em> quickly on your 10,000 free minutes, which renew every month. You can reach us at </em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em> or ask our </em><a href="https://community.dyte.io/"><em>developer community</em></a><em>.</em></p>]]></content:encoded></item><item><title><![CDATA[Monitoring Web API Performance]]></title><description><![CDATA[Learn how to monitor API performance in real-time, from the user's perspective, to identify and resolve real-world slowdowns.]]></description><link>https://dyte.io/blog/web-api-performance-monitoring/</link><guid isPermaLink="false">6617ec72fc3f7000017e977a</guid><category><![CDATA[Engineering]]></category><dc:creator><![CDATA[Ravindra Singh Rathor]]></dc:creator><pubDate>Thu, 07 Nov 2024 06:58:00 GMT</pubDate><media:content url="https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Header-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Header-1.png" alt="Monitoring Web API Performance"><p>Optimal performance is crucial in the realm of web development. Users demand a smooth and responsive experience, and any latency can lead to frustration and abandonment. Thankfully, modern browsers equip developers with powerful tools to monitor and enhance performance.</p><p>One such tool is the PerformanceObserver API, a JavaScript interface enabling the observation of performance-related events. This guide comprehensively explores the PerformanceObserver API, specifically focusing on its application in tracking API performance.</p><p>While monitoring API performance on the server side is essential, it doesn&apos;t tell the whole story. What matters most is how long it takes for users to see results. Server metrics might show everything running smoothly, but if there&apos;s network lag or delays in processing user requests, the user experience suffers.</p><p>Tracking performance from the user&apos;s perspective helps identify real-world slowdowns that can lead to frustration and, ultimately, user abandonment of your application.</p><h2 id="understanding-the-performanceobserver-api"><strong>Understanding the PerformanceObserver API</strong></h2><p>The PerformanceObserver API builds upon the foundation of the broader Performance API. 
While the Performance API offers access to valuable performance-related information like timing metrics and navigation data, it often presents this data as a snapshot. The PerformanceObserver API elevates performance monitoring by enabling real-time observation and response to performance events. This empowers developers to delve deeper into various performance metrics, including resource timing, user navigation patterns, and server response times, all with a focus on providing a more granular and user-centric view of website performance.</p><h3 id="basic-usage"><strong>Basic usage</strong></h3><p>Using the PerformanceObserver API typically involves creating an observer object and specifying which performance entry types to observe. Here&apos;s a basic example.</p><pre><code class="language-jsx">const apiPerformanceObserver = new PerformanceObserver((list) =&gt; {
  list.getEntries()?.forEach((entry) =&gt; {
        console.log(&apos;performanceEntry:: &apos;, entry);
  });
});
apiPerformanceObserver.observe({ type: &apos;resource&apos;, buffered: true });
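// buffered: true also delivers entries that were recorded before observe() was called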
</code></pre><p>To try this out, head over to <a href="https://app.dyte.io/v2/meeting?demo=Default">Dyte Demo</a>. Once the page is loaded, paste the above code snippet into the developer console and hit enter. You will instantly see the buffered entries.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Asset_1.png" class="kg-image" alt="Monitoring Web API Performance" loading="lazy" width="2000" height="1053" srcset="https://dyte.io/blog/content/images/size/w600/2024/04/Web_API_Performance_Monitoring--Asset_1.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/04/Web_API_Performance_Monitoring--Asset_1.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/04/Web_API_Performance_Monitoring--Asset_1.png 1600w, https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Asset_1.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>Provide any username and a random meeting name, and click on Start Meeting. You will see even more performance entries being console-logged.</p><h3 id="filtering-out-xhr-fetch"><strong>Filtering out XHR &amp; Fetch</strong></h3><p>If you only care about XHR &amp; Fetch, you can filter for just those initiator types.</p><pre><code class="language-jsx">const apiPerformanceObserver = new PerformanceObserver((list) =&gt; {
  list.getEntries()?.forEach((entry) =&gt; {
    const performanceEntry = entry.toJSON();
    if (
      [&apos;xmlhttprequest&apos;, &apos;fetch&apos;].includes(performanceEntry.initiatorType)
    ) {
        console.log(&apos;performanceEntry Filtered:: &apos;, entry);
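        // (sketch) you could also flag slow calls here, e.g. when performanceEntry.duration &gt; 500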
    }
  });
});
apiPerformanceObserver.observe({ type: &apos;resource&apos;, buffered: true });
</code></pre><p>Notice the <code>if</code> condition that filters entries based on the initiatorType. To learn more about initiatorType, please refer to <a href="https://developer.mozilla.org/en-US/docs/Web/API/PerformanceResourceTiming/initiatorType">this</a>.</p><p>If you expand some of these entries, you will realize that nearly all have 0 as values for their keys, except the duration key, which is not ideal since we want the actual values. Let&apos;s discuss why this is happening and how to fix it.</p><h3 id="fixing-data-for-cross-origin-calls"><strong>Fixing data for cross-origin calls</strong></h3><p>Many resource timing properties are restricted to return&#xA0;<code>0</code>&#xA0;or an empty string when the resource is a cross-origin request. This is one of the security practices for dealing with CORS.</p><p>Since Dyte is NOT sending the Timing-Allow-Origin response header on most <a href="https://app.dyte.io/v2/meeting?demo=Default">Dyte Demo</a> endpoints, you see most values as 0. This can be verified using the Network Tab.</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Asset_2.png" class="kg-image" alt="Monitoring Web API Performance" loading="lazy" width="1712" height="704" srcset="https://dyte.io/blog/content/images/size/w600/2024/04/Web_API_Performance_Monitoring--Asset_2.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/04/Web_API_Performance_Monitoring--Asset_2.png 1000w, https://dyte.io/blog/content/images/size/w1600/2024/04/Web_API_Performance_Monitoring--Asset_2.png 1600w, https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Asset_2.png 1712w" sizes="(min-width: 720px) 720px"></figure><p>We are sending a Timing-Allow-Origin response header for one of the endpoints, <a href="https://location.dyte.io" rel="noreferrer">location.dyte.io</a>. The Performance Entry for this endpoint will have proper values.</p><p>(To learn more about this, check out our blog on <a href="https://dyte.io/blog/location-service/">building a fast IP location service</a>.)</p><figure class="kg-card kg-image-card"><img src="https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Asset_3.png" class="kg-image" alt="Monitoring Web API Performance" loading="lazy" width="1552" height="790" srcset="https://dyte.io/blog/content/images/size/w600/2024/04/Web_API_Performance_Monitoring--Asset_3.png 600w, https://dyte.io/blog/content/images/size/w1000/2024/04/Web_API_Performance_Monitoring--Asset_3.png 1000w, https://dyte.io/blog/content/images/2024/04/Web_API_Performance_Monitoring--Asset_3.png 1552w" sizes="(min-width: 720px) 720px"></figure><p>Since this happens for CORS requests only, to expose cross-origin timing information we need to send the&#xA0;<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Timing-Allow-Origin">Timing-Allow-Origin</a>&#xA0;HTTP response header from the backend; only then can the FE see the timing in the FE code.</p><p>You can add the Timing-Allow-Origin response header for your project in the following ways.</p><p>A NextJS config in the backend that returns CORS response headers and Timing-Allow-Origin for every incoming call looks like this.</p><pre><code class="language-jsx">const nextConfig = {
    async headers() {
        return [
            {
                // matching all API routes
                source: &quot;/api/:path*&quot;,
                headers: [
                    { key: &quot;Access-Control-Allow-Credentials&quot;, value: &quot;true&quot; },
                    { key: &quot;Access-Control-Allow-Origin&quot;, value: &quot;*&quot; },
                    { key: &quot;Access-Control-Allow-Methods&quot;, value: &quot;GET,DELETE,PATCH,POST,PUT&quot; },
                    { key: &quot;Access-Control-Allow-Headers&quot;, value: &quot;X-CSRF-Token, X-Requested-With, Accept, Accept-Version, Content-Length, Content-MD5, Content-Type, Date, X-Api-Version&quot; },
                    { key: &quot;Timing-Allow-Origin&quot;, value: &quot;*&quot; },
                ]
            }
        ]
    }
};

// next.config.js must export the config object for Next.js to pick it up
module.exports = nextConfig;
</code></pre><p>For ExpressJS, you can set the same headers in a middleware, as in the following snippet.</p><pre><code class="language-jsx">app.use(function(req, res, next) {
    res.header(&quot;Access-Control-Allow-Credentials&quot;, &quot;true&quot;);
    res.header(&quot;Access-Control-Allow-Origin&quot;, &quot;*&quot;);
    res.header(&quot;Access-Control-Allow-Methods&quot;, &quot;GET,DELETE,PATCH,POST,PUT&quot;);
    res.header(&quot;Access-Control-Allow-Headers&quot;, &quot;Origin, X-Requested-With, Content-Type, Accept&quot;);
    res.header(&quot;Timing-Allow-Origin&quot;, &quot;*&quot;);
    next();
});
</code></pre><p>Please refer to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/PerformanceResourceTiming#security_requirements">security requirements</a> to learn more about this security constraint.</p><p>To expose even more timing information to the frontend (backend timings such as database fetch duration or cache hits), you can pass it along using the <a href="https://web.dev/articles/custom-metrics?utm_source=devtools#server-timing-api">Server Timing API</a>.</p>
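<p>Here is a minimal sketch, assuming an ExpressJS backend; the <code>db</code> and <code>cache</code> metric names and the <code>/api/data</code> route are purely illustrative:</p><pre><code class="language-jsx">// Backend (ExpressJS): attach illustrative backend timings to the response.
// Browsers expose them to the frontend via the serverTiming array
// (subject to the same Timing-Allow-Origin rules for cross-origin calls).
app.get(&apos;/api/data&apos;, function (req, res) {
    res.header(&apos;Server-Timing&apos;, &apos;db;dur=53.2, cache;dur=0.3&apos;);
    res.json({ ok: true });
});

// Frontend: read the metrics back from each resource entry
performance.getEntriesByType(&apos;resource&apos;).forEach(function (entry) {
    entry.serverTiming.forEach(function (metric) {
        console.log(metric.name, metric.duration, metric.description);
    });
});
</code></pre><h3 id="sample-performanceresourcetiming-entry">Sample PerformanceResourceTiming Entry</h3><p>If you have configured the response headers properly, you will see proper values for all the keys.</p><p>Here is one such sample.</p><pre><code class="language-jsx">{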
    &quot;name&quot;: &quot;https://location.dyte.io/&quot;,
    &quot;entryType&quot;: &quot;resource&quot;,
    &quot;startTime&quot;: 1041.699999999255,
    &quot;duration&quot;: 61,
    &quot;initiatorType&quot;: &quot;fetch&quot;,
    &quot;deliveryType&quot;: &quot;&quot;,
    &quot;nextHopProtocol&quot;: &quot;h2&quot;,
    &quot;renderBlockingStatus&quot;: &quot;non-blocking&quot;,
    &quot;workerStart&quot;: 0,
    &quot;redirectStart&quot;: 0,
    &quot;redirectEnd&quot;: 0,
    &quot;fetchStart&quot;: 1041.699999999255,
    &quot;domainLookupStart&quot;: 1071.199999999255,
    &quot;domainLookupEnd&quot;: 1071.199999999255,
    &quot;connectStart&quot;: 1071.199999999255,
    &quot;secureConnectionStart&quot;: 1081.199999999255,
    &quot;connectEnd&quot;: 1091.6000000014901,
    &quot;requestStart&quot;: 1091.699999999255,
    &quot;responseStart&quot;: 1102.3999999985099,
    &quot;firstInterimResponseStart&quot;: 0,
    &quot;responseEnd&quot;: 1102.699999999255,
    &quot;transferSize&quot;: 444,
    &quot;encodedBodySize&quot;: 144,
    &quot;decodedBodySize&quot;: 144,
    &quot;responseStatus&quot;: 200,
    &quot;serverTiming&quot;: []
}
</code></pre><h3 id="making-sense-of-the-performance-timing-data">Making sense of the Performance Timing data</h3><p>Each performance entry above is a snapshot for a single resource. From these snapshots you can derive the <a href="https://developer.mozilla.org/en-US/docs/Web/API/PerformanceResourceTiming#typical_resource_timing_metrics">typical resource timing metrics</a>, computed in the sketch after this list:</p><ul><li>TCP handshake time (<code>connectEnd - connectStart</code>)</li><li>DNS lookup time (<code>domainLookupEnd - domainLookupStart</code>)</li><li>Redirection time (<code>redirectEnd - redirectStart</code>)</li><li>Interim request time (<code>firstInterimResponseStart - requestStart</code>)</li><li>Request time (<code>responseStart - requestStart</code>)</li><li>TLS negotiation time (<code>requestStart - secureConnectionStart</code>)</li><li>Time to fetch, without redirects (<code>responseEnd - fetchStart</code>)</li><li>ServiceWorker processing time (<code>fetchStart - workerStart</code>)</li><li>Whether content was compressed (<code>decodedBodySize</code> should differ from <code>encodedBodySize</code>)</li><li>Whether local caches were hit (<code>transferSize</code> should be 0)</li><li>Whether modern and fast protocols are used (<code>nextHopProtocol</code> should be HTTP/2 or HTTP/3)</li><li>Whether the correct resources are render-blocking (<code>renderBlockingStatus</code>)</li></ul>
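<p>As a sketch, here is how you could compute these metrics from a single <code>PerformanceResourceTiming</code> entry:</p><pre><code class="language-jsx">// Derive the typical resource timing metrics from one entry
function deriveMetrics(entry) {
    return {
        tcpHandshakeTime: entry.connectEnd - entry.connectStart,
        dnsLookupTime: entry.domainLookupEnd - entry.domainLookupStart,
        redirectionTime: entry.redirectEnd - entry.redirectStart,
        interimRequestTime: entry.firstInterimResponseStart - entry.requestStart,
        requestTime: entry.responseStart - entry.requestStart,
        tlsNegotiationTime: entry.requestStart - entry.secureConnectionStart,
        fetchTimeWithoutRedirects: entry.responseEnd - entry.fetchStart,
        serviceWorkerProcessingTime: entry.fetchStart - entry.workerStart,
        isCompressed: entry.decodedBodySize !== entry.encodedBodySize,
        isCacheHit: entry.transferSize === 0,
        usesModernProtocol: [&apos;h2&apos;, &apos;h3&apos;].includes(entry.nextHopProtocol),
        renderBlockingStatus: entry.renderBlockingStatus,
    };
}

performance.getEntriesByType(&apos;resource&apos;).forEach(function (entry) {
    console.log(entry.name, deriveMetrics(entry));
});
</code></pre>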
<p>Finally, you have this data in the frontend. The next step is to send it to your backend endpoint and store it somewhere, e.g., New Relic, DataDog, or a database, which we leave to you.</p>
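<p>If you want a starting point, here is a minimal sketch that ships the filtered entries to a hypothetical <code>/api/performance-metrics</code> collection endpoint when the page is hidden:</p><pre><code class="language-jsx">// Batch the resource entries for API calls and ship them on page hide;
// sendBeacon survives tab closes better than a regular fetch.
// The /api/performance-metrics endpoint is hypothetical.
document.addEventListener(&apos;visibilitychange&apos;, function () {
    if (document.visibilityState !== &apos;hidden&apos;) return;
    const entries = performance
        .getEntriesByType(&apos;resource&apos;)
        .filter(function (entry) {
            return [&apos;xmlhttprequest&apos;, &apos;fetch&apos;].includes(entry.initiatorType);
        })
        .map(function (entry) { return entry.toJSON(); });
    navigator.sendBeacon(&apos;/api/performance-metrics&apos;, JSON.stringify(entries));
});
</code></pre><p>The Performance API helped Dyte determine the timings of its new services, such as <a href="http://location.dyte.io">location.dyte.io</a>, and see how well they were performing globally.</p><p>You can learn more about PerformanceResourceTiming <a href="https://developer.mozilla.org/en-US/docs/Web/API/PerformanceResourceTiming">here</a>. To learn more about the Performance API, please refer to <a href="https://developer.mozilla.org/en-US/docs/Web/API/Performance_API">these docs</a>.</p><p>We believe these timing metrics will also help you test and improve your services.</p><p>Lastly, I hope you found this post informative and engaging. If you have any thoughts or feedback, please get in touch with me on&#xA0;<a href="https://twitter.com/rsr_thedarklord">Twitter</a>&#xA0;or&#xA0;<a href="https://www.linkedin.com/in/softwareprovider/">LinkedIn</a>. Stay tuned for more related blog posts in the future!</p><p><em>If you haven&apos;t heard about Dyte yet, head over to&#xA0;</em><a href="https://dyte.io/"><em>dyte.io</em></a><em>&#xA0;to learn how we are revolutionizing communication through our SDKs and libraries and how you can&#xA0;</em><a href="https://accounts.dyte.in/auth/register"><em>get started</em></a><em>&#xA0;quickly on your 10,000 free minutes, which renew every month. You can reach us at&#xA0;</em><a href="mailto:support@dyte.io"><em>support@dyte.io</em></a><em>&#xA0;or ask our&#xA0;</em><a href="https://community.dyte.io/"><em>developer community</em></a><em>.</em></p>]]></content:encoded></item></channel></rss>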