Add Transcription to Your Video Calls With AWS Transcribe

With the rise of remote work and geographically dispersed teams, effective communication is more crucial than ever. Team members in different locations should have no gaps in understanding, so that everyone feels included and valued.

In a previous blog, we explored integrating an audio transcription and translation solution with Dyte using Google STT (Speech to Text). That combination showed how live transcription can improve communication among teams that are geographically dispersed and culturally diverse.

Now, we have devised an AWS (Amazon Web Services) based solution that overcomes not only the barriers between people but also those between cloud providers. This new solution offers efficient transcriptions and translations, further enhancing cross-cultural collaboration and communication. Azure, you're up next for our exploration!

In this blog, we will discuss integrating AWS Transcribe, a third-party service from Amazon Web Services, with Dyte. It adds transcription to audio/video meetings as the participants speak. With the help of AWS Translate, another AWS service, these transcriptions can then be translated into each participant's preferred language as needed.

As usual, there are prerequisites.

Prerequisites

Before we dive into the process, note that you should already have a Dyte meeting created, since the transcription is generated from the audio of that meeting's participants.

Here are some useful resources for your quick navigation:

Make sure you've read the Getting Started with Dyte topic and completed the following steps:

Create a Dyte Developer Account
Create Presets
Create a Dyte Meeting
Add Participant to the meeting
Integrate with the SDK of your choice

Assuming you have successfully integrated Dyte into your website or application, let's dive into the process of adding transcriptions to your Dyte meetings.

Adding transcription components

To produce the audio transcriptions, we need to call two AWS (Amazon Web Services) services: AWS Transcribe and AWS Translate.

These AWS services are paid. Please note that Dyte does NOT act as a broker between AWS and any of our clients who want to integrate AWS services with the Dyte SDK. We recommend that our clients connect with the AWS team directly.

However, we at Dyte provide code samples free of cost so that our clients can integrate with such services seamlessly.

Integration Steps

1. Set up an AWS IAM account and get credentials

To use AWS services, you should either be an IAM user, or your backend services should be deployed using roles & policies.

For ease of use, we are going ahead with IAM user credentials. Please ensure that the IAM user is actually permitted to use the AWS Transcribe and AWS Translate services.
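
As a rough illustration of what "permitted to use" could mean, here is a minimal sketch of a least-privilege IAM policy document, written as a TypeScript object so it can be printed as JSON. The helper name buildTranscribePolicy is hypothetical, and the choice of exactly these two actions is an assumption based on this guide using WebSocket streaming and text translation; check the AWS documentation for your own setup.

```typescript
// Minimal shape of an IAM policy document (only the fields used in this sketch).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: allows only WebSocket streaming transcription
// and text translation, which is what this guide appears to rely on.
function buildTranscribePolicy(): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "transcribe:StartStreamTranscriptionWebSocket",
          "translate:TranslateText",
        ],
        Resource: "*",
      },
    ],
  };
}

// Print the JSON so it can be pasted into the IAM policy editor.
console.log(JSON.stringify(buildTranscribePolicy(), null, 2));
```

Attaching a narrow policy like this to the IAM user limits the damage if the credentials ever leak.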

To proceed further, we need accessKeyId, secretAccessKey & the preferred region where you would want your services to be deployed in the future.

Once you have these, let’s proceed to the next step.

2. Setup a backend server to call these APIs securely

If you have used AWS services before, you know that they can be costly. Obviously, you would not want to expose your credentials for miscreants to exploit.

If you put these credentials in your frontend code, anyone can get hold of them and land you with a huge bill. Likewise, an endpoint that does not validate users and blindly provides these services to anyone who has the link is a security risk.

Therefore we will not expose these credentials. We will put them in your backend services for safer usage. This would give you better control to ensure that no unauthorized personnel can use your APIs & secrets.

For this, we have provided a backend sample in NodeJS for you to see. Please find it here. Currently, we only have an ExpressJS sample. If you are working on a different backend, feel free to port this code, or connect with us so that we can help you port it.

To use this sample, please clone it using the following command.

git clone git@github.com:dyte-in/aws-transcribe.git

Or simply download it as a ZIP. Click on Code (Green Icon → Download ZIP) and extract the archive once downloaded.

Now that you have downloaded the repository, please proceed with the following steps to set up your backend endpoint.

2.1 Go to the server folder

cd server

2.2 Replicate .env.example as .env

cp .env.example .env

Open .env in the text editor of your choice, fill in your AWS credentials, and save it.
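
A server can fail fast when one of these values is missing. The following is a minimal sketch of such a check, not the sample server's actual code. The variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION are assumptions (they are the conventional AWS names); use whatever names .env.example in the repository defines. The helper readAwsConfig is hypothetical.

```typescript
// The three values named in step 1.
interface AwsConfig {
  accessKeyId: string;
  secretAccessKey: string;
  region: string;
}

// Hypothetical helper: reads the config from an env-like object
// (in a Node server you would pass process.env) and throws if anything is missing.
function readAwsConfig(env: Record<string, string | undefined>): AwsConfig {
  // Assumed variable names; adjust to match your .env file.
  const required = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"];
  const missing = required.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    accessKeyId: env["AWS_ACCESS_KEY_ID"] as string,
    secretAccessKey: env["AWS_SECRET_ACCESS_KEY"] as string,
    region: env["AWS_REGION"] as string,
  };
}
```

Calling this once at startup surfaces a misconfigured .env immediately instead of on the first transcription request.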

2.3 Install NPM packages

This installs @aws-sdk/client-translate automatically. Instead of @aws-sdk/client-transcribe-streaming, we will be using a WebSocket with presigned URLs.

npm install
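
To give a feel for what a presigned URL for the streaming WebSocket involves, here is a minimal sketch of AWS Signature Version 4 query signing for the transcription endpoint, using only Node's built-in crypto module. It is an illustration under assumptions, not the sample server's implementation: the function name presignTranscribeUrl and its input shape are hypothetical, PCM encoding is assumed, and temporary credentials (which would need an extra security-token parameter) are not handled.

```typescript
import { createHash, createHmac } from "node:crypto";

interface PresignInput {
  accessKeyId: string;
  secretAccessKey: string;
  region: string;       // e.g. "us-east-1"
  languageCode: string; // e.g. "en-US"
  sampleRate: number;   // e.g. 16000
  now: Date;            // passed in so the result is reproducible
}

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data, "utf8").digest("hex");

const hmac = (key: string | Buffer, data: string): Buffer =>
  createHmac("sha256", key).update(data, "utf8").digest();

// Hypothetical helper: returns a wss:// URL signed with SigV4 query parameters.
function presignTranscribeUrl(input: PresignInput): string {
  const host = `transcribestreaming.${input.region}.amazonaws.com:8443`;
  const path = "/stream-transcription-websocket";
  // "2024-01-01T00:00:00.000Z" becomes "20240101T000000Z"
  const amzDate = input.now.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");
  const dateStamp = amzDate.slice(0, 8);
  const scope = `${dateStamp}/${input.region}/transcribe/aws4_request`;

  const params: Record<string, string> = {
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Credential": `${input.accessKeyId}/${scope}`,
    "X-Amz-Date": amzDate,
    "X-Amz-Expires": "300", // seconds
    "X-Amz-SignedHeaders": "host",
    "language-code": input.languageCode,
    "media-encoding": "pcm", // assumption: raw PCM audio
    "sample-rate": String(input.sampleRate),
  };
  // Canonical query string: parameters sorted by name, names and values URI-encoded.
  const query = Object.keys(params)
    .sort()
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");

  // Canonical request: method, path, query, headers, signed headers, hash of empty payload.
  const canonicalRequest = ["GET", path, query, `host:${host}`, "", "host", sha256Hex("")].join("\n");
  const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope, sha256Hex(canonicalRequest)].join("\n");

  // Derive the signing key step by step: date, region, service, terminator.
  const kDate = hmac(`AWS4${input.secretAccessKey}`, dateStamp);
  const kRegion = hmac(kDate, input.region);
  const kService = hmac(kRegion, "transcribe");
  const kSigning = hmac(kService, "aws4_request");
  const signature = hmac(kSigning, stringToSign).toString("hex");

  return `wss://${host}${path}?${query}&X-Amz-Signature=${signature}`;
}
```

Because the signature is computed on the server, the browser only ever receives a short-lived URL and never sees the secret key.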

2.4 Run the server

npm run dev

If successful, you will see a confirmation in the terminal that the server is running on localhost:3001 or on the PORT specified in the .env file.

This endpoint (http://localhost:3001) will be called the backend_url from now onwards. We advise you to copy the code from https://github.com/dyte-in/aws-transcribe/blob/main/server/src/index.ts and alter it as per your needs and security practices.

We strongly advise you not to use this code as is, without a proper security mechanism, for production.
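
One possible shape for such a security mechanism is a check that runs before any request reaches the AWS-backed routes. The sketch below is a framework-agnostic illustration, not part of the sample repository: the names checkAccess and GuardResult are hypothetical, and how a token is recognized (session lookup, JWT verification, and so on) is left to the isKnownToken callback that you would supply.

```typescript
interface GuardResult {
  status: 200 | 401 | 403;
  reason: string;
}

// Hypothetical guard: inspects the Authorization header and asks the
// caller-supplied validator whether the bearer token is acceptable.
function checkAccess(
  authorizationHeader: string | undefined,
  isKnownToken: (token: string) => boolean,
): GuardResult {
  const prefix = "Bearer ";
  if (!authorizationHeader || !authorizationHeader.startsWith(prefix)) {
    return { status: 401, reason: "missing bearer token" };
  }
  const token = authorizationHeader.slice(prefix.length).trim();
  if (token === "" || !isKnownToken(token)) {
    return { status: 403, reason: "token not recognized" };
  }
  return { status: 200, reason: "ok" };
}

// Example wiring with an in-memory allow-list (for illustration only).
const allowedTokens = new Set(["demo-token"]);
const result = checkAccess("Bearer demo-token", (t) => allowedTokens.has(t));
console.log(result.status); // 200
```

In an ExpressJS server, a middleware could call a function like this and respond with the returned status before the presigned URL or translation handlers run.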

3. Integrate the SDK in the frontend project

Now that the backend setup is ready, let’s start with the frontend setup.

3.1 Install the FE package in your FE codebase

The first step is to install Dyte’s wrapper package, which takes the complexity out of the frontend integration:

npm install @dytesdk/aws-transcribe

If you are interested in what this package does behind the scenes, which we recommend you check out, refer to https://github.com/dyte-in/aws-transcribe/tree/main/client so that you can customize the code to your liking. Reading it is also necessary for putting proper security measures in place. For production, we advise copying the files of the client folder and altering them instead of using this package directly.

3.2. Integrate the AWS Transcribe SDK with Dyte Meeting

The second step is to look for the place in your codebase where you are initiating a Dyte meeting.

Once you have found the place and got hold of the meeting object, add the following code at the top of the file to import the FE SDK.

import DyteAWSTranscribe from '@dytesdk/aws-transcribe';

Add the following code just after the point where you have access to the meeting object.

const awsTranscribe = new DyteAWSTranscribe({
    meeting,
    target: 'hi', // Optional if translate is false. Supported languages: https://docs.aws.amazon.com/translate/latest/dg/what-is-languages.html
    translate: true, // Control whether to translate the source language to target language or just transcribe
    source: 'en-US', // Supported languages: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
    preSignedUrlEndpoint: 'http://localhost:3001/aws-transcribe-presigned-url', // ${backend_url}/aws-transcribe-presigned-url
    translationEndpoint: 'http://localhost:3001/translate', // ${backend_url}/translate. backend_url is from step 2.4
});

awsTranscribe.on('transcription', async (data) => {
    // ... do something with transcription
});

awsTranscribe.transcribe();

Here you are setting up DyteAWSTranscribe with the values the current user prefers and activating recognition right afterward using awsTranscribe.transcribe(). You listen for every new transcription using awsTranscribe.on('transcription', aJsCallbackFunction).
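
Since both endpoint options are derived from the backend_url of step 2.4, a small helper can keep them in sync when you move from localhost to a deployed server. This is a minimal sketch; the helper name endpointsFor is hypothetical, while the two paths are the ones used in the configuration above.

```typescript
interface TranscribeEndpoints {
  preSignedUrlEndpoint: string;
  translationEndpoint: string;
}

// Hypothetical helper: derives both endpoint options from a single backend URL.
function endpointsFor(backendUrl: string): TranscribeEndpoints {
  // Drop trailing slashes so "http://host/" and "http://host" behave the same.
  const base = backendUrl.replace(/\/+$/, "");
  return {
    preSignedUrlEndpoint: `${base}/aws-transcribe-presigned-url`,
    translationEndpoint: `${base}/translate`,
  };
}

console.log(endpointsFor("http://localhost:3001"));
```

The returned object can be spread into the DyteAWSTranscribe options alongside meeting, source, target, and translate.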

To see the supported languages, please refer to https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html and https://docs.aws.amazon.com/translate/latest/dg/what-is-languages.html.

With this, you will now receive live transcriptions. Feel free to show them in the UI as per your needs.

If you need a sample of how to do it, please refer to https://github.com/dyte-in/aws-transcribe/blob/main/client/demo/index.ts.

<!DOCTYPE html>
<html lang="en">
   <head>
      <meta charset="UTF-8" />
      <meta http-equiv="X-UA-Compatible" content="IE=edge" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <title>AWS Transcriptions Demo</title>
      <!-- UI Kit -->
      <script type="module">
         import { defineCustomElements } from 'https://cdn.jsdelivr.net/npm/@dytesdk/ui-kit/loader/index.es2017.js';
         defineCustomElements();
      </script>
      <!-- Web Core -->
      <script src="https://cdn.dyte.in/core/dyte.js"></script>
      <script src="https://cdn.jsdelivr.net/npm/@dytesdk/aws-transcribe@0.0.1/dist/index.umd.js"></script>
      <style>
         body {
         height: 100vh;
         width: 100vw;
         }
         #dyte-transcriptions{
         position: absolute;
         z-index: 99999;
         bottom: 15%;
         width: 100%;
         display:flex;
         flex-direction: column;
         justify-content: center;
         align-items:center;
         }
         .dyte-transcription-line{
         display: block;
         max-width: 80%;
         text-align: center !important;
         }
         .dyte-transcription-speaker{
         font-weight: 500;
         color: orange;
         }
         .dyte-transcription-text{
         color: white;
         }
      </style>
   </head>
   <body>
      <dyte-meeting id="my-meeting"></dyte-meeting>
      <div id="dyte-transcriptions"></div>
      
      <script>
         const roomName = ''; // Put roomName if using v1 APIs. Leave Blank for v2 APIs
         const authToken = ''; // Put v1 or v2 participant auth token
         
         async function initMeeting() {
           const meetingEl = document.getElementById('my-meeting');
         
           const meeting = await DyteClient.init({
             authToken,
             roomName,
             defaults: {
               audio: false,
               video: false,
             },
           });
         
           meetingEl.meeting = meeting;
         
           const awsTranscribe = new DyteAWSTranscribe({
               meeting,
               target: 'hi', // Optional if translate is false. Supported languages: https://docs.aws.amazon.com/translate/latest/dg/what-is-languages.html
               translate: true, // Control whether to translate the source language to target language or just transcribe
               source: 'en-US', // Supported languages: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
               preSignedUrlEndpoint: 'http://localhost:3001/aws-transcribe-presigned-url',
               translationEndpoint: 'http://localhost:3001/translate',
           });
         
         
           awsTranscribe.on('transcription', async () => {
               const transcription = document.getElementById('dyte-transcriptions');
               const list = awsTranscribe.transcriptions.slice(-3);
               transcription.innerHTML = '';
               list.forEach((item) => {
                   const speaker = document.createElement('span');
                   speaker.classList.add('dyte-transcription-speaker');
                   speaker.innerText = `${item.name}: `;
         
                   const text = document.createElement('span');
                   text.classList.add('dyte-transcription-text');
                   text.innerText = item.transcript.trim() !== '' ? item.transcript : '...';
         
                   const container = document.createElement('span');
                   container.classList.add('dyte-transcription-line');
                   container.appendChild(speaker);
                   container.appendChild(text);
         
                   transcription.appendChild(container);
               });
           });
         
           awsTranscribe.transcribe();
         }
         
         window.onload = async function () {
           initMeeting();
         };
      </script>
   </body>
</html>

Here's a recorded demo showing how the audio transcription will appear in your Dyte Meetings.

That’s it. You have successfully added transcriptions and custom language translations to your video calling app using AWS Transcribe.

I hope you found this post informative and engaging. If you have any thoughts or feedback, please get in touch with me on Twitter or LinkedIn. Stay tuned for more related blog posts in the future!

If you haven't heard about Dyte yet, head over to dyte.io to learn how we are revolutionizing communication through our SDKs and libraries and how you can get started quickly on your 10,000 free minutes, which renew every month. If you have any questions, you can reach us at support@dyte.io or ask our developer community.