Link to original video by Sarge

Text to Speech with AWS Polly in Unity!

Introduction

  • Video is about using AWS Polly for text-to-speech in Unity.

"Hello everyone and welcome to this video."

Background about the video

  • This is one of the most requested features by viewers.

  • The video builds upon previous videos about text and audio processing.

"This is going to be one of the most requested features since I started working on the smart MPC pipeline videos with OpenAI and Unity."

The importance of text-to-speech

  • Text-to-speech is a crucial building block for the overall process.

  • Previous steps focused on turning text into text and audio into text.

  • Now, the focus is on turning text into audio for NPCs.

"This is the final block that we get text and turn it into audio so our NPCs can actually speak just like we talk to them."

Available text-to-speech services

  • Popular services include IBM Watson, Microsoft Azure Cognitive Speech, and Amazon Polly.

"When it comes to text-to-speech, there are quite many services. Most known are, of course, IBM Watson, Microsoft Azure Cognitive Speech services, and Amazon Polly."

Reasons for choosing Amazon Polly

  • Amazon Polly has good .NET support.

  • The creator already has access to Amazon Web Services and pays for it.

  • Azure and IBM services require additional information and payment details.

"For this video, I'm gonna go with Amazon Polly because it has good .NET support and it is one of the services I already have access to. Azure and IBM are also paid services. Even though they give you a certain amount of free usage, we will need credit card information, phone number, and such. So there's a certain barrier to go to the other side to be able to utilize this. I already had Amazon access, and I was able to use its .NET SDKs, so this is the reason I want to go with that. In your case, if you are using Azure or IBM services, definitely you can check out what they can do for you."

Creating an Amazon Web Services Account

  • Visit aws.amazon.com and click on "Create an AWS Account" if you don't have an account yet.

  • Provide email address and account name.

  • Verify your email address with a verification code.

  • Provide phone number and address information.

  • Select business or personal use.

  • Enter credit card information.

  • Confirm your email address.

  • Select a plan (e.g., basic free package).

  • Log in as the root user to access Amazon Web Services.

"Creating an Amazon Web Services account is the first step. If you already have an account, you can skip this stage. Visit aws.amazon.com and click on 'Create an AWS Account.' Provide your email address and account name. Verify your email address and enter phone number and address information. Choose business or personal use and enter your credit card information. After that, confirm your email address. Select a plan and login as a root user to access Amazon Web Services."

Getting the necessary SDKs and DLLs

  • Download the Amazon Web Services SDK for .NET.

  • Download the Amazon Polly package for Unity.

  • Optionally, create a link.xml file for IL2CPP scripting backend.

  • Install the Microsoft.Bcl.AsyncInterfaces DLL.

"To get the necessary SDKs and DLLs, download the Amazon Web Services SDK for .NET and the Amazon Polly package for Unity. Optionally, create a link.xml file for IL2CPP scripting backend. Also, install the Microsoft.Bcl.AsyncInterfaces DLL."

"Normally, you can get any C# and Microsoft packages through a NuGet Package Manager. However, in the case of Unity, we needed to download the packages directly. But for the first time, we needed to download a ZIP file. In this case, we only need to download the package directly."

Conclusion

  • The chosen service for text-to-speech is Amazon Polly because of its good .NET support and existing access through Amazon Web Services.

  • Creating an Amazon Web Services account is the first step, followed by obtaining the necessary SDKs and DLLs for Unity.

"We have our keys, we have Amazon access. For you, it might take a little longer to create an account and get the confirmation from Amazon, but you can continue the video right after that. What we are going to do now is actually go and get the SDKs and the DLLs that we are going to use in Unity."

Extracting the Package

  • The downloaded package is a .nupkg file, which is actually a ZIP file.

  • To extract the contents, rename the file extension to .zip.

"You will see this download package, and when we actually go to the download location what you will see is that this is a n-u-pkg file nugget package. This actually is a ZIP file itself so to extract things from it I will just rename it to that zip, let's save it and voila I can see the content now."

Enabling File Extensions

  • In order to rename the file extension, you need to enable viewing file extensions in the file settings.

  • This will make the extension visible so that you can easily rename it.

"By the way, to be able to do that from the file settings, you will need to enable seeing the extensions so don't forget that, otherwise the extension won't be visible to you. In this case, I was able to change that."

Selecting the Required DLL Files

  • In the "lib" folder, navigate to the "netstandard2.1" folder.

  • Copy the DLL file you need and move it to the main folder.

  • In the AWS SDK folder, find the "AWSSDK.Core" and "AWSSDK.Polly" DLL files, and move them to the main folder.

"I'm going back into it, I'm gonna go into the lib folder, then I'm gonna pick netstandard2.1. I do not use 2.0 because it has some errors; it's an older version. This works flawlessly, so what we are going to need is the DLL file itself. So I'm gonna move it up into the main folder that it was in. This is one of the DLLs I need. And I'm gonna move to the AWS SDK and here I'm gonna find AWSSDK.Core and AWSSDK.Polly. These two DLLs are what I need. I'm just gonna scroll down, there are quite many things here, like all the possible available SDKs of Amazon."

Creating the Plugins Folder

  • In Unity, create a folder called "Plugins".

  • Place the three required libraries in this folder.

"Let's go back into Unity. I'm gonna create a folder here called plugins, and I will place these three libraries right in here. So we are ready to script with Amazon SDK."

Creating the Script

  • Create a script called "TextToSpeech.cs" or any desired name.

"Let's create a script that we are going to use to run the SDK. I'm gonna call it 'TextToSpeech' and let's open our script."

Creating the Amazon Polly Client

  • Create a variable called "client" of type "AmazonPollyClient".

  • Instantiate the client by passing the AWS credentials and the region.

"So first thing we are going to do is create the client so we can actually connect to Amazon Web Services. Here we are going to use the secret key and the public key that we had. So for that, first, I'm gonna create a variable here, let's call it 'credentials', and this is going to be 'BasicAmazonWebServiceCredentials'. In this one, you will use your access key which is the public key and the secret key."

Creating the SynthesizeSpeechRequest

  • Create a variable called "request" of type "SynthesizeSpeechRequest".

  • Set the initial values for the request, including the text, engine, voice ID, and output format.

"The next thing is we need to create the synthesizing the speech, the request for synthesizing speech. I'm gonna create a variable called 'request', and this is going to be a type of 'SynthesizeSpeechRequest'. This is going to have some initial values in it, so when we check the UI in the Amazon website, there are certain options like which engine you want to use, which output format, voice ID, the name of the character you want to use, and the text itself."

Making the SynthesizeSpeechRequest

  • Create a response object.

  • Use the client's SynthesizeSpeechAsync method to make the request.

"Now I'm gonna make my request and wait for the response. So we create a response object and client.synthesizeSpeech is an asynchronous method, so I can actually update it right here."

"So we created our credentials, created our client with the credentials, created the request, made the request, and pretty much we have our audio right here."

Saving the file to the persistent data folder

  • It is recommended to save the file to the persistent data folder.

  • This can be done using the Application.persistentDataPath property.

"Um it might go into the root folder so maybe best is to put into persistent data folder so I'll just say application persistent data path so this is where our file will be written."

Creating a data buffer

  • A data buffer is needed to hold each chunk of data read from the stream.

  • The buffer can be a byte array of a specific size.

  • The buffer size is a judgment call; the video suggests 4 × 1 KB as an example.

  • Larger chunks suit audio files: smaller chunks mean more read iterations, while a very short text could use a smaller buffer.

"And for that, I need to create a data buffer, in which we are going to save every chunk that we read from the stream. Let's say a byte array, bytes, and the size can be, for example, four times one K. It's up to you to decide; since this is an audio file, larger chunks are much better. Smaller chunks mean more iterations, but you also don't want to leave unused space if the audio is small, so if it was a smaller text I would have a smaller buffer size."

Reading and writing data to the buffer

  • While there are bytes to read from the stream, read the data into the buffer using the Stream.Read method and write the buffer to the file stream.

  • Repeat this process until all bytes are read.

  • There is no need to manually close or dispose the file stream when it is created in a using block.

"So while there are bytes to read, we will keep writing them into the file stream. And once the using block is done, we do not need to do a close or dispose, as far as I know, because at the end of the using block the file stream will be flushed and we will be done with this."

Downloading the file using Unity Web Request

  • Use Unity Web Request to load the saved file back from the path it was written to.

  • Call the static UnityWebRequestMultimedia.GetAudioClip method with the file path and the audio type (in this case, MP3) to create the request.

  • Send the web request and wait for it to complete before retrieving the audio clip.

"So I'm going to say using var www equals UnityWebRequestMultimedia.GetAudioClip; that will be what I use. So I know the path, I will make a call to the same path because the audio is in here, and the type of the audio is MP3."

Playing the audio clip

  • Create an AudioSource component in Unity to play the downloaded audio clip.

  • Assign the downloaded audio clip to the AudioSource.clip property.

  • Call the AudioSource.Play method to start playing the audio.

"AudioSource.clip is the clip we just retrieved and audiosource.play so when we run the scene our audio will play."

Summary from youtubesummarized.com