Welcome to part 1 of the tutorial series on AWS Audio Analysis. Earlier, we discussed the architecture diagram. In this tutorial, I'm going to quickly cover a few points about the Amazon Transcribe service. In short, Transcribe lets you convert audio or speech to text, which is exactly what speech recognition is about.

Amazon Transcribe is an ASR (Automatic Speech Recognition) service that can be used across applications such as speaker diarization (speaker identification), video subtitle generation, and transcription of customer care conversations, among many other use cases. You can even implement voice analytics, and that is what I am going to take you through in this tutorial series.

Transcribe is exposed as an API, and you can leverage it from Python using the boto3 package. It also lets you add a custom vocabulary, which helps Amazon Transcribe accurately recognize words and phrases specific to your application. It supports a number of file formats: mp3, mp4, wav, and flac. It supports many languages, and the Transcribe team keeps adding more. Real-time transcription is one of Transcribe's great features.
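To give you a feel for the API, here is a minimal sketch of kicking off a transcription job with boto3. The bucket, file, and job names are placeholders, and the actual AWS calls are commented out so the snippet runs without credentials:

```python
# A minimal sketch of starting an Amazon Transcribe job via boto3.
# Bucket/job names below are illustrative, not from the tutorial.

def build_transcription_request(job_name, media_uri, language_code="en-US"):
    """Assemble the parameters Transcribe's StartTranscriptionJob expects."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_uri.rsplit(".", 1)[-1],  # mp3 | mp4 | wav | flac
        "LanguageCode": language_code,
    }

request = build_transcription_request("demo-job", "s3://my-audio-bucket/call.mp3")

# With valid AWS credentials configured, you would run:
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(**request)
# job = transcribe.get_transcription_job(TranscriptionJobName="demo-job")
# print(job["TranscriptionJob"]["TranscriptionJobStatus"])  # QUEUED / IN_PROGRESS / COMPLETED
```

The job runs asynchronously, so you poll `get_transcription_job` until the status becomes COMPLETED and then fetch the transcript from the URI in the response.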

One problem I faced with Amazon Transcribe was while working on a speaker diarization application: it works well with short audio but starts messing up as the length of the audio increases. That said, it keeps getting better with time. Have you faced any problems with Transcribe?
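For context, speaker diarization is requested through the job's Settings block. Here is a hedged sketch, assuming a two-speaker call recording (the names and bucket are placeholders, and the AWS call is commented out):

```python
# Sketch: asking Transcribe to label speakers in the output.
# Values are illustrative assumptions, not from the tutorial.

def diarization_settings(max_speakers=2):
    """Settings that enable speaker labels on a transcription job."""
    return {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers}

settings = diarization_settings(max_speakers=2)

# With credentials configured:
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(
#     TranscriptionJobName="diarization-demo",
#     Media={"MediaFileUri": "s3://my-audio-bucket/call.wav"},
#     MediaFormat="wav",
#     LanguageCode="en-US",
#     Settings=settings,
# )
```

The resulting transcript JSON then carries a speaker_labels section mapping time segments to speakers, which is where the long-audio issues mentioned above tend to show up.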

Here is the video tutorial for reference.


In the next tutorial, we will create the necessary resources as per the architecture diagram. While I work on the next update to AWS Audio Analysis, check out my YouTube channel for more tutorials. Keep sharing and stay tuned for more. Follow me on Twitter.