Amazon launched the Transcribe service in 2017, enabling developers to add speech-to-text features to their applications.
Analyzing audio files and extracting data from them directly is almost impossible for computers. To use such data in an application, speech must first be converted to text. Speech recognition services certainly existed before, but they were generally expensive and poorly adapted to scenarios such as the low-quality phone audio found in some contact centers.
Powered by deep learning technologies, Amazon Transcribe is a fully managed and continuously trained automatic speech recognition service that generates time-stamped text transcripts from audio files. The service parses audio and video files stored in many common formats (WAV, MP3, MP4, AMR, FLAC, etc.) and returns a detailed transcription with timestamps for each word, as well as appropriate capitalization and punctuation. For most languages, numbers are transcribed in word form; for English and German, however, Transcribe treats numbers differently depending on the context in which they are used.
Transcribe currently supports 37 languages.
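To make the time-stamped output described above more concrete, here is a minimal sketch of reading word timings from a transcript. The sample JSON is a trimmed, invented example shaped like a batch transcript file, and `word_timings` is a hypothetical helper name used only for illustration.

```python
import json

# Trimmed sample shaped like a batch transcript file. All values are invented.
SAMPLE_TRANSCRIPT = """
{
  "jobName": "example-job",
  "results": {
    "transcripts": [{"transcript": "Hello world."}],
    "items": [
      {"type": "pronunciation", "start_time": "0.04", "end_time": "0.45",
       "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
      {"type": "pronunciation", "start_time": "0.46", "end_time": "0.9",
       "alternatives": [{"confidence": "0.98", "content": "world"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def word_timings(transcript):
    """Return (word, start_seconds, end_seconds) for each spoken word."""
    words = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no timestamps, so only spoken words are kept.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        words.append(
            (best["content"], float(item["start_time"]), float(item["end_time"]))
        )
    return words


transcript = json.loads(SAMPLE_TRANSCRIPT)
print(transcript["results"]["transcripts"][0]["transcript"])
for word, start, end in word_timings(transcript):
    print(f"{start:6.2f} to {end:6.2f}  {word}")
```

Timestamps arrive as strings in the JSON, which is why the sketch converts them to floats before use.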
Transcription methods can be divided into two main categories:
- Batch transcription: transcribing media files that have been uploaded to an Amazon S3 bucket;
- Streaming transcription: transcribing media streams in real time.
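The batch path can be sketched with the AWS SDK for Python (boto3). This is a minimal illustration rather than a production setup: the job name and S3 URI are placeholders, `build_batch_request` and `run_batch_job` are hypothetical helper names, and running the job assumes boto3 is installed and AWS credentials are configured. Only the batch category is covered here.

```python
import time


def build_batch_request(job_name, media_uri, language_code="en-US"):
    """Build keyword arguments for a batch transcription job."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
    }


def run_batch_job(request, poll_seconds=10):
    """Start the job and poll until it completes or fails.

    Assumes boto3 is installed and credentials are available. The import is
    kept local so the request builder above works without AWS access.
    """
    import boto3

    client = boto3.client("transcribe")
    client.start_transcription_job(**request)
    while True:
        job = client.get_transcription_job(
            TranscriptionJobName=request["TranscriptionJobName"]
        )["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        time.sleep(poll_seconds)


# Placeholder names: replace them with your own job name and S3 object.
request = build_batch_request("example-job", "s3://example-bucket/call.mp3")
print(request)
```

Calling `run_batch_job(request)` would submit the job. On success, the returned job description includes the location of the transcript file.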
Here are some of the features it provides:
- Single- and multi-language identification: identifying the dominant language spoken in your media file and creating a transcript. If speakers change language during a conversation, or if each participant speaks a different language, multi-language identification detects and transcribes each language;
- Transcribing multi-channel audio: combining transcriptions from multi-channel audio into a single output file. Channel identification can be enabled for both batch processing and real-time streaming;
- Speaker diarization: partitioning the transcript by speaker, detecting each speaker in the provided audio file;
- Custom language models: designed to improve transcription accuracy for domain-specific speech, meaning any content that goes beyond everyday conversation. For example, a recording of a talk at a scientific conference will contain specialized terms that standard transcription is unlikely to recognize. In this case, you can train a custom language model to recognize the terms used in your discipline;
- Custom vocabularies: improving transcription accuracy for a list of specific words. These are generally domain-specific terms, such as brand names, acronyms, proper nouns, and words that Amazon Transcribe isn't rendering correctly;
- Tagging: adding custom metadata to a resource in order to make it easier to identify, organize, and find in a search;
- Subtitles: creating closed captions for your video, with the option to filter inappropriate content from them.
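Several of the features listed above correspond to optional parameters of a batch transcription request. The sketch below shows one way to assemble them for boto3's `start_transcription_job`. The function name `build_feature_request` and all resource names are hypothetical, and the rule that a custom vocabulary or custom language model requires an explicit language code is a simplification made for this example.

```python
def build_feature_request(
    job_name,
    media_uri,
    *,
    language_code=None,
    max_speakers=None,
    vocabulary_name=None,
    language_model_name=None,
    subtitle_formats=None,
    tags=None,
):
    """Assemble keyword arguments for start_transcription_job."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }

    if language_code is None:
        # No language given: ask the service to identify the dominant language.
        if vocabulary_name or language_model_name:
            # Simplifying assumption of this sketch: custom resources are only
            # handled together with an explicit language code.
            raise ValueError("custom resources need an explicit language_code here")
        request["IdentifyLanguage"] = True
    else:
        request["LanguageCode"] = language_code

    settings = {}
    if max_speakers is not None:
        # Speaker diarization: label up to max_speakers distinct speakers.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name is not None:
        settings["VocabularyName"] = vocabulary_name
    if settings:
        request["Settings"] = settings

    if language_model_name is not None:
        request["ModelSettings"] = {"LanguageModelName": language_model_name}
    if subtitle_formats:
        request["Subtitles"] = {"Formats": list(subtitle_formats)}
    if tags:
        request["Tags"] = [{"Key": key, "Value": value} for key, value in tags.items()]
    return request


# Placeholder names throughout: the vocabulary must already exist in your account.
request = build_feature_request(
    "support-call-001",
    "s3://example-bucket/support-call-001.wav",
    language_code="en-US",
    max_speakers=2,
    vocabulary_name="brand-terms",
    subtitle_formats=["srt"],
    tags={"team": "support"},
)
print(request)
```

The resulting dictionary can be passed as `client.start_transcription_job(**request)` on a boto3 Transcribe client. Keeping request assembly separate from the network call makes the options easy to inspect before a job is submitted.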
Transcribe offers indispensable features for call centers and support services. It helps to capture useful insights by transcribing customer calls in real time. Analyzing and categorizing calls by keywords, phrases and sentiment can help track negative situations, identify trends in customer issues or allocate calls to specific departments.
Transcribe can also measure the loudness of speech. This metric helps show whether the customer or the employee is talking loudly, which often indicates anger or frustration. The quality of communication with the client can also be assessed with the following metrics: interruptions, non-talk time, talk speed, and talk time.
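To illustrate what these metrics mean, here is a small sketch that derives them from speaker segments. It is not how Amazon Transcribe computes them internally: the segment format, the sample data, and the definitions used (for example, counting an interruption whenever a speaker starts while another speaker's earlier segment is still running) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical segments: (speaker, start_seconds, end_seconds, word_count).
SEGMENTS = [
    ("agent", 0.0, 4.0, 10),
    ("customer", 5.0, 9.0, 12),
    ("agent", 8.0, 12.0, 8),  # starts while the customer is still talking
    ("customer", 14.0, 16.0, 6),
]


def call_metrics(segments):
    """Compute talk time, talk speed, non-talk time and interruptions."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    if not ordered:
        raise ValueError("at least one segment is required")

    talk_time = defaultdict(float)
    word_count = defaultdict(int)
    interruptions = defaultdict(int)
    call_start = ordered[0][1]
    covered_until = call_start  # end of the speech seen so far
    speech_time = 0.0           # seconds in which anyone is talking

    for index, (speaker, start, end, words) in enumerate(ordered):
        talk_time[speaker] += end - start
        word_count[speaker] += words
        # Interruption: another speaker's earlier segment is still running.
        if any(
            prev_speaker != speaker and prev_end > start
            for prev_speaker, _, prev_end, _ in ordered[:index]
        ):
            interruptions[speaker] += 1
        # Add only the part of this segment not already counted as speech.
        if end > covered_until:
            speech_time += end - max(start, covered_until)
            covered_until = end

    call_length = covered_until - call_start
    return {
        "talk_time": dict(talk_time),
        "talk_speed_wpm": {
            speaker: word_count[speaker] / seconds * 60
            for speaker, seconds in talk_time.items()
            if seconds > 0
        },
        "non_talk_time": call_length - speech_time,
        "interruptions": dict(interruptions),
    }


for name, value in call_metrics(SEGMENTS).items():
    print(name, value)
```

In the sample data the agent interrupts once, and three seconds of the sixteen-second call contain no speech at all.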
Besides call centers, the Transcribe service can be useful in almost any field: education, law, e-commerce, healthcare, and many others. In healthcare, for example, transcripts can be passed to Amazon Comprehend Medical, a machine-learning-powered, HIPAA-eligible service pre-trained to identify and extract health data from medical texts, such as prescriptions, procedures, or diagnoses.
It is difficult to imagine modern technology without a service that can transform speech into text, and Transcribe naturally has counterparts from other technology giants. However, it is worth noting that many developers who have used the Amazon service report higher quality and accuracy compared to similar solutions on the market.