V-Blaze and V-Cloud Online Help

Speech to Text

The automatic speech recognition (ASR) engine uses language models tuned for telephony-based communications such as customer service call center interactions, voicemail, phone sales, and similar audio. The system is designed for continuous, spontaneous, uncooperative speech. This type of speech typically occurs during a phone call between an agent and a caller, or in a voicemail, where callers tend to leave spontaneous messages.

Spontaneous, uncooperative speech differs from the speech found in other telephony-based situations, for example a receptionist who is practiced in leaving messages (rehearsed speech), someone reading from a script (read speech), or someone interacting with an interactive voice response (IVR) system (prompted speech).

Characteristics of the processed audio, such as the gender of the speaker, emotional values, sentiment associated with the input, and other metrics, can be calculated and included in the structured output. Once speech has been transcribed for a specific domain of a specific language, the available techniques for fine-tuning the transcribed text are applied if they are requested and enabled. Once the text has been processed and cleaned up, any requested analytics are performed.
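The ordering described above (transcription first, optional text fine-tuning next, optional analytics last) can be pictured with a minimal Python sketch. This only illustrates the sequence of stages; it is not the product's API. The names `Request`, `text_cleanup`, `analytics`, and `plan_stages` are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    """Hypothetical request options (illustrative names, not real API parameters)."""
    text_cleanup: bool = False                           # fine-tune the transcribed text
    analytics: List[str] = field(default_factory=list)   # analytics to run, if any


def plan_stages(request: Request) -> List[str]:
    """Return the processing stages in the order the text describes."""
    stages = ["transcribe"]                  # transcription always runs first
    if request.text_cleanup:
        stages.append("text_cleanup")        # applied only when requested and enabled
    for name in request.analytics:
        stages.append("analytics:" + name)   # analytics run last, only if requested
    return stages
```

For example, `plan_stages(Request(text_cleanup=True, analytics=["sentiment"]))` yields the three stages in order, while a request with no options yields transcription alone.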

Features of our speech-to-text solution include:

In most cases, you do not need to change your API requests to use these features when processing your audio files.