Speech to Text
The automatic speech recognition (ASR) engine uses language models tuned for telephony-based communications such as customer service call center interactions, voicemail, phone sales, and similar audio. The system is designed for continuous, spontaneous, uncooperative speech. Speech of this type typically occurs during a phone call between an agent and a caller, or in a voicemail, where callers tend to leave unrehearsed messages.
Spontaneous, uncooperative speech differs from other kinds of telephony speech, for example that of a receptionist who is practiced in leaving messages (rehearsed speech), someone reading from a script (read speech), or someone interacting with an interactive voice response (IVR) system (prompted speech).
Features of our speech-to-text solution include:
Automatic punctuation and number conversion to improve analytics
Confidence scoring to support the automatic removal of noisy information
Emotional Intelligence scoring to enable users to understand the tone and sentiment of their customers
Gender identification to enable demographic segmentation of analytic results
Redaction capability to remove sensitive information from recorded audio and associated textual output
In most cases, you do not need to change your API requests to use these features when processing your audio files.
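As an illustration of how confidence scoring can support the removal of noisy information, the following is a minimal Python sketch. It assumes the transcript is available as a list of words, each with a confidence value between 0 and 1. The field names (text, confidence), the helper name drop_low_confidence, and the 0.5 threshold are hypothetical choices made for this example, not the actual output format or a recommended setting.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def drop_low_confidence(words: List[Word], threshold: float = 0.5) -> List[Word]:
    """Keep only the words whose confidence is at or above the threshold.

    The word structure and the default threshold are assumptions for
    illustration; adapt them to the transcript format you actually receive.
    """
    return [w for w in words if float(w["confidence"]) >= threshold]


# Hypothetical transcript fragment with per-word confidence scores.
words: List[Word] = [
    {"text": "thanks", "confidence": 0.94},
    {"text": "uh", "confidence": 0.21},
    {"text": "for", "confidence": 0.90},
    {"text": "calling", "confidence": 0.88},
]

kept = drop_low_confidence(words, threshold=0.5)
print(" ".join(str(w["text"]) for w in kept))
```

A higher threshold removes more uncertain words at the cost of discarding some correct ones, so the value is usually tuned against a sample of your own audio.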