ASR Engine Capabilities & Features
Voci's automatic speech recognition (ASR) engine powers accurate and scalable Speech-to-Text (STT) solutions. Whether your call volume is measured in hundreds or millions of hours per month, the VociASR engine enables you to automatically generate high-quality transcripts from 100% of your speech audio assets.
Voci uses deep neural networks and deep belief networks in a proprietary configuration to convert speech to intelligent data. Voci speech recognition uses a combination of assisted and unassisted machine learning and is based on Large Vocabulary Continuous Speech Recognition (LVCSR) technology. LVCSR recognizes phonemes like a phonetic system, then applies a dictionary or language model to produce a full transcript. Accuracy is much higher than the single-word lookup of a phonetic approach, and the resulting transcript is much easier and faster for contact centers to search and use.
The ASR engine uses language models tuned for telephony-based communications such as customer service call center interactions, voicemail, phone sales, and similar audio. The system caters to continuous, spontaneous, uncooperative speech. Speech of this type typically occurs during a phone call between an agent and a caller, or in a voicemail, where callers typically leave spontaneous messages.
The following table describes real-time and post-call features of the ASR engine.
Feature | Real-time* | Post-call | Description
---|---|---|---
Transcription | ✓ | ✓ | Transcribes digitized audio to text.
Punctuation | ✓ | ✓ | Adds punctuation and capitalization. Fully punctuated transcripts significantly improve speech analysis by increasing the understanding of the caller's intended meaning.
Word count | ✓ | ✓ | Provides the total number of words for each call, along with other related counts, depending on parameters.
Substitutions | ✓ | ✓ | Rules-based approach to substituting commonly misidentified words in the transcription. Voci's AutoSubs process can automatically identify words that are typically transcribed incorrectly.
Emotion | ✓ | ✓ | Classifies emotion based on combined acoustic features and word sentiment scores. Values include strongly positive, positive, neutral, negative, and strongly negative. Scoring is available at the call and individual-utterance level. Raw emotion scoring is also available.
Sentiment | ✓ | ✓ | Classifies sentiment based on word usage at the call and utterance level. Values include negative, mostly negative, neutral, mostly positive, and positive.
Confidence | ✓ | ✓ | Scores words, utterances, and calls for the system's confidence in the transcription results.
Language identification | | ✓ | If a LID-supported language is detected, the ASR engine switches to the corresponding model for the detected language.
Numeric redaction | ✓ | ✓ | Redacts numbers from a transcript. Automated numeric redaction reduces PCI/PII risk by automatically finding and eliminating credit card and other sensitive numbers from audio and text.
Audio redaction | | ✓ | Replaces sensitive segments of an audio file with silence, reducing PCI/PII risk.
Gender identification | ✓ | ✓ | Identifies speakers as male or female.
Speaker separation | | ✓ | Automatically separates customer and agent voices when both are recorded on one channel, enabling their utterances to be analyzed independently. This is referred to as diarization.
Music detection | ✓ | ✓ | Acoustic-based classification model that identifies when music occurs. Each utterance is scored from -1 to +1, corresponding to the probability that it is music. Music utterances are not transcribed.
Agent identification | ✓ | ✓ | Identifies which channel is the agent versus the customer.
Luhn detection | ✓ | ✓ | Identifies which numbers are likely credit cards (n-16 digits) by adding a tag to the transcript metadata file (even if the number was redacted). Luhn numbers are not redacted when detected, and there is no "scrub only Luhn numbers" functionality.
Overtalk | ✓ | ✓ | Overtalk occurs when speakers talk over one another. A recording's overtalk percentage is the count of agent-initiated overtalk turns as a percentage of the total number of agent-speaking turns. In other words, out of all of the agent's turns, it measures how many interrupted a client's turn.

\* V‑Cloud implementations do not currently support real-time transcription.
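The Luhn check used for credit card detection is a standard public checksum algorithm. A minimal implementation (illustrative only, not Voci's code) looks like this:

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum,
    the standard validity test for credit card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:       # a two-digit product contributes its digit sum
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (a well-known test card number)
print(luhn_valid("4111111111111112"))  # False (checksum fails)
```

Because most card numbers satisfy this checksum while arbitrary digit strings usually do not, tagging Luhn-valid numbers is a useful signal that a redacted or transcribed number was likely a credit card.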
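The overtalk percentage defined above (agent-initiated interruptions as a share of all agent turns) can be computed from a sequence of speaker turns. The turn representation below is a hypothetical one chosen for illustration, not a Voci data format:

```python
def overtalk_percentage(turns):
    """Compute overtalk percentage as defined above.

    turns: chronological list of (speaker, interrupted) pairs, where
    `speaker` is "agent" or "client" and `interrupted` is True when the
    turn began while the other party was still speaking.
    (This turn structure is an assumption for illustration.)
    """
    agent_turns = [t for t in turns if t[0] == "agent"]
    if not agent_turns:
        return 0.0
    interruptions = sum(1 for _, interrupted in agent_turns if interrupted)
    return 100.0 * interruptions / len(agent_turns)

calls = [
    ("agent", False),
    ("client", False),
    ("agent", True),    # agent cut in while the client was speaking
    ("agent", False),
    ("client", True),   # client interruptions do not count here
]
print(round(overtalk_percentage(calls), 1))  # 33.3 (1 of 3 agent turns)
```

Note that only agent-initiated overtalk enters the numerator; a client interrupting the agent does not raise the recording's overtalk percentage.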