V-Blaze and V-Cloud Online Help

Acoustic and Language Models

The automatic speech recognition (ASR) engine uses machine learning components known as models to represent knowledge about speech, and it applies this knowledge during transcription. Two types of models are used: acoustic models and language models. The acoustic model converts audio into a stream of sound symbols specific to a language, such as English or Spanish. The language model converts that stream of sound symbols into text.
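
The two-stage flow described above can be pictured with a small toy sketch. It is a conceptual illustration only: the lookup tables, symbol names, and the functions acoustic_model, language_model, and transcribe are hypothetical stand-ins invented for this example, not part of any Voci product or API. Real acoustic and language models are statistical and far more complex than table lookups.

```python
# Toy illustration only: every name below is hypothetical and is not
# part of any Voci product or API.

# Stand-in "acoustic model": maps labeled audio frames to sound symbols.
# A real acoustic model infers the symbols from the audio signal itself.
TOY_ACOUSTIC_TABLE = {
    "frame_h": "HH",
    "frame_ay": "AY",
    "frame_b": "B",
}

# Stand-in "language model": maps sequences of sound symbols to words.
TOY_LEXICON = {
    ("HH", "AY"): "hi",
    ("B", "AY"): "bye",
}


def acoustic_model(frames):
    """Convert a sequence of audio frames into a stream of sound symbols."""
    return [TOY_ACOUSTIC_TABLE[frame] for frame in frames]


def language_model(symbols):
    """Convert a stream of sound symbols into text by greedy longest match."""
    words = []
    max_len = max(len(key) for key in TOY_LEXICON)
    i = 0
    while i < len(symbols):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(symbols) - i), 0, -1):
            chunk = tuple(symbols[i:i + n])
            if chunk in TOY_LEXICON:
                words.append(TOY_LEXICON[chunk])
                i += n
                break
        else:
            # No word starts at this symbol, so skip it.
            i += 1
    return " ".join(words)


def transcribe(frames):
    """Run both stages: audio frames -> sound symbols -> text."""
    return language_model(acoustic_model(frames))
```

For example, calling transcribe(["frame_h", "frame_ay", "frame_b", "frame_ay"]) first produces the symbol stream HH, AY, B, AY and then the text "hi bye". Keeping the two stages separate mirrors the division of labor in the description: the acoustic stage is specific to the sounds of a language, while the language stage decides which words those sounds form.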

General-purpose language models are typically trained to understand domains like banking, health insurance, telecommunications, and voicemail. They provide a strong baseline capability that works well out of the box.

Voci accounts for accents when developing its acoustic models. For example, audio from calls originating in the Southern, Northeast, Midwest, and Western regions of the United States was used to train Voci's North American English models. This enables our North American English models to work robustly throughout the USA. For accents that diverge strongly from those found in the USA, Voci adds support by creating new acoustic models as necessary. For example, Voci supports both UK and Australian English in this way.

The following language models are available:

In addition to these language models, Voci also works with customers to develop optimal solutions within their budgetary constraints.