Language Support
The ASR engine uses machine learning components, known as models, that represent knowledge about speech and apply that knowledge during transcription. It uses two types of models: acoustic models and language models. The acoustic model converts audio into a stream of sound symbols specific to a language, such as English or Spanish. The language model then converts that stream of sound symbols into text.
Accents are accounted for when acoustic models are developed. For example, Voci's North American English acoustic model (eng1) was trained on audio from calls originating in the Southern, Northeastern, Midwestern, and Western regions of the United States, which enables the model to work robustly throughout the country. Accents that diverge strongly from those found in the United States require separate acoustic models for optimal accuracy. For this reason, Voci provides dedicated acoustic models for United Kingdom, European, and Australian English.
The following language packages have been developed and extensively tested for use with the ASR engine. Each package in the table uses either the "call center" or the "large vocabulary" specialty, both of which perform well for most use cases. Together, these packages provide a strong baseline capability that works well out of the box.
Additional language packages are available for more specific needs. These packages are highly specialized to a particular market or use case and should be used with caution. Refer to Custom Language Modeling for more information on custom language packages.
The following sections describe all available language packages in more detail.