V-Blaze and V-Cloud Online Help (May 2020)

Assisted Error Identification

Voci provides a Python 2.7 script named findMisDecodings.py to help locate candidates for substitution.

The findMisDecodings.py script examines the time-weighted relative confidence scores of Ngrams to identify short phrases and individual words with low average relative confidence across the working set. An Ngram in this context is a phrase of N consecutive words, where N can be 1, 2, 3, or 4.
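The idea of a time-weighted Ngram confidence can be sketched as follows. This is a minimal illustration, not the logic of findMisDecodings.py itself: the (text, start, end, confidence) tuple layout and the function names are assumptions made for the example, and the sketch uses raw confidence values without attempting the "relative" normalization the script applies.

```python
from collections import defaultdict


def ngram_confidences(words, max_n=4):
    """Collect a time-weighted confidence for every Ngram occurrence.

    `words` is a list of (text, start, end, confidence) tuples for one
    utterance. This layout is an assumption made for the sketch.
    """
    scores = defaultdict(list)
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            total_time = sum(end - start for _, start, end, _ in window)
            if total_time <= 0:
                continue  # skip zero-length windows to avoid dividing by zero
            # Each word contributes in proportion to how long it was spoken.
            weighted = sum((end - start) * conf for _, start, end, conf in window)
            phrase = " ".join(text for text, _, _, _ in window)
            scores[phrase].append(weighted / total_time)
    return scores


def average_by_ngram(scores):
    """Average the per-occurrence scores for each Ngram."""
    return {phrase: sum(vals) / len(vals) for phrase, vals in scores.items()}
```

For example, with the words [("thank", 0.0, 1.0, 0.8), ("you", 1.0, 4.0, 0.4)], the phrase "thank you" averages to about 0.5, because the longer word "you" carries three times the weight of "thank".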

The script identifies words and phrases with lower-than-average confidence scores. The ASR engine adds a confidence score to each word in the JSON transcript. Each score ranges from 0 to 1 and represents the engine's certainty that it transcribed the word correctly. Words with a higher-than-average confidence score are more likely to have been transcribed correctly.
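As a rough illustration of comparing word scores against the average, the sketch below reads one transcript and lists the words that fall below the mean confidence. The JSON layout shown in the comment ("utterances", "events", "word", "confidence") is an assumption made for the example and should be checked against your own transcripts.

```python
import json


def below_average_words(transcript_json):
    """Return (mean, [(word, confidence), ...]) for words scoring below the mean."""
    data = json.loads(transcript_json)
    # Assumed layout:
    # {"utterances": [{"events": [{"word": "...", "confidence": 0.9}]}]}
    events = [
        event
        for utterance in data.get("utterances", [])
        for event in utterance.get("events", [])
    ]
    if not events:
        return 0.0, []
    mean = sum(event["confidence"] for event in events) / len(events)
    low = [
        (event["word"], event["confidence"])
        for event in events
        if event["confidence"] < mean
    ]
    return mean, low
```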

The results from the findMisDecodings.py script contain substitution candidates that can be verified by listening to the associated portions of audio. Good substitution candidates occur frequently, have a low average confidence, and tend to appear out of context.
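The frequency and confidence criteria above could be applied to a table of results along these lines. This is a hedged sketch: the input format, the function name, and the min_count and max_avg_conf thresholds are hypothetical values chosen for illustration, not settings of findMisDecodings.py. Whether a phrase appears out of context still has to be judged by listening to the audio.

```python
def rank_candidates(stats, min_count=3, max_avg_conf=0.5):
    """Rank phrases that are both frequent and low in average confidence.

    `stats` maps phrase -> (count, average_confidence). The thresholds are
    illustrative assumptions, not values used by findMisDecodings.py.
    """
    keep = [
        (phrase, count, avg_conf)
        for phrase, (count, avg_conf) in stats.items()
        if count >= min_count and avg_conf <= max_avg_conf
    ]
    # Lowest confidence first; ties go to the more frequent phrase.
    return sorted(keep, key=lambda row: (row[2], -row[1], row[0]))
```

Given {"vochi": (12, 0.31), "thank you": (40, 0.93), "blaze": (2, 0.2), "the cloud": (7, 0.45)}, the function keeps "vochi" and "the cloud" in that order: "thank you" is dropped for its high confidence and "blaze" for its low frequency.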

A few requirements must be met before you can use findMisDecodings.py to identify substitution candidates. Install Python 2.7 and the Python module NLTK (version 3.2.5 or later) on your system, then place the JSON transcripts in a working directory. Once these steps are complete, findMisDecodings.py can be used to find substitution candidates.
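One way to confirm these prerequisites before running the script is a small check such as the following. The helper names are hypothetical. The sketch relies only on sys.version_info and the nltk.__version__ attribute.

```python
import re
import sys


def version_tuple(text):
    """Turn a version string such as '3.2.5' into (3, 2, 5)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def nltk_is_recent_enough(installed, minimum="3.2.5"):
    """Return True if the installed NLTK version meets the stated minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_prerequisites():
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if sys.version_info[:2] != (2, 7):
        problems.append("Python 2.7 is required, found %d.%d" % sys.version_info[:2])
    try:
        import nltk
    except ImportError:
        problems.append("NLTK is not installed")
    else:
        if not nltk_is_recent_enough(nltk.__version__):
            problems.append("NLTK 3.2.5 or later is required, found " + nltk.__version__)
    return problems
```

Calling check_prerequisites() with the interpreter you intend to use for findMisDecodings.py returns a list of human-readable problems.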

Run findMisDecodings.py from the command line (Unix/Linux shell or PowerShell), specifying the location of the directory that contains the JSON files along with the required parameters. Running findMisDecodings.py without parameters prints help documentation that describes proper usage.