V-Blaze and V-Cloud Online Help


Values: integer


Specifies the maximum amount of time in seconds that is allotted for a spoken utterance. Normally an utterance is terminated by a sufficient duration of silence, but if no such period of silence is encountered prior to reaching uttmaxtime, the utterance is terminated forcibly.
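The forced termination described above can be pictured with a minimal Python sketch. It is a simplified model and not the ASR engine's actual segmentation logic: it assumes a single stretch of speech with no qualifying silence, and the function name forced_boundaries is hypothetical.

```python
def forced_boundaries(speech_seconds, uttmaxtime=150):
    """Split one unbroken stretch of speech into utterances of at most uttmaxtime seconds.

    Simplified model: no silence occurs, so every boundary is a forced cut.
    """
    if uttmaxtime < 1:
        raise ValueError("uttmaxtime must be at least 1 second")
    speech_seconds = float(speech_seconds)
    segments = []
    start = 0.0
    while start < speech_seconds:
        # The utterance ends at the limit or at the end of speech, whichever is first.
        end = min(start + uttmaxtime, speech_seconds)
        segments.append((start, end))
        start = end
    return segments


# A 400-second monologue with a 150-second limit yields three utterances.
print(forced_boundaries(400, uttmaxtime=150))
```

In this model, raising uttmaxtime reduces the number of forced boundaries, while lowering it increases them.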

The default value for uttmaxtime is 150 seconds. Human utterances are typically 5-20 seconds long, so the uttmaxtime setting rarely requires modification. Use cases that can benefit from adjusting this parameter include transcribing monologues or speeches with unusually long unbroken utterances, and real-time deployments with aggressive turnaround time requirements.

In most cases, setting the uttmaxtime tag to less than 20 seconds will compromise accuracy, and accuracy degrades further as uttmaxtime is reduced towards its minimum setting of 1 second.
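A client could check the value before submitting it. The following Python sketch is illustrative only: the helper name uttmaxtime_field and the idea of passing the tag as a string-valued request field are assumptions, not a documented API. Only the tag name, the 1-second minimum, and the 20-second accuracy guidance come from the text above.

```python
import warnings

MIN_UTTMAXTIME = 1    # minimum setting stated in this help page
ACCURACY_FLOOR = 20   # below this, accuracy is usually compromised


def uttmaxtime_field(seconds):
    """Return a request field for uttmaxtime (hypothetical helper, not a product API)."""
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("uttmaxtime must be an integer number of seconds")
    if seconds < MIN_UTTMAXTIME:
        raise ValueError("uttmaxtime must be at least 1 second")
    if seconds < ACCURACY_FLOOR:
        # Allowed, but flagged because short limits tend to reduce accuracy.
        warnings.warn("uttmaxtime below 20 seconds usually reduces accuracy", stacklevel=2)
    return {"uttmaxtime": str(seconds)}


print(uttmaxtime_field(150))
```

The warning is deliberately not an error, because short values remain valid for real-time deployments that accept the accuracy trade-off.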


Reducing the value of the uttmaxtime tag lowers accuracy because the ASR engine is more often forced to terminate an utterance at the uttmaxtime boundary. Such "cuts" can take place while a word is being spoken, so that a portion of the word is in the first utterance and the remainder is in the second. With few exceptions, word fragments do not sound like the original word, which results in erroneous transcription. Shorter utterances also contain less context, further reducing achievable accuracy.
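The effect of a cut landing inside a word can be sketched as follows. This is a simplified illustration and not how the engine reports results: it assumes unbroken speech starting at time 0, so forced boundaries fall at multiples of uttmaxtime, and the word timings and the function name words_cut are hypothetical.

```python
def words_cut(words, uttmaxtime):
    """Return (word, boundary) pairs for words that straddle a forced boundary.

    words: list of (text, start_seconds, end_seconds).
    Assumes unbroken speech from time 0, so boundaries are multiples of uttmaxtime.
    """
    cut = []
    for text, start, end in words:
        # First forced boundary strictly after the word's start time.
        boundary = (int(start // uttmaxtime) + 1) * uttmaxtime
        if boundary < end:
            cut.append((text, boundary))
    return cut


words = [
    ("long", 9.0, 9.5),
    ("transcription", 9.6, 10.4),  # spans the 10-second boundary
    ("next", 10.5, 10.9),
]
print(words_cut(words, uttmaxtime=10))
```

With a 10-second limit, the word spanning 9.6 to 10.4 seconds is divided between two utterances, and neither fragment is likely to be recognized as the original word.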