Voice Activity Detection and Utterance Controls
The following parameters are most often used in real-time transcription scenarios using V‑Blaze.
Name | Values | Description |
---|---|---|
activitylevel | default is 175 | Specifies the volume threshold for active versus inactive audio. This value should be high enough to screen out noise, but low enough to clearly trigger on speech. Range is 0-32768, correlating to the average magnitude of a signed 16-bit LPCM frame. |
uttmaxgap | | Specifies the maximum gap in seconds that can occur between utterances before they are combined. During text processing, each utterance is buffered for a maximum of Tip: During real-time speech processing, |
uttmaxsilence | default is 800 ms | Specifies the maximum amount of silence in milliseconds that can occur between speech sounds without terminating the current utterance. Once a silence occurs that exceeds this value, the current utterance is terminated. Refer to uttmaxsilence for more information on this parameter. |
| | default is 150 seconds | Specifies the maximum amount of time in seconds that is allotted for a spoken utterance. Normally an utterance is terminated by a sufficient duration of silence, but if no such period of silence is encountered prior to reaching this limit, the utterance is terminated at the limit. |
uttminactivity | default is 500 ms | Specifies how much activity is needed (without |
uttpadding | default is 300 ms | Specifies how much padding around the active area to treat as active. Typically the higher the |
vadtype | energy (default), level | The two types of Voice Activity Detection (VAD) available during transcription are energy and level. |
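
To make the interaction between these thresholds more concrete, the following Python sketch shows one way a level-based detector could apply activitylevel, uttmaxsilence, and uttminactivity. It is an illustration under stated assumptions, not V‑Blaze's implementation: the function names `average_magnitude` and `segment`, the little-endian byte order, the per-frame duration, and the use of `>=` for the threshold comparison are all hypothetical choices, and uttpadding and uttmaxgap are left out to keep the example short.

```python
import struct


def average_magnitude(frame: bytes) -> float:
    """Mean absolute sample value of a signed 16-bit LPCM frame.

    Assumes little-endian samples. The result lies in 0..32768, the same
    range the activitylevel description gives.
    """
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return sum(abs(s) for s in samples) / count


def segment(levels, frame_ms=20, activitylevel=175,
            uttmaxsilence_ms=800, uttminactivity_ms=500):
    """Return a list of (start_ms, end_ms) utterances from per-frame levels.

    A frame counts as active when its level is at or above activitylevel
    (an assumed comparison). An open utterance is closed once the silence
    after its last active frame exceeds uttmaxsilence_ms, and it is kept
    only if it contains at least uttminactivity_ms of active audio.
    """
    utterances = []
    start = None          # start time of the open utterance, or None
    last_active_end = 0   # end time of the most recent active frame
    active_ms = 0         # active audio accumulated in the open utterance
    for i, level in enumerate(levels):
        t = i * frame_ms
        if level >= activitylevel:
            if start is None:
                start = t
                active_ms = 0
            active_ms += frame_ms
            last_active_end = t + frame_ms
        elif start is not None and (t + frame_ms) - last_active_end > uttmaxsilence_ms:
            if active_ms >= uttminactivity_ms:
                utterances.append((start, last_active_end))
            start = None
    # Flush an utterance that is still open when the audio ends.
    if start is not None and active_ms >= uttminactivity_ms:
        utterances.append((start, last_active_end))
    return utterances


# 600 ms of speech, 900 ms of silence, then a 200 ms burst (100 ms frames).
levels = [1000] * 6 + [0] * 9 + [1000] * 2
print(segment(levels, frame_ms=100))  # [(0, 600)]
```

In this example the 900 ms pause exceeds the 800 ms uttmaxsilence default, so the first utterance is closed, while the trailing 200 ms burst falls short of the 500 ms uttminactivity default and is discarded. A pause shorter than uttmaxsilence would instead leave both stretches of speech in a single utterance.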