Real-Time Transcription Overview
Real-time transcription resembles the standard callback mechanism, with one major difference: instead of POSTing the entire transcript to the callback server at once, the transcript of each utterance is HTTP POSTed to the client-side callback server as soon as it is ready. An utterance is closed and transcribed when either of two events occurs:
Break(s) in speech
Max utterance length
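As a rough illustration of the client side, the following sketch shows a minimal callback server that accepts one POST per utterance. It uses only the Python standard library. The JSON body format, the `received` list, and the `make_server` helper are assumptions made for this example; the actual payload layout is defined by the API and is not described here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Utterance payloads collected in arrival order (hypothetical storage).
received = []


class UtteranceHandler(BaseHTTPRequestHandler):
    """Accepts one HTTP POST per utterance transcript."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            # Assumption: each utterance arrives as a JSON document.
            payload = json.loads(body)
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def make_server(host="127.0.0.1", port=0):
    """Build the callback server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), UtteranceHandler)
```

To run it, call `make_server(port=8080).serve_forever()` and supply the resulting URL as the callback address. Because each utterance is delivered separately, the handler should return quickly and leave any heavy processing to another thread or queue.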
The max utterance length can be set as high as 80 seconds (15 seconds is typical), but the right value depends on the solution and use case and usually requires tuning. Note that setting the max utterance length too low will most likely degrade transcription accuracy: a shorter utterance gives the ASR engine less context to support recognition.
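The interaction between the two events can be sketched with a small toy model. This is not the engine's actual segmentation logic. The function name `split_utterances`, the `max_gap` break threshold, and the input format (sorted speech intervals in seconds) are all assumptions chosen for illustration.

```python
def split_utterances(segments, max_gap=0.5, max_len=15.0):
    """Group speech intervals into utterances.

    segments: list of (start, end) times in seconds, sorted by start.
    max_gap:  hypothetical silence length that counts as a break in speech.
    max_len:  max utterance length in seconds.

    Simplification: a single interval longer than max_len is not cut.
    """
    utterances = []
    current_start = None
    current_end = None
    for start, end in segments:
        if current_start is None:
            current_start, current_end = start, end
            continue
        gap = start - current_end
        too_long = (end - current_start) > max_len
        if gap >= max_gap or too_long:
            # Close the current utterance on a break or on the length limit.
            utterances.append((current_start, current_end))
            current_start = start
        current_end = end
    if current_start is not None:
        utterances.append((current_start, current_end))
    return utterances
```

In this model, lowering `max_len` produces more, shorter utterances, which matches the trade-off described above: results arrive sooner, but each utterance carries less context.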
Latency is measured from the time an utterance ends to the time its transcription result is POSTed. Load affects this latency:
Light load: 0.2x latency should be expected
Medium load: 1x latency should be expected
Heavy load: > 1x latency should be expected
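A client can estimate these figures from its own timestamps. The sketch below assumes the multipliers are expressed relative to the utterance's duration; the text does not state the reference quantity, so treat that reading as an assumption. The function names and arguments are hypothetical.

```python
def latency_seconds(utterance_end, result_posted):
    """Seconds from the end of the utterance to receipt of its result."""
    return result_posted - utterance_end


def latency_factor(utterance_start, utterance_end, result_posted):
    """Latency as a multiple of utterance duration (assumed meaning of 'x')."""
    duration = utterance_end - utterance_start
    if duration <= 0:
        raise ValueError("utterance duration must be positive")
    return latency_seconds(utterance_end, result_posted) / duration
```

Under this reading, a 15-second utterance whose result arrives 3 seconds after it ends has a latency factor of 0.2.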
Refer to the Real-Time Streaming Transcription with V‑Blaze section of the V‑Blaze API Guide for more information on how to use real-time transcription.