V-Blaze and V-Cloud Online Help

HTTP Results Streaming

HTTP results streaming in the V‑Blaze REST API is a simpler, less powerful alternative to WebSockets for receiving utterance results in real time. When combined with chunked transfer uploads, it provides a bidirectional streaming interface over HTTP. This option is not supported if scrubaudio or zip output is requested.

There are several different flows that allow for HTTP result streaming:

Transcription Result Streaming

The transcription result streaming flow sends utterance results back to the user in a line-delimited JSON format. Each utterance is terminated by a '\r\n' (CRLF). After all utterances have been streamed, two CRLFs are sent, followed by the complete transcription. The formats of the utterances and of the complete transcription can be controlled using the utterance_fmt and output tags, respectively.
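The framing described above can be consumed incrementally. The following Python sketch is illustrative only: it assumes each utterance arrives as a single line of JSON, that the first empty line marks the start of the complete transcription, and that the response body is available as an iterable of byte chunks. The function name iter_stream and the JSON field names in the usage example are hypothetical and are not part of the V‑Blaze API.

```python
import json

CRLF = b"\r\n"

def iter_stream(chunks):
    """Yield ("utterance", obj) per utterance, then ("final", raw_bytes).

    `chunks` is any iterable of bytes objects, for example the pieces of a
    chunked HTTP response body. Chunk boundaries may fall anywhere.
    """
    buf = b""
    in_final = False
    for chunk in chunks:
        buf += chunk
        while not in_final:
            line, sep, rest = buf.partition(CRLF)
            if not sep:
                break  # no complete line yet; wait for more data
            buf = rest
            if line:
                # Assumption: each utterance is one line of JSON.
                yield ("utterance", json.loads(line))
            else:
                # An empty line ends the utterance section.
                in_final = True
    if in_final:
        # Everything after the blank line(s) is the complete transcription.
        yield ("final", buf.lstrip(CRLF))

# Usage with hypothetical payloads, split at awkward chunk boundaries.
chunks = [
    b'{"text": "hi"}\r',
    b'\n{"text": "there"}\r\n\r\n\r\n{"done"',
    b': true}',
]
events = list(iter_stream(chunks))
```

The final transcription is returned as raw bytes rather than parsed, because its format depends on the output tag.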

This flow is enabled by default when realtime=true and no utterance_callback is provided. It can always be disabled by specifying an outstream=false tag.

If outstream=true is specified without realtime=true, utterances will be streamed back in the format described above; however, the stream will not be delivered in real time.
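The rules above for when transcription results are streamed can be summarized as a small decision function. This Python sketch only restates the two preceding paragraphs. The function name transcript_streaming and the representation of tags as a dictionary of strings are assumptions made for illustration, and the sketch deliberately ignores the scrubaudio and notext tags.

```python
def transcript_streaming(tags):
    """Return (streamed, real_time) for the transcription result flow.

    `tags` is a dict of request tags, e.g. {"realtime": "true"}.
    """
    def is_true(name):
        return str(tags.get(name, "")).lower() == "true"

    realtime = is_true("realtime")
    outstream = tags.get("outstream")

    if outstream is None:
        # Default: stream only for real-time requests with no callback.
        streamed = realtime and "utterance_callback" not in tags
    else:
        # An explicit outstream tag overrides the default.
        streamed = str(outstream).lower() == "true"

    # Streaming happens in real time only when realtime=true.
    return streamed, streamed and realtime
```

For example, transcript_streaming({"outstream": "true"}) reports that results are streamed but not in real time, matching the case described above.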

Audio Result Streaming

Scrubbed audio may be streamed over HTTP if scrubaudio=true, notext=true, and outstream=true are all specified. The resulting stream will contain uncompressed scrubbed audio in WAV format.

Note

Scrubbed audio is always streamed in real time, regardless of the value of the realtime tag.

Transcription Result Streaming with Redacted Audio

If both scrubaudio and outstream are true but notext is not specified, the utterance transcriptions will be streamed back in the format described above; however, instead of being followed by just the complete transcription, a ZIP file containing both the complete transcription and the redacted audio will be sent.

Note that this flow cannot stream redacted audio in real time. The final ZIP file is sent only after all audio data has been processed. WebSockets must be used if both real-time utterance streaming and real-time redacted audio streaming are necessary.
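Because the ZIP file arrives only at the end of this flow, a client can buffer the whole response and then separate the utterance lines from the archive. The following Python sketch assumes the same framing as the transcription result flow (CRLF-terminated utterance lines, then a blank line, then the ZIP bytes). The function name split_zip_response and the archive member names used in the example are hypothetical; the actual file names inside the ZIP are not specified here.

```python
import io
import zipfile

CRLF = b"\r\n"

def split_zip_response(body):
    """Split a fully buffered response into (utterance_lines, ZipFile)."""
    # Everything before the first blank line is the utterance section.
    head, _, tail = body.partition(CRLF + CRLF)
    utterance_lines = [line for line in head.split(CRLF) if line]
    # A ZIP file starts with b"PK", so stripping leading CR/LF bytes
    # cannot remove any archive data.
    archive = zipfile.ZipFile(io.BytesIO(tail.lstrip(CRLF)))
    return utterance_lines, archive

# Build a stand-in response with made-up member names for demonstration.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as zf:
    zf.writestr("transcript.json", '{"utterances": []}')
    zf.writestr("audio.wav", b"RIFF")
body = b'{"text": "hi"}\r\n\r\n\r\n' + _buf.getvalue()

lines, archive = split_zip_response(body)
members = sorted(archive.namelist())
```

Use archive.namelist() to discover the actual member names rather than relying on the placeholders above.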