V-Blaze and V-Cloud Online Help

Real-Time Transcription Test Example

This example requires three terminal sessions: one to receive utterances in real time, one to send audio, and one to make API requests. Multiple terminal sessions can be managed with a terminal multiplexer such as Screen or tmux.

This example uses localhost; however, the callback receiver and audio source can run on other hosts.

Receive real-time utterance-by-utterance callbacks

Use the following command to run netcat in a loop to receive POSTs:

while true ; do /bin/echo -e 'HTTP/1.1 200 OK\r\n' | nc -l 5556; done
Send an audio stream via a socket with netcat

Use the pv command to simulate a real-time audio source by limiting the rate at which audio is sent to the decoder.

pv -L 16000 mono_pcm_sample.wav | nc -l 5555


Double the rate limit to 32000 for stereo PCM.

Or more generally:

cat mono_pcm_sample.wav | pv -L 16000 | nc -l 5555
cat /opt/voci/server/examples/sample7.wav | pv -L 16k | nc -l 5555
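The -L value passed to pv is a byte rate: sample rate × sample width × channel count. A quick sanity check of the rates used above (the variable names here are illustrative shell variables, not ASR parameters):

```shell
# pv -L limits throughput in bytes per second:
#   byte rate = samprate * sampwidth * nchannels
samprate=8000   # 8 kHz telephony audio
sampwidth=2     # 16-bit linear PCM = 2 bytes per sample
echo "mono:   $(( samprate * sampwidth * 1 ))"   # 16000 bytes/sec
echo "stereo: $(( samprate * sampwidth * 2 ))"   # 32000 bytes/sec
```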
Send a request to the ASR server

The following ASR parameters are a good starting point for common situations, but may require adjustment for specific environments or requirements.

curl -F realtime=true -F output=text \
     -F vadtype=level -F activitylevel=175 \
     -F uttminactivity=1000 -F uttmaxsilence=500 -F uttpadding=250 -F uttmaxtime=15 -F uttmaxgap=0 \
     -F datahdr=WAVE -F socket=localhost:5555 \
     -F utterance_callback=http://localhost:5556 \
     -X POST http://localhost:17171/transcribe


Both the utterance_callback and socket settings are interpreted from the point of view of the ASR host: the ASR server must be able to resolve and connect to the hosts and ports specified.


When making API calls to V‑Blaze, specifying header tags such as samprate, encoding, and endian that do not match the information in a WAV file's header will cause errors during transcription, and the audio file will not be processed. If the audio file is a native WAV file, there is no need to specify anything other than datahdr=WAVE in the API call. Specify nchannels, sampwidth, samprate, encoding, and endian only when the audio is raw or headerless.
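One way to decide between the two cases is to inspect the file's first bytes for the RIFF magic. The following is a sketch, and mono_pcm_sample.wav is a placeholder filename:

```shell
# A native WAV file begins with the four ASCII bytes "RIFF".
f=mono_pcm_sample.wav
if [ "$(head -c 4 "$f")" = "RIFF" ]; then
    echo "native WAV: datahdr=WAVE is sufficient"
else
    echo "no RIFF header: specify samprate, sampwidth, encoding, etc."
fi
```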

Raw or headerless audio

Real-time streaming audio often does not include a RIFF header. When transcribing raw or headerless audio, the datahdr field is not used; instead, raw encoded audio is supported by explicitly providing the information normally carried by the header. At a minimum, this includes the sample rate, sample width, and encoding. The byte order can also be specified with endian, but the default value of LITTLE is usually correct. The following is an example:

curl -F utterance_callback=http://receiver:5556/utterance/index.asp \
     -F socket=sender:5555 \
     -F samprate=8000 \
     -F nchannels=2 \
     -F sampwidth=2 \
     -F encoding=SPCM \
     -X POST http://localhost:17171/transcribe
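If only WAV files are available for testing the raw-audio path, headerless PCM can be produced by stripping the header. The sketch below builds a dummy file with a minimal 44-byte header so it is self-contained; real WAV files may carry extra chunks, so verify the offset for your own recordings:

```shell
# Demonstration: build a 44-byte dummy header plus a PCM payload, then
# strip the header with dd to leave only the raw samples.
# (In practice the input would be a real WAV recording.)
head -c 44 /dev/zero > header.bin
head -c 16000 /dev/zero > payload.bin
cat header.bin payload.bin > demo.wav
dd if=demo.wav of=demo.raw bs=1 skip=44 2>/dev/null
wc -c < demo.raw   # 16000: the 44 header bytes are gone
```

The resulting demo.raw can then be streamed with pv and nc as in the earlier examples.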

Alternative implementation method: sending real-time audio in the POST

The ASR REST API also supports submitting real-time audio directly in the request POST by sending chunks of audio data through the HTTP connection as they become available. Voci can provide the requests_mpstream.py reference code on request if this style of data flow is required for the deployment architecture.

An example of using requests_mpstream.py:

pv -qL 16000 sample1.wav | \
    python requests_mpstream.py http://localhost:17171/transcribe - realtime=true output=text \
        vadtype=level activitylevel=175 \
        uttminactivity=1000 uttmaxsilence=500 uttpadding=250 uttmaxtime=15 uttmaxgap=0 \
        datahdr=WAVE