V-Blaze and V-Cloud Online Help

Adjusting for Different Types of Input

Table 1. Adjusting for different types of input





WAVE (Default for files with .wav extension)

Set datahdr to WAVE when the audio contains a RIFF header that specifies the audio sampling rate, sample width, and encoding. Files with the “.wav” extension typically have such a header, although this is not guaranteed.
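Because the “.wav” extension does not guarantee a header, it can help to inspect the file before relying on it. The following is a minimal sketch, not part of the product: the function name `parse_riff_wave_header` is hypothetical, and it assumes the standard RIFF/WAVE layout (a `RIFF` tag, a 4-byte size, a `WAVE` tag, then chunks, one of which is `fmt `).

```python
import struct


def parse_riff_wave_header(data: bytes):
    """Return basic format fields from a RIFF/WAVE header, or None if absent.

    Hypothetical helper for illustration; it reads only the 'fmt ' chunk.
    """
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None

    pos = 12
    while pos + 8 <= len(data):
        # Each chunk has a 4-byte id followed by a little-endian 32-bit size.
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                return None
            fmt_tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return {
                "format_tag": fmt_tag,          # 1 means linear PCM
                "channels": channels,
                "sample_rate": rate,
                "sample_width_bytes": bits // 8,
            }
        # Chunks are padded to an even number of bytes.
        pos = body + chunk_size + (chunk_size & 1)
    return None
```

If the function returns None, the audio is probably raw or headerless, and the encoding and related parameters described later in this table would need to be supplied explicitly.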


false (default), true, noise

Diarization is the process of recognizing distinct speakers on a single (i.e., mono) audio channel and segmenting the transcribed speech into separate channels, which are identified in the JSON output. Voci’s diarization capability is designed to do this for two speakers, typically a call agent in conversation with a client over the phone.

You should only set diarize to true under the following conditions:

  • You know that your audio contains only a single audio channel

  • You know that two people are talking on the channel

  • Separating the two speakers in the transcripts is important for your use case

The noise setting is typically not needed. However, if you are experiencing excessive diarization errors due to interference from music or other non-speech sources, you can apply noise reduction by setting diarize=noise.
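The conditions above can be expressed as a small decision function. This is only a sketch: `choose_diarize` and its arguments are hypothetical names introduced for illustration, and the returned strings are the three diarize values listed for this parameter.

```python
def choose_diarize(channels: int, speakers: int,
                   need_speaker_separation: bool,
                   noisy_background: bool = False) -> str:
    """Pick a value for the diarize parameter from what is known about the audio.

    Hypothetical helper; returns "false", "true", or "noise".
    """
    # Diarization is only appropriate for mono audio with exactly two
    # speakers, and only when separating them matters for the use case.
    if channels == 1 and speakers == 2 and need_speaker_separation:
        # "noise" adds noise reduction for audio with music or other
        # non-speech interference; it is typically not needed.
        return "noise" if noisy_background else "true"
    return "false"
```

For example, `choose_diarize(1, 2, True)` selects `"true"`, while stereo audio or a single speaker falls back to the default `"false"`.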

Diarization is a licensed optional feature.


If diarize and any redaction options are used together, redaction accuracy is somewhat reduced. For maximum redaction accuracy, do not activate diarization when using any of the redaction options.



Specifies the algorithm used to encode the audio. Encoding must be supplied when raw/headerless audio is being transcribed.

Refer to encoding for more information on this parameter.


LITTLE (default), BIG

Specifies the byte ordering of audio samples. In a BIG endian data word, the most significant byte comes first when reading from left to right. In a LITTLE endian data word, the least significant byte comes first. LITTLE endian (the default) is the more common ordering.

This parameter is not required unless your audio uses BIG endian byte ordering.
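To make the two byte orders concrete, the sketch below packs the same 16-bit value both ways and shows one way to convert 16-bit samples between orders. The function name `swap_16bit_byte_order` is hypothetical and the example assumes 2-byte samples.

```python
import struct

sample = 0x1234

# "<" packs little endian: least significant byte (0x34) first.
little = struct.pack("<H", sample)
# ">" packs big endian: most significant byte (0x12) first.
big = struct.pack(">H", sample)


def swap_16bit_byte_order(raw: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample in a raw audio buffer."""
    if len(raw) % 2:
        raise ValueError("buffer length must be a multiple of 2 bytes")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each sample moves first
    swapped[1::2] = raw[0::2]  # first byte of each sample moves second
    return bytes(swapped)
```

Here `little.hex()` is `"3412"` and `big.hex()` is `"1234"`. Reading big-endian audio as little endian would turn every sample into a different number, which is why the byte order has to be declared when it is not the default.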



Required for real-time decoding when the audio has no data header.



Specifies the sampling rate of the audio to be transcribed. Telephone audio is typically sampled at 8000 Hz. For best results, the sampling rate should be a multiple of 8000 (e.g., 8000, 16000, 24000, etc.). Values less than 8000 are not supported.
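The sampling-rate guidance above can be checked before audio is submitted. This is a minimal sketch; `classify_sample_rate` and its three labels are hypothetical and simply restate the rules in the description.

```python
def classify_sample_rate(rate_hz: int) -> str:
    """Classify a sampling rate against the guidance in the description.

    Hypothetical helper; the labels are illustrative, not product output.
    """
    if rate_hz < 8000:
        return "unsupported"   # values below 8000 Hz are not supported
    if rate_hz % 8000 == 0:
        return "recommended"   # multiples of 8000 Hz give the best results
    return "supported"         # accepted, but not a multiple of 8000 Hz
```

For example, 8000 and 16000 are classified as `"recommended"`, 44100 as `"supported"`, and 6000 as `"unsupported"`.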



Specifies the size of each digitized audio sample in bytes.

This parameter applies only when raw or headerless audio is being transcribed and the encoding parameter is set to SPCM or UPCM. In that case, it must be supplied.
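The sample width matters for raw audio because, without a header, the byte count alone does not say how many samples a file holds. The sketch below shows the arithmetic; `raw_pcm_duration_seconds` and its argument names are hypothetical and are not product parameter names.

```python
def raw_pcm_duration_seconds(num_bytes: int, sample_rate_hz: int,
                             sample_width_bytes: int, channels: int = 1) -> float:
    """Compute the duration of headerless PCM audio from its size in bytes."""
    if sample_rate_hz <= 0 or sample_width_bytes <= 0 or channels <= 0:
        raise ValueError("rate, width, and channels must be positive")
    # One frame holds one sample for every channel.
    frame_size = sample_width_bytes * channels
    if num_bytes % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    return num_bytes / frame_size / sample_rate_hz
```

For 16000 bytes of mono audio at 8000 Hz, a width of 2 bytes gives 1.0 second, whereas a width of 1 byte would give 2.0 seconds. The same data is interpreted very differently depending on the declared width.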