V-Blaze Online Help

Adjusting for Different Types of Input

Table 1. Adjusting for different types of input

Name

Availability

Type

Values

Description

datahdr

V‑Blaze only

WAVE (Default for files with .wav extension)

Set datahdr to WAVE when the audio contains a RIFF header that specifies the audio sampling rate, sample width, and encoding. Files with the “.wav” extension usually have such a header, although this is not guaranteed.
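Because the “.wav” extension alone does not guarantee a header, it can help to inspect the first bytes of a file before relying on datahdr=WAVE. The following is a minimal sketch, assuming the standard RIFF/WAVE layout with a "fmt " chunk; the function name read_riff_wave_header is hypothetical and is not part of V‑Blaze.

```python
import struct


def read_riff_wave_header(data: bytes):
    """Return the basic audio properties from a RIFF/WAVE header, or None.

    Hypothetical helper for illustration only. It walks the RIFF chunks
    until it finds the "fmt " chunk and reads its first 16 bytes.
    """
    # A RIFF/WAVE file starts with "RIFF", a 4-byte size, then "WAVE".
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None

    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt " and pos + 8 + 16 <= len(data):
            fmt_code, nchannels, samprate, _rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_code": fmt_code,  # 1 = PCM, 6 = A-law, 7 = mu-law
                "nchannels": nchannels,
                "samprate": samprate,
                "sampwidth": bits // 8,   # bytes per sample
            }
        # Chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None
```

If the function returns None, the audio is probably raw or headerless, and the encoding, samprate, and related parameters described in this table would need to be supplied explicitly.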

diarize

V‑Cloud, V‑Blaze

boolean

true, false (default), noise

Diarization is the process of recognizing distinct speakers on a single (i.e. mono) audio channel and segmenting transcribed speech into separate channels, which are identified in JSON output. Voci’s diarization capability is designed to do this for two speakers, typically a call agent engaged in a conversation with a client over the phone.

You should only set diarize to true under the following conditions:

  • You know that your audio contains only a single audio channel

  • You know that two people are talking on the channel

  • Separating the two speakers in the transcripts is important for your use case

The noise setting is typically not needed. However, if you are experiencing excessive diarization errors due to interference from music or other non-speech sources, you can apply noise reduction by setting diarize=noise.

Diarization is a licensed optional feature.

Note

If diarize and any redaction options are used together, redaction accuracy is somewhat reduced. For maximum redaction accuracy, do not activate diarization when using any of the redaction options.
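The conditions for enabling diarization can be expressed as a small decision function. This is only a sketch: choose_diarize and its arguments are hypothetical names, and the returned strings are simply the documented values of the diarize parameter.

```python
def choose_diarize(nchannels, two_speakers, need_speaker_separation,
                   noisy_background=False):
    """Pick a value for the diarize parameter (hypothetical helper).

    Returns "true" only when all documented conditions hold: a single
    audio channel, two speakers, and a need to separate them. Returns
    "noise" instead when diarization is wanted and music or other
    non-speech interference is causing diarization errors.
    """
    if nchannels != 1 or not two_speakers or not need_speaker_separation:
        return "false"
    return "noise" if noisy_background else "true"


# Mono agent/client call where speaker separation matters.
print(choose_diarize(1, True, True))
# Stereo audio already has one speaker per channel, so leave it off.
print(choose_diarize(2, True, True))
```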

encoding

V‑Blaze only

SPCM, UPCM, ULAW, ALAW

Specifies the algorithm used to encode the audio. This parameter must be supplied when raw (headerless) audio is being transcribed.

Refer to encoding for more information on this parameter.

endian

V‑Blaze only

boolean

LITTLE (default), BIG

Specifies the byte ordering of audio samples. In a BIG endian data word, the most significant byte comes first when reading from left to right. In a LITTLE endian data word, the least significant byte comes first. LITTLE endian (the default) is the most common ordering.

This parameter is not required unless your audio uses BIG endian byte ordering.
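To make the two byte orderings concrete, the sketch below packs one 16-bit sample both ways using Python's struct module, then swaps the byte order of a buffer of 16-bit samples. The helper swap_16bit_byte_order is a hypothetical name used for illustration only.

```python
import struct

sample = 0x1234  # one 16-bit audio sample

little = struct.pack("<h", sample)  # bytes 34 12: least significant byte first
big = struct.pack(">h", sample)     # bytes 12 34: most significant byte first


def swap_16bit_byte_order(raw: bytes) -> bytes:
    """Convert 16-bit samples between BIG and LITTLE endian (hypothetical helper)."""
    if len(raw) % 2:
        raise ValueError("buffer must contain whole 16-bit samples")
    out = bytearray(len(raw))
    out[0::2] = raw[1::2]  # second byte of each pair moves to the front
    out[1::2] = raw[0::2]  # first byte of each pair moves to the back
    return bytes(out)
```

Swapping BIG endian audio to LITTLE endian in this way is one alternative to setting endian=BIG, assuming you control the audio before it is submitted.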

nchannels

V‑Cloud, V‑Blaze

string

integer

Required for real-time decoding when the audio has no data header.

When the audio is stereo, specify -F nchannels=2; raw (headerless) audio requires this setting to be processed correctly.

samprate

V‑Blaze only

string

integer

Specifies the sampling rate, in Hz, of the audio to be transcribed. Telephone audio is typically sampled at 8000 Hz. For best results, the sampling rate should be a multiple of 8000 (e.g., 8000, 16000, or 24000). Values less than 8000 are not supported.

sampwidth

V‑Blaze only

string

integer

Specifies the size of each digitized audio sample in bytes.

This parameter applies only when raw or headerless audio is being transcribed and the encoding parameter is set to either SPCM or UPCM. In that case, it must be supplied.
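The rules for raw or headerless audio (encoding, samprate, sampwidth, nchannels, and endian) can be combined into one validation step before a request is built. The following is a minimal sketch: raw_audio_params is a hypothetical helper, and the assumption that parameters are sent as string-valued form fields is based only on the -F nchannels=2 usage shown for nchannels.

```python
VALID_ENCODINGS = {"SPCM", "UPCM", "ULAW", "ALAW"}


def raw_audio_params(encoding, samprate, sampwidth=None, nchannels=1,
                     endian="LITTLE"):
    """Build a dict of parameters for raw audio (hypothetical helper)."""
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if samprate < 8000:
        raise ValueError("sampling rates below 8000 are not supported")
    if endian not in ("LITTLE", "BIG"):
        raise ValueError(f"unsupported endian value: {endian}")

    params = {
        "encoding": encoding,
        "samprate": str(samprate),
        "nchannels": str(nchannels),
    }
    # sampwidth applies only to SPCM and UPCM, where it is mandatory.
    if encoding in ("SPCM", "UPCM"):
        if sampwidth is None:
            raise ValueError("sampwidth is required for SPCM and UPCM")
        params["sampwidth"] = str(sampwidth)
    # endian is only needed when the audio is not LITTLE endian.
    if endian == "BIG":
        params["endian"] = "BIG"
    return params


# 16-bit signed PCM, stereo, 8000 Hz, default byte order.
print(raw_audio_params("SPCM", 8000, sampwidth=2, nchannels=2))
```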

transcode

V‑Cloud, V‑Blaze

boolean

true, false (default)

Determines whether V‑Blaze should use its built-in decoders to try to convert incoming audio into a supported format, if necessary. This option cannot be used with the truncate option.

vadtype

V‑Blaze only

boolean

energy (default), level

The two types of Voice Activity Detection (VAD) available during transcription are energy and level. The energy setting instructs the engine to use the amount of energy in the audio signal to determine if speech might be present. This is the best setting to use when transcribing audio files (i.e. post-call, or "batch" transcription).

The level setting instructs the engine to use the simple amplitude level of the audio signal for VAD. This is the best setting to use when transcribing live audio streams (i.e. in-call, or real-time transcription) because it operates instantaneously, without the need for buffering.
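The difference between the two approaches can be illustrated with a toy example. This is a simplified sketch and not the engine's actual VAD implementation: the function names, thresholds, and frame length are all assumptions. It shows why an energy measure needs a buffered frame of samples, while an amplitude check can decide on each sample as it arrives.

```python
def level_vad(samples, threshold):
    """Per-sample decision based on absolute amplitude; needs no buffering."""
    return [abs(s) >= threshold for s in samples]


def energy_vad(samples, frame_len, threshold):
    """Per-frame decision based on mean squared amplitude.

    A full frame of frame_len samples must be buffered before each decision.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        decisions.append(energy >= threshold)
    return decisions


# Four silent samples followed by four loud ones.
samples = [0, 0, 0, 0, 100, -100, 100, -100]
print(level_vad(samples, threshold=50))
print(energy_vad(samples, frame_len=4, threshold=2500))
```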