V‑Blaze API docs

Integrate your V‑Blaze ASR engine with other components and services

The V‑Blaze REST API lets clients automate audio file submission and the retrieval of completed transcripts. It supports both batch (file-based) and real-time (stream-based) operation, as well as several types of queries that let you programmatically inspect your V‑Blaze instances for available options and status.
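A minimal Python sketch of batch submission using only the standard library follows. It builds, but does not send, a multipart/form-data POST that uploads one WAV file. The host name, port 17171, the `/transcribe` path, and the `file` form-field name are assumptions for illustration; check them against the endpoint reference for your V‑Blaze release.

```python
import urllib.request
import uuid
from pathlib import Path

# Hypothetical endpoint: replace host, port, and path with your V-Blaze server's values.
TRANSCRIBE_URL = "http://vblaze.example.com:17171/transcribe"


def build_transcribe_request(url, wav_path, field_name="file"):
    """Build (but do not send) a multipart/form-data POST that uploads one WAV file."""
    path = Path(wav_path)
    boundary = uuid.uuid4().hex  # random boundary that will not appear in the headers
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{path.name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    # Reads the whole file into memory for simplicity.
    body = head + path.read_bytes() + tail
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To submit the file, pass the request to `urllib.request.urlopen(req)` and read the response body; the shape of that response depends on the endpoint and options you use and is not shown here.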

Note: All audio file examples use the WAV audio file format, identified by the .wav filename extension.

The V‑Blaze REST API requires audio in WAVE (LPCM) or RAW (headerless LPCM) format and does not support MP3. If necessary, transcode MP3 files to WAV using one of the following methods:

  1. Use lame, an MP3 encoder that can also decode MP3 to WAV:

    lame --decode yourfile.mp3 yourfile.wav

  2. Use the ffmpeg tool:

    ffmpeg -i infile.mp3 outfile.wav
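Before submitting a file, you may want to confirm that it is a PCM WAVE file rather than, for example, an MP3 that was simply renamed. The sketch below relies on Python's standard `wave` module, which opens only uncompressed PCM WAVE files and raises `wave.Error` for anything else. `is_lpcm_wav` is a hypothetical helper name, and the check does not apply to RAW (headerless) files, which have no header to inspect.

```python
import wave


def is_lpcm_wav(path):
    """Return True if `path` opens as a PCM WAVE file with Python's wave module."""
    try:
        with wave.open(str(path), "rb") as wav:
            # wave only parses uncompressed PCM; reading the parameters confirms
            # that the RIFF/WAVE header was understood.
            return wav.getsampwidth() > 0 and wav.getframerate() > 0
    except (wave.Error, EOFError):
        # wave.Error: not RIFF/WAVE, or a compressed format such as MP3-in-WAV.
        # EOFError: file too short to hold a header.
        return False
```

A file that fails this check can be transcoded with either command above.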

Required components

This version of the V‑Blaze documentation covers the following versions of the components that make up the V‑Blaze API:

  • voci-webapi-3.1.0-0

  • voci-server-server-7.4.2-1

Operation

V‑Blaze can ingest and process audio using two distinct methods:

  1. As complete files that are passed to the engine and then processed and transcribed as quickly as possible

  2. As an audio stream that is read, processed, and transcribed in real time

These two methods correspond to V‑Blaze's batch and real-time modes. How the V‑Blaze ASR engine runs depends on a combination of default values and configuration values supplied at runtime. For example, the language models used to analyze spoken input are usually specific to how a given engine instance is used, so they are supplied as a command-line option or configuration-file value. Other options, such as the input language, can be determined and set at runtime based on characteristics of the input audio.
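One way to picture this combination of defaults, configuration values, and runtime decisions is as layered option dictionaries. This is a generic sketch, not V‑Blaze's actual resolution logic: the precedence order (defaults, then configuration file, then per-request options), the option names `model`, `lang`, and `diarize`, and the `detect_language` callback are all hypothetical.

```python
def resolve_options(defaults, config_file, request, audio=b"", detect_language=None):
    """Merge hypothetical option layers; later layers override earlier ones."""
    # Assumed precedence: defaults < configuration file < per-request options.
    options = {**defaults, **config_file, **request}
    # A "lang" of "auto" is deferred until the input audio is available,
    # then decided by the (hypothetical) detect_language callback.
    if options.get("lang") == "auto" and detect_language is not None:
        options["lang"] = detect_language(audio)
    return options
```

In this layout, a value set closer to the request wins, and a `lang` left as `"auto"` is resolved from the audio itself.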

Figure 1. Overview of V‑Blaze operation (data in, decode worker processes, and API requests)

All API requests share a common pool of acoustic and language processing workers (decoders). If there are more request streams than decoders, some streams may block, pulling no further audio from their connection or socket until a decoder becomes available. With the V‑Blaze REST API, all audio is processed as a stream rather than as a single chunk, even when a whole file is POSTed at once: the server pulls data from the connection only as fast as available decoders can process it.
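The blocking behavior described above can be modeled with a semaphore whose count is the number of decoders. This is a toy simulation, not V‑Blaze code: `DecoderPool`, `simulate`, the chunk size, and the fixed 10 ms decode time are all illustrative. Each simulated stream pulls its next chunk only after the previous chunk has been decoded, so with more streams than decoders, the extra streams wait instead of reading ahead.

```python
import threading
import time


class DecoderPool:
    """Toy model of a fixed number of decoders shared by all request streams."""

    def __init__(self, num_decoders):
        self._slots = threading.BoundedSemaphore(num_decoders)
        self._lock = threading.Lock()
        self.active = 0  # chunks being decoded right now
        self.peak = 0    # highest value `active` ever reached

    def decode(self, chunk):
        # Blocks until a decoder slot is free.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            time.sleep(0.01)  # stand-in for acoustic and language processing
            with self._lock:
                self.active -= 1
        return len(chunk)


def read_chunks(audio, chunk_size):
    """Yield successive chunks, as a client would send them over a connection."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def run_stream(pool, audio, chunk_size, results, index):
    # The generator is advanced only after decode() returns, so a stream
    # waiting for a decoder reads no further audio.
    total = 0
    for chunk in read_chunks(audio, chunk_size):
        total += pool.decode(chunk)
    results[index] = total


def simulate(num_decoders, num_streams, audio_len, chunk_size):
    pool = DecoderPool(num_decoders)
    results = [0] * num_streams
    audio = bytes(audio_len)  # zero-filled placeholder audio
    threads = [
        threading.Thread(target=run_stream, args=(pool, audio, chunk_size, results, i))
        for i in range(num_streams)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, pool.peak
```

With `simulate(2, 5, 1000, 256)`, every stream still receives all 1000 bytes of its audio, but no more than two chunks are ever being decoded at the same time.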