V-Spark Online Help

Audio Requirements

An audio file's format and other properties have a significant impact on the accuracy of ASR transcription, and the level of accuracy for a given transcript affects analytics performance. The best format for audio submitted for transcription and analysis is a lossless G.711 WAV (PCM, uLaw, or aLaw). V‑Spark supports a wide variety of audio types because it converts audio before passing it to the ASR engine for transcription. The best way to check audio file compatibility is with the Audio Evaluator.

Although V‑Spark converts audio for transcription, that conversion cannot account for voice data lost due to suboptimal recording and encoding practices. Refer to the Improving Transcription Accuracy section of the V‑Blaze User Guide for more information about how an audio file's channels and other properties impact ASR transcription.

The number of channels in an audio file also has a significant impact on transcription and analysis. Audio submitted for transcription and analysis must have one or two channels. The number of channels in source audio, along with how those channels are used, affects V‑Spark's ability to distinguish between speaker roles. In most cases, these roles are agent and client, and distinguishing between the two is critical for transcript analysis.

Transcription and analysis work best with two-channel (stereo) audio that has each speaker role on a separate channel. Audio with more than one speaker on the same channel may be diarized. Diarization separates the audio into two channels and assigns each speaker to a different channel.

Important

V‑Spark does not support audio with more than 2 channels.
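
The channel requirement above can be checked locally before upload. The following is a minimal sketch, not part of V‑Spark, that uses Python's standard wave module. The function names are hypothetical, and the wave module reads uncompressed PCM WAV files only, so uLaw, aLaw, and non-WAV audio would need a different tool.

```python
import wave


def channel_count(path):
    """Return the number of channels in a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def has_supported_channel_count(path):
    """Return True if the file has one or two channels (hypothetical helper)."""
    return channel_count(path) in (1, 2)
```

A file for which has_supported_channel_count returns False would need to be downmixed or split before submission.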

Filename Characters

The names of uploaded audio files and zip archives must adhere to the installation's filename requirements, whether they are uploaded through the GUI or with the API's /transcribe endpoint. When uploading a zip file, only the name of the zip file itself is validated against these requirements; files inside the zip are not checked. Filename validation was first implemented in version 4.0.1-3 to help protect against remote code execution. By default, these characters are not permitted in uploaded filenames: #*<>:?/\|{}$!'`"=^

To disable filename validation, set the filename_validation system configuration setting to off. To define custom filename character requirements, specify a regular expression for the filename_validation_pattern system configuration setting.
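
As an illustration of the default character restriction, the following sketch checks a filename on the client side before it is uploaded. It is not V‑Spark's own validation code: the function names are hypothetical, and it covers only the default character list shown above, not any custom filename_validation_pattern an installation may define.

```python
# Default characters rejected in uploaded filenames, copied from the list above.
DEFAULT_FORBIDDEN = set("#*<>:?/\\|{}$!'`\"=^")


def forbidden_characters(filename):
    """Return the sorted default-forbidden characters found in filename."""
    return sorted(set(filename) & DEFAULT_FORBIDDEN)


def is_valid_upload_name(filename):
    """Return True if filename contains none of the default-forbidden characters."""
    return not forbidden_characters(filename)
```

Calling forbidden_characters on a rejected name shows which characters to remove or replace before retrying the upload.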

Rejecting Duplicate Audio by Filename

Files are not required to have unique names at a system level, but as of V‑Spark 4.2.0-1, individual folders may be configured to reject files with duplicate filenames. In either case, filenames should be unique as a best practice. Consider adding the file's timestamp, call ID, or a UUID to create a unique filename. Duplicate filenames make some processing take longer.
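
One way to follow the unique-filename recommendation is sketched below. The naming scheme, the function name, and the call_id parameter are assumptions made for illustration; any scheme that yields unique names and avoids restricted characters would serve equally well.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def unique_filename(original, call_id=None):
    """Build a unique name from the original stem, a UTC timestamp,
    an optional call ID, and a random UUID (hypothetical scheme)."""
    p = Path(original)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    parts = [p.stem, stamp]
    if call_id is not None:
        parts.append(str(call_id))
    parts.append(uuid.uuid4().hex)
    # Underscores join the parts; the original extension is preserved.
    return "_".join(parts) + p.suffix
```

The UUID keeps names distinct even when two files share the same stem, call ID, and timestamp second.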

Note

Independently of the duplicate rejection setting, if two files with identical names are submitted to the same folder at the exact same second, only one of those files will be processed.

When a folder has the deduplication setting enabled, that folder will reject file uploads in the following scenarios:

  • A file is uploaded with the same name as a previously uploaded file.

  • A zip file contains a file with the same name as a previously uploaded file.

  • A zip file contains two or more files with the same name.

Filename-based deduplication may fail if the first file is still in the Job Manager queue waiting to be processed.

The entire zip file is rejected when a duplicate file is detected inside the zip. Duplicate file rejection for zip files nested inside other zip files is not supported. When folder-level deduplication causes a file to be rejected, V‑Spark generates a WARNING-level message in server.log and the Activity Log.
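
Because a single duplicate causes the whole zip file to be rejected, it can be useful to inspect an archive before uploading it. The sketch below uses Python's standard zipfile module to list names that occur more than once inside a zip. It assumes that duplicates are determined by base filename regardless of directory within the archive, which may not match how V‑Spark compares names, and the function name is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def duplicate_names_in_zip(zip_path):
    """Return the sorted base filenames that appear more than once in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        names = [
            PurePosixPath(info.filename).name
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means no duplicate names were found inside the archive itself; it does not check the archive's contents against files previously uploaded to the folder.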

A request submitted to the /transcribe API endpoint with an invalid filename parameter fails and returns HTTP error code 422.