V-Spark Online Help

/transcribe API Reference

Clients submit audio and optional metadata files to V‑Spark for processing with the HTTP POST method. Refer to the V‑Spark Management Guide for more comprehensive information about supported audio formats, filename requirements, and metadata formatting.

Uploading individual or multiple files

When using the /transcribe API to submit files for transcription, single audio files and JSON transcripts can be submitted individually. Files submitted individually will not be associated with each other.

Multiple files can be submitted in a single POST request, but they must be encapsulated into a single zip file. These zip files can contain both audio data and metadata. Audio files and metadata files submitted as parts of a zip file will remain associated with each other as parts of a single submission.
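Both submission styles can be wrapped in one small shell function. This is a sketch that assumes a POSIX shell with cURL installed. The function name vspark_transcribe and the VSPARK_TOKEN environment variable are hypothetical, not part of V‑Spark. The URL pattern and form fields mirror the cURL example later in this section. Sending an individual file without an explicit ;type= value is an assumption, because only the zip case is shown in this guide.

```shell
# vspark_transcribe HOST ORG_SHORTNAME FOLDER FILE
# Submits one file (a single audio file, a JSON transcript, or a zip archive)
# to the /transcribe endpoint. The token is read from VSPARK_TOKEN, a
# hypothetical variable name chosen for this sketch.
vspark_transcribe() {
  host=$1 org=$2 folder=$3 file=$4

  # Zip archives use the application/zip type shown in this guide.
  # ASSUMPTION: other files are sent without an explicit type, so cURL
  # picks its default.
  case $file in
    *.zip) part="file=@$file;type=application/zip" ;;
    *)     part="file=@$file" ;;
  esac

  curl -sS -F "token=$VSPARK_TOKEN" -F "$part" \
    -X POST "https://$host/transcribe/$org/$folder"
}
```

For example, after export VSPARK_TOKEN=<your token>, run vspark_transcribe hostname org_shortname folder_name /path/to/audio_and_meta.zip to submit an archive, or pass a single .wav or .json path to submit one file on its own.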

Note

By default, the maximum size of a file submitted using the /transcribe API is 250 MB. The maximum upload size can be changed using the transcribe_api_upload_limit system configuration option, but its value (specified in bytes) may not exceed 10 GB.
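Because the limit is expressed in bytes, a client can compare a file's size against it before posting. The helper below is hypothetical (within_upload_limit is not a V‑Spark command). It assumes that the default 250 MB limit means 250 * 1024 * 1024 bytes; if your installation sets transcribe_api_upload_limit, pass that byte value as the second argument instead.

```shell
# ASSUMPTION: "250 MB" is interpreted as 250 * 1024 * 1024 bytes.
DEFAULT_UPLOAD_LIMIT=$((250 * 1024 * 1024))

# within_upload_limit FILE [LIMIT_BYTES]
# Returns 0 if FILE is no larger than LIMIT_BYTES (default above),
# 1 if it is too large, and 2 if FILE is not a regular file.
within_upload_limit() {
  [ -f "$1" ] || return 2
  limit=${2:-$DEFAULT_UPLOAD_LIMIT}
  # wc -c prints the byte count; tr strips the padding some systems add.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -le "$limit" ]
}
```

A typical call is within_upload_limit audio_and_meta.zip || echo "archive exceeds the upload limit" >&2.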

Note

Any metadata that you provide must be formatted as described in the Metadata Management section of the V‑Spark Management Guide.

Tip

V‑Spark's GUI enables you to submit individual files in various formats to a specific folder. Use the Settings menu's Folders command to display your folders, then click the Upload audio button to the right of a folder's name.

See the Audio Management section of the V‑Spark Management Guide for more information.

Audio Requirements

An audio file's format and other properties have a significant impact on the accuracy of ASR transcription, and the level of accuracy for a given transcript affects analytics performance. The best format for audio submitted for transcription and analysis is a WAV file containing PCM or G.711 (uLaw or aLaw) audio. V‑Spark supports a wide variety of audio types because it converts audio before passing it to the ASR engine for transcription. The best way to check audio file compatibility is with the Audio Evaluator.

Although V‑Spark converts audio for transcription, that conversion cannot account for voice data lost due to suboptimal recording and encoding practices. Refer to the Improving Transcription Accuracy section of the V‑Blaze User Guide for more information about how an audio file's channels and other properties impact ASR transcription.

The number of channels in an audio file also has a significant impact on transcription and analysis. Audio submitted for transcription and analysis must have one or two channels. The number of channels in source audio, along with how those channels are used, affects V‑Spark's ability to distinguish between speaker roles. In most cases, these roles are agent and client, and distinguishing between the two is critical for transcript analysis.

Transcription and analysis work best with two-channel (stereo) audio that has each speaker role on a separate channel. Audio with more than one speaker on the same channel may be diarized, a process that separates the audio into two channels and assigns each speaker to a different channel.

Important

V‑Spark does not support audio with more than 2 channels.
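A quick pre-flight check for the two-channel limit can read the channel count from a WAV header. This hypothetical helper is not a substitute for the Audio Evaluator: it assumes a canonical RIFF/WAVE file whose fmt chunk immediately follows the RIFF header, so files with other chunks first (or non-WAV formats) are reported as unreadable rather than parsed.

```shell
# wav_channel_count FILE
# Prints the channel count stored in a WAV header and returns 0. Returns 1
# (printing nothing) if the file does not start with the expected layout.
# ASSUMPTION: the "fmt " chunk begins at byte 12, right after the RIFF header.
wav_channel_count() {
  wav_file=$1
  [ "$(dd if="$wav_file" bs=4 count=1 2>/dev/null)" = "RIFF" ] || return 1
  [ "$(dd if="$wav_file" bs=1 skip=12 count=4 2>/dev/null)" = "fmt " ] || return 1
  # NumChannels is a 16-bit little-endian integer at byte offset 22.
  # Reading it as two unsigned bytes keeps the result host-independent.
  set -- $(od -A n -t u1 -j 22 -N 2 "$wav_file")
  [ $# -eq 2 ] || return 1
  echo $(( $1 + 256 * $2 ))
}
```

For example, ch=$(wav_channel_count call.wav) && [ "$ch" -le 2 ] succeeds only for a readable mono or stereo WAV file.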

Filename Characters

The names of uploaded audio files and zip archives must adhere to the installation's filename requirements, whether they are uploaded through the GUI or with the API's /transcribe endpoint. When a zip file is uploaded, only the name of the zip file itself is validated; the names of files inside the zip are not checked. Filename validation was introduced in version 4.0.1-3 to help protect against remote code execution. By default, these characters are not permitted in uploaded filenames: #*<>:?/\|{}$!'`"=^
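A client can screen names locally before uploading. The following hypothetical sketch mirrors only the default character list; it does not read filename_validation or filename_validation_pattern, so treat it as a convenience check rather than the server's actual validation. Pass a base name, not a path, because / is itself a disallowed character. For a zip upload, only the archive's own name needs to pass.

```shell
# Bracket expression listing every default disallowed character:
#   # * < > : ? / \ | { } $ ! ' ` " = ^
# The single quote is spliced in as "'" because it cannot appear inside '...'.
# ASSUMPTION: the installation still uses the default rule.
DISALLOWED='[#*<>:?/\|{}$!'"'"'`"=^]'

# has_valid_filename NAME
# Returns 0 if NAME (a base name, not a path) contains none of the
# disallowed characters, 1 otherwise.
has_valid_filename() {
  if printf '%s\n' "$1" | grep -q -- "$DISALLOWED"; then
    return 1
  fi
  return 0
}
```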

To disable filename validation, set the filename_validation system configuration setting to off. To define custom filename character requirements, specify a regular expression for the filename_validation_pattern system configuration setting.

Rejecting Duplicate Audio by Filename

Files are not required to have unique names at a system level, but as of V‑Spark 4.2.0-1, individual folders may be configured to reject files with duplicate filenames. In either case, filenames should be unique as a best practice. Consider adding the file's timestamp, call ID, or a UUID to create a unique filename. Duplicate filenames make some processing take longer.
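One way to follow this practice is to build each name from a call identifier and a UTC timestamp. In this hypothetical sketch, the call ID is whatever identifier your recording system provides. The compact timestamp format omits the colon, which the default filename requirements do not permit.

```shell
# unique_audio_name CALL_ID EXTENSION
# Prints a name of the form CALL_ID_YYYYMMDDTHHMMSSZ.EXTENSION using the
# current UTC time. No ':' is used, so the default filename rule is met
# as long as CALL_ID itself contains no disallowed characters.
unique_audio_name() {
  printf '%s_%s.%s\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)" "$2"
}
```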

Note

Regardless of the duplicate rejection setting, if two files with identical names are submitted to the same folder within the same second, only one of them is processed.

When a folder has the deduplication setting enabled, that folder will reject file uploads in the following scenarios:

  • A file is uploaded with the same name as a previously uploaded file.

  • A zip file contains a file with the same name as a previously uploaded file.

  • A zip file contains two or more files with the same name.

Filename-based deduplication may fail to detect a duplicate if the first file is still waiting in the Job Manager queue to be processed.

The entire zip file is rejected when a duplicate file is detected inside the zip. Duplicate file rejection for zip files nested inside other zip files is not supported. When folder-level deduplication causes a file to be rejected, V‑Spark generates a WARNING-level message in server.log and the Activity Log.
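Because a single duplicate causes the whole zip file to be rejected, it can help to check a batch for repeated names before archiving it. This hypothetical helper catches only the within-zip case (it cannot see files already uploaded to the folder), and it assumes V‑Spark compares base names, ignoring any directory paths stored in the archive.

```shell
# find_duplicate_names FILE...
# Prints each base name that appears more than once among the arguments.
# ASSUMPTION: duplicates are judged by base name (text after the last '/').
find_duplicate_names() {
  for path in "$@"; do
    printf '%s\n' "${path##*/}"
  done | sort | uniq -d
}
```

An empty result from find_duplicate_names batch/*.wav batch/*.json means no two files in the batch share a name.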

A request submitted to the /transcribe endpoint with an invalid filename parameter fails with HTTP status code 422 (Unprocessable Entity).
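A script can detect this rejection by asking cURL for the HTTP status code with its -w '%{http_code}' option. The sketch below is hypothetical (post_and_check and VSPARK_TOKEN are illustrative names). It treats any 2xx status as success and prints the response body, reports 422 as an invalid-filename rejection, and treats every other status as a generic failure. Without --fail, cURL exits with status 0 even for HTTP error responses, which is why the status code is checked explicitly.

```shell
# post_and_check URL FILE
# Posts FILE to a /transcribe URL and maps the HTTP status to an exit code:
#   0 = 2xx (response body printed to stdout)
#   2 = 422, invalid filename parameter
#   1 = any other HTTP status
#   3 = cURL could not complete the request
# VSPARK_TOKEN is a hypothetical variable holding the API token.
post_and_check() {
  url=$1 file=$2
  case $file in
    *.zip) part="file=@$file;type=application/zip" ;;
    *)     part="file=@$file" ;;
  esac
  body=$(mktemp) || return 3

  # -o saves the body to a temp file; -w prints only the status code.
  if ! status=$(curl -sS -o "$body" -w '%{http_code}' \
      -F "token=$VSPARK_TOKEN" -F "$part" -X POST "$url"); then
    rm -f "$body"
    echo "cURL could not complete the request" >&2
    return 3
  fi

  case $status in
    2??) cat "$body"; rc=0 ;;
    422) echo "HTTP 422: invalid filename (${file##*/})" >&2; rc=2 ;;
    *)   echo "HTTP $status: upload failed" >&2; rc=1 ;;
  esac
  rm -f "$body"
  return "$rc"
}
```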

Transcription options

All parameters that control transcription options are specified in the V‑Spark Folder definition. These include the language models used to decode each audio channel, number of speakers, number of audio channels (i.e. mono or stereo), etc. It is therefore unnecessary to provide these parameters when POSTing files to V‑Spark.

Authorization token

You can use either the root token for your V‑Spark installation or the token for the company that is associated with the organization and folder to which you are submitting your transcription request. See V‑Spark API Permission Requirements for information about locating these tokens and the rights that these tokens give you.

Example POST request using a zip file

When using the /transcribe API to submit a zip file for transcription of the audio files it contains, the POST must be encoded as a multipart/form-data request, with the zip file uploaded in the file field and a V‑Spark authorization token provided in the token field.

The following is an example of calling the /transcribe API method using the cURL command-line utility:

curl -F token=0123456789abcde0123456789abcde01 \
       -F "file=@/path/to/audio_and_meta.zip;type=application/zip" \
       -X POST https://hostname/transcribe/org_shortname/folder_name

The cURL utility is freely available for operating systems including Linux, Windows, and macOS.

Note

The token, file path, hostname, org_shortname, and folder_name values in the sample cURL command are examples only and must be replaced with real values that are appropriate for your environment.

In the example command, org_shortname is the Short Name assigned to the target Organization, which can be found in the Organization section of the V‑Spark Settings page. The folder_name value is the name of the organization's folder into which you want to upload the audio contained in the zip file.

Figure 1. Location of the V‑Spark Organization Short Name


The cURL command exits after the zip file has been transmitted to the V‑Spark instance and the server's response has been received.

Next steps

The POST returns a universally unique identifier (UUID) that identifies the transcription request. All transcripts produced as a result of the request will include a requestid field with its value set to this UUID. The requestid enables you to correlate individual transcripts with specific transcription requests.
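To correlate transcripts later, a client can save the UUID from the POST response. The exact response format is not described here, so this hypothetical sketch simply extracts the first UUID-shaped string (8-4-4-4-12 hexadecimal digits) from whatever the server returns. It relies on grep -o, which GNU and BSD grep support but POSIX does not require.

```shell
# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'

# extract_request_id < RESPONSE
# Prints the first UUID found on standard input; returns 1 if none is found.
extract_request_id() {
  request_id=$(grep -o -E "$UUID_RE" | head -n 1)
  [ -n "$request_id" ] || return 1
  printf '%s\n' "$request_id"
}
```

For example, piping the output of the cURL command into extract_request_id and storing the result gives you the value to match against each transcript's requestid field.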

Once the audio has been transcribed, the transcripts (along with optional metadata) are loaded into V‑Spark. Transcripts from any given request can be retrieved using the aforementioned UUID with the /request endpoint, or by using a callback server. Please refer to the V‑Spark Review and Analysis Guide for details regarding browsing, searching, and analyzing the calls and metadata within V‑Spark.