JSON transcripts

All Voci products use the JSON file format to store transcript data derived from source audio. This data includes the text decoded from speech, along with metadata that describes audio attributes and the results of linguistic and emotional analysis performed by Voci products.

Note: When a data value is not defined (null), its data name does not appear as an element in the JSON output. Consequently, not all of the elements described in this section will be present in every JSON output file.
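Because undefined fields are omitted rather than emitted as null, code that reads Voci JSON output should access optional elements defensively. A minimal Python sketch (the field names come from this section; the sample values are hypothetical):

```python
import json

# Hypothetical fragment of Voci JSON output; an undefined field
# such as "emotion" is simply absent rather than set to null.
raw = '{"asr": "V8", "model": "eng1:callcenter", "utterances": []}'
transcript = json.loads(raw)

# Use dict.get() so that missing optional elements do not raise KeyError.
emotion = transcript.get("emotion")         # None when the field was omitted
model = transcript.get("model", "unknown")  # present in this sample
print(emotion, model)
```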

The data fields included in Voci JSON transcription output vary depending on the products, circumstances, and optional features that were used to generate the output. There are two categories of Voci JSON data:

  1. Core ASR data, generated by the ASR engine under most circumstances and whenever text is decoded from speech. For example, the top-level asr and model elements are always included in JSON output because they describe ASR engine and language model attributes. Similarly, the utterances array is included whenever audio is successfully transcribed, because it is generated any time speech is decoded from audio.

  2. Conditional and parameter data, generated only under certain conditions or when certain transcription parameters are specified. For example, the top-level chaninfo object is included only for stereo or diarized audio, and the top-level emotion field is included only when the transcription request includes the stream tag emotion=true.

    Under some conditions, fields with identical names appear at different levels of the JSON data hierarchy. For example, the agentscore field is included at the top level only when processing undiarized mono audio with the stream tag agentid=true. When processing stereo audio with agentid=true, the agentscore field instead appears in the chaninfo objects for each channel of audio.
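Code that consumes agentscore therefore needs to check both levels. A sketch assuming the structure described above (the per-channel layout of chaninfo shown here is an assumption, not a documented schema):

```python
import json

def agent_scores(transcript):
    """Return the agentscore values from a parsed transcript, whether
    the field appears at the top level (undiarized mono audio) or
    inside the per-channel chaninfo objects (stereo audio).
    Returns an empty list when no agentscore fields are present."""
    if "agentscore" in transcript:
        return [transcript["agentscore"]]
    # Assumed shape: chaninfo is a list of per-channel objects.
    return [ch["agentscore"] for ch in transcript.get("chaninfo", [])
            if "agentscore" in ch]

mono = json.loads('{"agentscore": 0.93}')
stereo = json.loads('{"chaninfo": [{"agentscore": 0.91}, {}]}')
print(agent_scores(mono), agent_scores(stereo))
```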

Tip: Check the warning tag in the JSON output to see whether any issues occurred during transcription.
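Like other optional elements, the warning tag is present only when there is something to report, so check for it before use. A short sketch (the warning text shown is hypothetical):

```python
import json

# Hypothetical output containing a warning; the warning value is invented.
raw = '{"asr": "V8", "warning": "audio quality degraded"}'
transcript = json.loads(raw)

# Surface the warning only when the element is present.
if "warning" in transcript:
    print("Transcription warning:", transcript["warning"])
```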