### Top-level Elements

The following table describes the top-level elements included in a JSON transcript.

### Note

Metrics such as clarity, silence, and overtalk appear only in V‑Spark, because V‑Spark calculates them from data in the V‑Blaze JSON output file.

Refer to V‑Blaze Transcription Parameters for more information on the stream tags used to generate the elements that appear in these sections.

Table 1. Top-level Elements

Element

Availability

Type

Definition

emotion

All

value

Emotional intelligence consists of both acoustic and linguistic information. Events can be given the following values:

• Positive

• Mostly Positive

• Neutral

• Mostly Negative

• Negative

Emotion must be the same for all utterances to be included at the top level. Additional emotion scoring is available in The utterances Array.

asr

V‑Blaze version 6.1+

value

Version number of the automatic speech recognition server being used.

confidence

All

value

A measure of how confident the speech recognition system is in its transcription results. Values range from 0 to 1, with 1 being the most confident.

rawemotion

All

value

Acoustic emotion values. Possible values in version 7.1+ include:

• ANGRY

• NEUTRAL

• HAPPY

Acoustic emotion values prior to version 7.1 include:

• NONANGRY

• ANGRY

donedate

All

value

Date and time the file transcription was completed by the speech-to-text engine, meaning the last utterance finished.

utterances

All

array

Each audio file is broken up into segments of speech called utterances. The utterances array contains the word transcripts and corresponding metadata organized by utterances.

All

value

Identification information for the license used.

audiosecs

V‑Blaze version 6.1+

value

Duration of audio, in seconds, in the stream.

As of V‑Blaze 7.2, this element will not appear in the JSON output if there was a problem processing audio.

started

V‑Blaze version 6.1+

value

Date and time the stream started. This is most useful for measuring real-time transcription.

streamtags

V‑Blaze version 6.1+

A list of the parameters or other values specified by the user. This is useful for debugging and verification. It is also useful for tagging the output with user-level metadata (for example, tags that have meaning to the user for filtering or association). For example:

    "streamtags": {
        "emotion": "xa",
        "lid": true,
        "subst_rules": "<17 chars>",
        "gender": true,
        "rawemotion": "xa",
        "lidutt": true,
        "substinfo": true,
        "lidthreshold": 1.0,
        "subst": true,
        "scrubtext": true,
        "datahdr": "WAVE",
        "nsubs": "true"
    }

nchannels

All

value

Number of channels in the audio file, unless diarization is set to true, in which case a single-channel (1) file is split into 2 channels based on speaker separation.

As of V‑Blaze 7.2, this element will not appear in the JSON output if there was a problem processing audio.

lidinfo

V‑Blaze version 5.6+

array

The lidinfo section is a global, top-level dictionary that contains the following fields:

• lang — the three-letter language code specifying the language that was identified for the stream

• speech — the number of seconds of automatically detected speech that were used to determine the language used in the stream

• langfinal - (V‑Blaze 7.1+) added when the language identified by LID is below the threshold and is not the default language

• conf — the confidence score of the language identification decision

For example:

    "lidinfo": {
        "lang": "spa",
        "speech": 1.35,
        "langfinal": "eng",
        "conf": 0.81
    }
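As a rough illustration of how a consumer might read these fields, the following Python sketch parses the example above and picks one language code to report. The helper name `effective_language` is hypothetical, and preferring `langfinal` over `lang` whenever it is present is an assumption about how the field is meant to be used, not documented behavior.

```python
import json
from typing import Optional


def effective_language(transcript: dict) -> Optional[str]:
    """Hypothetical helper: pick the language code to report for a transcript."""
    lidinfo = transcript.get("lidinfo")
    if not lidinfo:
        return None  # no language identification data in this transcript
    # Assumption: when langfinal is present it supersedes lang.
    return lidinfo.get("langfinal", lidinfo.get("lang"))


sample = json.loads(
    '{"lidinfo": {"lang": "spa", "speech": 1.35, "langfinal": "eng", "conf": 0.81}}'
)
print(effective_language(sample))
```

Using `.get()` throughout keeps the helper safe for transcripts produced without language identification, where `lidinfo` may be absent.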

langinfo

V‑Blaze version 7.1+

string

Breakdown of language information that is added when more than one language was detected. The dictionary is keyed by language code, and each entry contains the following fields:

• utts - the number of utterances spoken for the language identified

• speech — the number of seconds of automatically detected speech that were used to determine the language used in the stream

• conf — the confidence score of the language identification decision

• time - the number of seconds that the language was identified for the whole stream

For example:

    "langinfo": {
        "spa": {
            "utts": 1,
            "speech": 17.46,
            "conf": 1.0,
            "time": 21.56
        },
        "eng": {
            "utts": 1,
            "speech": 1.35,
            "conf": 0.81,
            "time": 0.93
        }
    }
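One possible use of this breakdown is finding the language that accounts for most of the detected speech. The Python sketch below does that for the values shown in the example; the function name `dominant_language` is hypothetical, and ranking by the `speech` field (rather than `time` or `utts`) is an illustrative choice.

```python
def dominant_language(langinfo: dict) -> str:
    """Hypothetical helper: return the language code with the most detected speech."""
    # Each value is a per-language dictionary with utts, speech, conf, and time.
    return max(langinfo, key=lambda code: langinfo[code]["speech"])


langinfo = {
    "spa": {"utts": 1, "speech": 17.46, "conf": 1.0, "time": 21.56},
    "eng": {"utts": 1, "speech": 1.35, "conf": 0.81, "time": 0.93},
}
print(dominant_language(langinfo))  # spa
```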

ended

V‑Blaze version 6.1+

value

Date and time the stream ended. This is most useful for measuring real-time transcription.

As of V‑Blaze 7.2, this element will not appear in the JSON output if there was a problem processing audio.

recvtz

All

array

An array containing two values:

• time zone abbreviation of the time zone in which the ASR engine is running

• offset in seconds from UTC for the time on the ASR engine
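The two values can be combined into a time zone object for interpreting engine timestamps. The Python sketch below assumes the array order matches the list above (abbreviation first, then offset); the sample value `["EST", -18000]` and the helper name `recvtz_to_timezone` are hypothetical.

```python
from datetime import timedelta, timezone


def recvtz_to_timezone(recvtz: list) -> timezone:
    """Hypothetical helper: build a fixed-offset timezone from a recvtz array."""
    # Assumption: element 0 is the abbreviation, element 1 the UTC offset in seconds.
    name, offset_seconds = recvtz
    return timezone(timedelta(seconds=offset_seconds), name)


tz = recvtz_to_timezone(["EST", -18000])  # hypothetical sample value
print(tz.tzname(None), tz.utcoffset(None))
```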

scrubbed

All

value

If true, the audio has been scrubbed so that all numbers are redacted. If false, this element does not appear in the JSON output.

sentiment

All

value

Linguistic sentiment value:

• Positive

• Mostly Positive

• Neutral

• Mostly Negative

• Negative

• Mixed (contains both Positive and Negative in the file)

sentiment_scores

All

array

Array of length 2. Element [0] is the count of positive phrases and element [1] is the count of negative phrases in the file.

source

All

value

The audio file name.

gender

All

value

The gender identified for the audio.

model

All

string containing model name if one model was specified;

array of model names if multiple models were specified

Language model(s) specified for transcription. For example:

"model": "eng1:callcenter"

As of V‑Blaze 7.2, this element will not appear in the JSON output if there was a problem processing audio.
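Because this element can be a string, an array, or absent, client code may want to normalize it before use. The Python sketch below is one way to do that; the helper name `model_names` is hypothetical, and the second model name in the usage example is invented for illustration.

```python
def model_names(transcript: dict) -> list:
    """Hypothetical helper: always return the models as a list of names."""
    model = transcript.get("model")
    if model is None:
        return []  # element omitted, for example after an audio processing problem
    if isinstance(model, str):
        return [model]  # a single model is reported as a plain string
    return list(model)  # multiple models are reported as an array


print(model_names({"model": "eng1:callcenter"}))
# "spa1:callcenter" is an invented name used only to show the array case.
print(model_names({"model": ["eng1:callcenter", "spa1:callcenter"]}))
print(model_names({}))
```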

recvdate

All

value

Date and time the audio file was received by the ASR engine and placed in the queue.

requestid

All

value

The unique identifier for the request.

nsubs

V‑Blaze version 7.1+

value

The number of substitutions applied. This tag will not appear if no substitutions were applied.

This value does not include numtrans substitutions.

substinfo

V‑Blaze version 7.1+

string

Detail for substitutions that is included when substinfo=true.

• nsubs - (V‑Blaze 7.1+) The number of substitutions applied, including numtrans substitutions.

For example:

    "substinfo": {
        "counts": [
            [
                "subst_rules",
                1,
                {
                    "persona => /Persona/": 1
                }
            ]
        ],
        "nsubs": 1
    }
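To show how the nested `counts` structure might be consumed, the Python sketch below flattens it into per-rule totals. It assumes every entry in `counts` is a three-item list of rule source, total, and a dictionary of per-rule counts, as in the example above; that layout is inferred from the single example, and the helper name `substitution_counts` is hypothetical.

```python
def substitution_counts(substinfo: dict) -> dict:
    """Hypothetical helper: map (source, rule) pairs to their substitution counts."""
    totals = {}
    # Assumption: each entry looks like [source, total, {rule: count, ...}].
    for source, _total, rules in substinfo.get("counts", []):
        for rule, count in rules.items():
            key = (source, rule)
            totals[key] = totals.get(key, 0) + count
    return totals


substinfo = {
    "counts": [["subst_rules", 1, {"persona => /Persona/": 1}]],
    "nsubs": 1,
}
print(substitution_counts(substinfo))
```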

warning

V‑Blaze version 5.6.0-3+

string

This field describes a problem or issue that was encountered during transcription. A common example is substitution errors.
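Several elements in the table above (audiosecs, nchannels, ended, and model) are omitted as of V‑Blaze 7.2 when there was a problem processing audio, so client code should not assume they exist. The Python sketch below reports which of them are missing alongside any warning text. The helper name `audio_problem_report` and the sample warning string are hypothetical, and a missing element can also simply mean an older V‑Blaze version, so treat the result as a hint rather than a diagnosis.

```python
# Elements the table describes as omitted when audio processing fails (V-Blaze 7.2+).
OPTIONAL_ON_FAILURE = ("audiosecs", "nchannels", "ended", "model")


def audio_problem_report(transcript: dict) -> dict:
    """Hypothetical helper: list missing elements and surface any warning text."""
    missing = [name for name in OPTIONAL_ON_FAILURE if name not in transcript]
    return {"missing": missing, "warning": transcript.get("warning")}


# The warning text below is invented for illustration.
report = audio_problem_report({"requestid": "abc", "warning": "example warning"})
print(report)
```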