V-Blaze and V-Cloud Online Help (May 2020)

The utterances Array

The top-level utterances element contains an array of speech segment information. Each element of the utterances array contains the following data:

Table 1. Data contained in the utterances Array

Object

Type

Definition

emotion

value

Emotional intelligence value for the utterance, derived from both acoustic and linguistic information. Emotion can take one of the following values:

  • Positive

  • Mostly Positive

  • Neutral

  • Mostly Negative

  • Negative

confidence

value

A measure of how confident the speech recognition system is in its utterance transcription results

  • Range between 0 and 1

  • 1 is most confident

end

value

End time of the utterance in seconds

sentiment

value

Utterance-level linguistic sentiment value:

  • Positive

  • Mostly Positive

  • Neutral

  • Mostly Negative

  • Negative

  • Mixed (contains both Positive and Negative in the file)

gender

value

Gender prediction of the speaker

sentimentex

array

Contains detailed sentiment information for the utterance

  • [0][0] = number of Positive phrases in the utterance

  • [0][1] = number of Negative phrases in the utterance

  • [1][*] = array of sentiment segments. In each segment, [0] is ‘+’ (Positive) or ‘-’ (Negative), and [1] is the position range of the phrase, where [0] is the beginning position and [1] is the end position

start

value

Start time of the utterance in seconds

donedate

value

Date and time the utterance transcription was completed by the speech-to-text engine

recvdate

value

Date and time the utterance was received by the speech-to-text engine

events

array

Contains information about individual words. Each element is a word object that contains the following values:

  • confidence: word level transcription confidence value between 0 and 1

  • end: end time of the word in seconds

  • start: start time of the word in seconds

  • word: normalized word

  • wordex: raw dictionary word. This value is often used to disambiguate different pronunciations that have the same spelling.

metadata

object

Speaker information of the utterance. Each object contains the following values:

  • channel: channel number

  • model: model that decoded the utterance

  • source: audio file name

  • uttid: utterance segment number
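
The sentimentex layout described in Table 1 can be unpacked as in the following minimal sketch. The helper name summarize_sentimentex and the sample value are hypothetical, and the nesting is an assumption based only on the table's description: element [0] holds the two phrase counts, and element [1] holds a list of [sign, [begin, end]] segments. Check it against real output before relying on it.

```python
def summarize_sentimentex(sentimentex):
    """Unpack a sentimentex array into phrase counts and segments.

    Assumed layout (inferred from Table 1, not verified against real output):
      sentimentex[0] == [positive_count, negative_count]
      sentimentex[1] == [[sign, [begin, end]], ...] where sign is '+' or '-'
    """
    positive_count, negative_count = sentimentex[0]
    segments = []
    for sign, (begin, end) in sentimentex[1]:
        segments.append({
            "polarity": "Positive" if sign == "+" else "Negative",
            "begin": begin,
            "end": end,
        })
    return {
        "positive": positive_count,
        "negative": negative_count,
        "segments": segments,
    }


# Hypothetical sample value, invented for illustration only.
sample = [[1, 2], [["+", [0, 2]], ["-", [5, 6]], ["-", [9, 11]]]]
summary = summarize_sentimentex(sample)
print(summary["positive"], summary["negative"], len(summary["segments"]))
```

The sketch reports the begin and end positions as they appear and does not interpret their unit, because the table does not state what the positions index.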



Each utterances element contains an events array, which provides information about each word in the utterance.
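
As an illustration of how the utterances and events arrays nest, the following sketch walks a transcript and flags words whose confidence falls below a threshold. The field names follow Table 1, but the sample JSON, the helper name low_confidence_words, and the 0.5 threshold are assumptions made for this example and are not taken from real output.

```python
import json

# Hypothetical transcript: field names follow Table 1, values are invented.
raw = """
{
  "utterances": [
    {
      "start": 0.5,
      "end": 2.0,
      "confidence": 0.9,
      "metadata": {"channel": 0, "uttid": 0},
      "events": [
        {"word": "thank", "start": 0.5, "end": 0.75, "confidence": 0.95},
        {"word": "you", "start": 0.75, "end": 1.0, "confidence": 0.4},
        {"word": "very", "start": 1.0, "end": 1.5, "confidence": 0.85},
        {"word": "much", "start": 1.5, "end": 2.0, "confidence": 0.9}
      ]
    }
  ]
}
"""


def low_confidence_words(transcript, threshold=0.5):
    """Return (uttid, word, confidence) for each word below the threshold."""
    flagged = []
    for utterance in transcript["utterances"]:
        uttid = utterance["metadata"]["uttid"]
        for event in utterance["events"]:
            if event["confidence"] < threshold:
                flagged.append((uttid, event["word"], event["confidence"]))
    return flagged


transcript = json.loads(raw)
flagged = low_confidence_words(transcript)

# Utterance duration in seconds is the end time minus the start time.
durations = [u["end"] - u["start"] for u in transcript["utterances"]]

print(flagged)
print(durations)
```

The same loop structure applies to any per-word field, such as wordex or the word-level start and end times.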

Figure 1. Utterances Array Element and Events Array