V-Blaze and V-Cloud Online Help

The textinfo Object

The textinfo object is included in a JSON transcript by default whenever any text is decoded from an audio file. To exclude it, specify the stream tag textinfo = false when submitting audio for transcription. The textinfo object was introduced in V‑Blaze version 7.3.

The textinfo object includes the following elements:

Table 1. Elements in the textinfo object

| Element | Type | Description |
|---|---|---|
| turns | number | The number of distinct speaker turns detected in the audio. Calculated for stereo or diarized mono audio only. |
| wordtime | array | Two number values: (1) the total audio time in seconds during which words were detected, calculated by adding the durations of each utterance with words; (2) the percentage of total audio time during which words were detected, expressed as a decimal fraction (for example, 0.702 for 70.2%). |
| overtalk | object | Metrics for overtalk throughout the audio file. Calculated for multi-channel audio only. Contains the count, avgtime, and time elements described in the next three rows. |
| count (within overtalk) | array | Two number values: (1) the number of overtalk occurrences; (2) the percentage of total speaker turns on which overtalk occurred, expressed as a decimal fraction. |
| avgtime (within overtalk) | number | The average duration in seconds of all overtalk occurrences. |
| time (within overtalk) | array | Two number values: (1) the total audio time in seconds with overtalk, calculated by adding the durations of each utterance with overtalk; (2) the percentage of total audio time during which overtalk occurred, expressed as a decimal fraction. |
| words | number | The total number of words spoken in the transcribed audio file. |
| silence | array | Two number values: (1) the total amount of audio in seconds with no sound; (2) the percentage of total audio time with no sound, expressed as a decimal fraction. |
| tags | object | Generated only when the request is submitted with the stream tag luhn = true and at least one number that passed the Luhn check was detected in the transcribed audio. Contains one key-value pair with the field name luhn and an integer value indicating how many numbers passed the Luhn check. |

The following example shows a tags object:

...,
  "tags": {
    "luhn": 4
  }
...
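For readers unfamiliar with the Luhn check, the following Python sketch illustrates the standard Luhn checksum and one way to read the luhn count from a parsed textinfo object. The function names passes_luhn and luhn_count are hypothetical and are not part of V‑Blaze. How V‑Blaze selects candidate numbers from the transcript is not described here, so this sketch shows only the checksum itself.

```python
def passes_luhn(number):
    """Return True if the digits in `number` satisfy the standard Luhn checksum."""
    # Keep only digits so that spaces or dashes in the input are ignored.
    digits = [int(ch) for ch in str(number) if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def luhn_count(textinfo):
    """Read tags.luhn from a parsed textinfo dict (hypothetical helper).

    The tags object is generated only when at least one number passed the
    check, so a missing tags object is treated as a count of zero here.
    """
    return textinfo.get("tags", {}).get("luhn", 0)


if __name__ == "__main__":
    print(passes_luhn("4111 1111 1111 1111"))
    print(luhn_count({"tags": {"luhn": 4}}))
```

A count of zero from luhn_count can mean either that no number passed the check or that the request was submitted without luhn = true; the textinfo object alone does not distinguish the two cases.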


The following JSON example shows a textinfo object generated from stereo audio:

"textinfo": {
  "turns": 229,
  "wordtime": [
    945.62, 
    0.702
  ], 
  "overtalk": {
    "count": [
      92, 
      0.402
    ], 
    "avgtime": 1.19, 
    "time": [
      109.44, 
      0.116
    ]
  }, 
  "words": 3652, 
  "silence": [
    401.64, 
    0.298
  ]
}
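
As an illustration of how a client might consume these values, the following Python sketch flattens a textinfo object into named metrics and converts the decimal fractions to percentages. The function name summarize_textinfo and the output key names are hypothetical. The sketch assumes that words, wordtime, and silence are always present, and it treats turns, overtalk, and tags as optional because they are generated only for certain audio or stream tag settings.

```python
import json

# Sample values copied from the stereo example above, wrapped in braces
# so that the fragment parses as a complete JSON document.
SAMPLE = json.loads("""
{
  "textinfo": {
    "turns": 229,
    "wordtime": [945.62, 0.702],
    "overtalk": {
      "count": [92, 0.402],
      "avgtime": 1.19,
      "time": [109.44, 0.116]
    },
    "words": 3652,
    "silence": [401.64, 0.298]
  }
}
""")


def summarize_textinfo(textinfo):
    """Flatten a parsed textinfo dict into named metrics (hypothetical helper)."""
    word_seconds, word_fraction = textinfo["wordtime"]
    silence_seconds, silence_fraction = textinfo["silence"]
    summary = {
        "words": textinfo["words"],
        "turns": textinfo.get("turns"),  # absent for undiarized mono audio
        "word_seconds": word_seconds,
        "word_percent": round(word_fraction * 100, 1),
        "silence_seconds": silence_seconds,
        "silence_percent": round(silence_fraction * 100, 1),
        "luhn_matches": textinfo.get("tags", {}).get("luhn", 0),
    }
    overtalk = textinfo.get("overtalk")  # multi-channel audio only
    if overtalk is not None:
        occurrences, turn_fraction = overtalk["count"]
        overtalk_seconds, _ = overtalk["time"]
        summary.update({
            "overtalk_count": occurrences,
            "overtalk_turn_percent": round(turn_fraction * 100, 1),
            "overtalk_seconds": overtalk_seconds,
            "overtalk_avg_seconds": overtalk["avgtime"],
        })
    return summary


if __name__ == "__main__":
    for name, value in summarize_textinfo(SAMPLE["textinfo"]).items():
        print(f"{name}: {value}")
```

In this sample, the overtalk values are consistent with the element descriptions: 109.44 seconds of overtalk divided by 92 occurrences rounds to the reported avgtime of 1.19, and 92 occurrences divided by 229 turns rounds to 0.402.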