V-Blaze and V-Cloud Online Help

V-Blaze version 7.3.0-1 (December 2021)

New Features and Improvements

  1. Made several improvements and bug fixes to diarization, including:

    1. Changed the way diarization is processed, which allows for the use of music detection.

    2. Added a new clusterer that significantly improves accuracy without sacrificing performance.

    3. Added diarization scoring to the JSON output, as described in Top-level Elements; a brief sketch of reading the score follows this list. Refer to Adjusting for Different Types of Input for more information on how to use the diarization parameter.

    4. Fixed a bug that caused loss of speaker state every 5 minutes.
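
    The sketch below shows one way a diarized request might be submitted and the new score read back. It is illustrative only: the endpoint URL is a placeholder, the diarize parameter name should be confirmed in Adjusting for Different Types of Input, and diascore is a hypothetical stand-in for the scoring element described in Top-level Elements.

        import requests

        # Placeholder endpoint; substitute your actual V-Cloud URL and
        # authentication. The "diarize" parameter name is an assumption.
        with open("call.wav", "rb") as audio:
            resp = requests.post(
                "https://vcloud.example.com/transcribe",
                data={"output": "json", "diarize": "true"},
                files={"file": audio},
            )
        result = resp.json()

        # "diascore" is a hypothetical name for the diarization scoring
        # element described in Top-level Elements.
        print("diarization score:", result.get("diascore"))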

  2. Added music detection that uses an acoustic-based classification model. When music=true, each utterance is assigned a score from -1 to 1 indicating how likely it is that music or other high-energy non-speech events were detected. Utterances classified as music are skipped and not decoded. Music detection can be tuned using various parameters as described in Music.
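
    The following sketch shows how a client might read those scores from a saved result. It assumes the result contains an utterances array and that each utterance carries a musicinfo object with a score field; see Music and Top-level Elements for the actual layout.

        import json

        # Load a completed transcription result (path is illustrative).
        with open("call-transcript.json") as f:
            result = json.load(f)

        # Scores run from -1 to 1; higher values mean music or other
        # high-energy non-speech was more likely. The "musicinfo" and
        # "score" field names here are assumptions for this sketch.
        for utt in result.get("utterances", []):
            score = utt.get("musicinfo", {}).get("score", -1.0)
            if score > 0:
                print("utterance classified as music, score:", score)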

  3. Added agent identification to both single- and multi-channel transcription requests. Setting agentid=true identifies which speaker is the agent using a score from -1 to 1, with positive values indicating the agent and negative values indicating the client. Refer to Output Options for details on how to enable agentid. Refer to Top-level Elements for information on how to interpret agentscore in the transcription results.
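
    A minimal sketch of interpreting agentscore, assuming chaninfo is an array with one object per channel (the field names come from this section; the exact layout is described in Top-level Elements):

        import json

        with open("call-transcript.json") as f:
            result = json.load(f)

        # agentscore runs from -1 to 1: positive values indicate the
        # agent, negative values the client.
        for idx, chan in enumerate(result.get("chaninfo", [])):
            score = chan.get("agentscore")
            if score is None:
                continue
            role = "agent" if score > 0 else "client"
            print(f"channel {idx}: {role} (agentscore={score})")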

  4. Improvements to emotion classification, including:

    1. Refined the way a call is split for trending analysis: the first 90% of the call is now compared with the last 10%, rather than splitting the call 50/50.

    2. Changed the top-level emotion classification values from Mostly Positive, Positive, Neutral, Negative, and Mostly Negative to Positive, Negative, Neutral, Improving, and Worsening.

    3. Added top-level and per-channel trends to the JSON output as described in Top-level Elements.

    4. Fixed an issue where utterance sentiment was overwritten and not reported.
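
    A brief sketch of reading the new values, assuming the top-level emotion field carries one of the five classifications listed above (see Top-level Elements for the exact layout):

        import json

        with open("call-transcript.json") as f:
            result = json.load(f)

        # Emotion is now one of Positive, Negative, Neutral, Improving,
        # or Worsening; the last two reflect the trend between the first
        # 90% and the final 10% of the call.
        emotion = result.get("emotion")
        if emotion in ("Improving", "Worsening"):
            print("emotion trend over the call:", emotion)
        else:
            print("overall emotion:", emotion)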

  5. Added text and channel information to the JSON output. Text-based metrics present top-level and per-channel information that includes:

    1. A count of speaker turns, the total number of words spoken in the transcribed audio file, and the amount of silence detected. Refer to The textinfo Object for more information.

    2. An updated overtalk calculation based on the overlap of words. Refer to The textinfo Object for more information.

    3. The chaninfo field, which appears only for stereo or diarized audio and contains one object for each audio channel. Each channel object may contain textinfo, musicinfo, agentscore, and emotion, depending on audio attributes and the stream tags specified with the request. Refer to chaninfo in the Top-level Elements topic for more information.
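
    The sketch below walks those per-channel objects. The metric names inside textinfo (turns, words, silence) are assumptions for this sketch; see The textinfo Object for the actual field names.

        import json

        with open("call-transcript.json") as f:
            result = json.load(f)

        # chaninfo is assumed to be an array with one object per channel.
        # The textinfo metric names below are hypothetical stand-ins.
        for idx, chan in enumerate(result.get("chaninfo", [])):
            info = chan.get("textinfo", {})
            print(f"channel {idx}:",
                  "turns =", info.get("turns"),
                  "| words =", info.get("words"),
                  "| silence =", info.get("silence"))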

  6. Improvements to number translation, including:

    1. Added concatenation of double/triple replacements.

    2. Added compass directions to address logic.

    3. Added support for more time patterns.

    Refer to numtrans for more information on these changes.
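
    The kind of transformation involved is easiest to picture with a toy example. The sketch below is not the numtrans implementation; it rewrites one small family of spoken time patterns to show the general idea.

        import re

        # Toy illustration only; this is not numtrans itself. Rewrites
        # simple spoken time patterns such as "five thirty" into "5:30".
        HOURS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                 "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
                 "eleven": 11, "twelve": 12}
        MINUTES = {"fifteen": 15, "thirty": 30, "forty five": 45}

        TIME = re.compile(
            r"\b(" + "|".join(HOURS) + r") (" + "|".join(MINUTES) + r")\b")

        def rewrite_times(text):
            return TIME.sub(
                lambda m: f"{HOURS[m.group(1)]}:{MINUTES[m.group(2)]:02d}",
                text)

        print(rewrite_times("the meeting is at five thirty"))
        # -> the meeting is at 5:30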

  7. Multiple improvements to the text processing modules, including:

    1. No longer allow negative timespans from out-of-order backrefs.

    2. Multiple word replacement backrefs with attached text now result in duplicated text attached to the associated word.

    3. Fixed capitalization logic to handle Unicode characters correctly.
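
    The capitalization fix addresses the failure mode illustrated below. This is a generic example of the problem, not the module's actual code: ASCII-only logic leaves characters such as "é" untouched, while Unicode-aware case mapping handles them correctly.

        # An ASCII-only check fails to uppercase "é"; Python's
        # Unicode-aware str methods handle it correctly.
        def ascii_capitalize(word):
            first = word[0]
            if "a" <= first <= "z":          # ASCII-only test
                first = chr(ord(first) - 32)
            return first + word[1:]

        print(ascii_capitalize("étienne"))   # étienne (unchanged, wrong)
        print("étienne".capitalize())        # Étienne (correct)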