V-Blaze and V-Cloud Online Help

V‑Blaze version 7.1 (October 2020)

V‑Blaze version 7.1 is a major release with numerous improvements and bug fixes.

  1. Significant performance and functionality improvements have been made to the V‑Blaze REST API. Refer to V‑Blaze REST API version 2.1.0 (October 2020) for more information on these changes.

  2. Detection of acoustic speaker emotion has been improved with a new emotion scoring subsystem. This optional addition can be enabled with emotion=xa. The full set of possible rawemotion values is now HAPPY, NEUTRAL, and ANGRY. The new HAPPY classification has no effect on standard or combined emotion unless emotion=xa is set.

    For more information on these parameters, refer to Emotion, Sentiment, and Gender.

  3. Language identification (LID) has been enhanced with new option parameters, JSON output elements, and other functionality improvements.

    1. Added new optional parameters for lid to delay LID start and to expand its scope to every utterance.

      Table 1. New optional parameters for lid

      | Name | Values | Description |
      |------|--------|-------------|
      | lidoffset=N | integer | Delay start of LID until specified (N) seconds into audio. If there is not enough audio left after offset, this will process preceding utterances in reverse. |
      | lidutt | true, false | Run LID on every utterance. The default is only once per stream or audio channel. This option is only available with V‑Blaze 7.1+. Note: This option has a significant performance impact and should only be used when necessary. |

    2. Added a new lid parameter value for limiting utterance metadata.

      lid=language_model:notext - use this value to report only that a language was detected, without decoding the audio. When this option is specified, the JSON output does not contain a model metadata element in each utterance.

    3. Improved logic for LID decisions with low scores.

      When LID scoring is below the decision threshold, the ASR engine will transcribe the audio with the language model specified by the model tag (or the default model for the ASR configuration if model is not explicitly provided). The results are indicated by a lidinfo.langfinal or lidinfo["n"].langfinal element in the JSON output.

    4. Improved startup handling if dependencies are missing.

    5. Made additions to JSON output.

      • lidinfo - now provided at utterance-level when lidutt=true

      • langinfo - breakdown of language information, added when more than one language was detected.

      • langfinal - added when the language identified by LID scores below the decision threshold and is not the default language.

    For more information on LID and using these parameters, refer to Receiving Language Identification Information.
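
The langfinal lookup described above can be sketched in Python. This is a minimal illustration, not the product's client code: the function name final_languages and the "stream" label are hypothetical, and the assumption that a multichannel result keys lidinfo by channel (as in lidinfo["n"].langfinal) is inferred only from the element names given in these notes.

```python
from typing import Any, Dict


def final_languages(result: Dict[str, Any]) -> Dict[str, str]:
    """Return langfinal per channel for a parsed JSON transcript.

    Assumed shapes (not confirmed by the release notes):
      - single stream:  result["lidinfo"]["langfinal"]
      - per channel:    result["lidinfo"]["<n>"]["langfinal"]
    Channels without a langfinal element are omitted.
    """
    lidinfo = result.get("lidinfo")
    if not isinstance(lidinfo, dict):
        return {}

    # Single-stream case: langfinal sits directly on lidinfo.
    if "langfinal" in lidinfo:
        return {"stream": lidinfo["langfinal"]}  # "stream" is a hypothetical label

    # Per-channel case: lidinfo is keyed by channel identifier.
    found: Dict[str, str] = {}
    for channel, info in lidinfo.items():
        if isinstance(info, dict) and "langfinal" in info:
            found[channel] = info["langfinal"]
    return found
```

An empty result means no channel reported a langfinal element.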

  4. Added new debugging parameters and JSON elements to assist with improved warnings and logging when using substitutions.

    1. New debugging parameters

      Warning

      These parameters are intended for debugging purposes only and should not be used in production.

      Table 2. New substitution debugging parameters

      | Name | Values | Description |
      |------|--------|-------------|
      | subst | true, false (default), none | The subst parameter can be used to enable or disable automatic system- and model-level substitutions. subst=true: enables system- and model-level substitutions. subst=false: disables system-level substitutions; model-level substitutions still apply. subst=none: disables both system- and model-level substitutions. |
      | substinfo | true, false (default) | Provides substitution details in JSON transcripts. Set substinfo to true to include a top-level JSON object that indicates the applied substitution rules and a number count for each rule. In addition to the top-level JSON object, substinfo includes another JSON object in the metadata that details each substitution's location, the substitution rule applied, and the substitution rule source. |


      For more details on these and other parameters, refer to the Substitutions section of the V-Blaze REST API.

    2. Added a new JSON output element: nsubs shows a count of substitutions applied, at both the top level and the utterance level. When substinfo=true, nsubs will also include numtrans counts within the substinfo array. Top-level nsubs does not include numtrans counts. The nsubs element will not appear if no substitutions were applied.

    For more information, refer to the JSON Output Reference.
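
Because nsubs is omitted when no substitutions were applied, code that reads it should treat a missing element as zero. The following Python sketch shows one way to do that. The helper names are hypothetical, and the placement of nsubs directly on each object in an "utterances" array is an assumption; check the JSON Output Reference for the actual location.

```python
from typing import Any, Dict, List


def total_substitutions(result: Dict[str, Any]) -> int:
    """Top-level nsubs, or 0 when the element is absent."""
    return int(result.get("nsubs", 0))


def utterance_substitution_counts(result: Dict[str, Any]) -> List[int]:
    """Per-utterance nsubs values, using 0 for utterances without the element.

    Assumes each utterance object carries its own nsubs key.
    """
    return [int(utt.get("nsubs", 0)) for utt in result.get("utterances", [])]
```

The sketch reports the two levels separately and does not assume that the top-level count equals the sum of the utterance-level counts.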

  5. Hinting is now supported for eng1 version 7 models. Hinting for version 5 models is no longer supported.

    For more information on hinting support, refer to the English page of the Language Models Reference.

  6. Realtime operation now defaults to off (realtime=false). Requests must specify the realtime=true option to get per-utterance callbacks during processing.

    For more information, refer to Voice Activity Detection Controls.

  7. Made enhancements to ASR licensing selection logic. The ASR engine now more efficiently selects usage licenses that meet minimum requirements for a stream. These enhancements also increased the efficiency of model verification and validation processes.

  8. Made minor enhancements and fixes to speech-to-text output processing (textproc), including:

    1. The system now preserves timestamps on backrefs in substitutions instead of interpolating.

    2. Fixed inadvertent uppercasing of cased English backrefs (for example, /\1/).

    3. Eliminated unexpected behavior of pattern{min,max} when min=0.

    4. Improved negative frame/word warning.

  9. Changed the maximum value for uttmaxtime to 150 seconds.

  10. Made minor improvements to Spanish time formatting.

  11. Corrected the scope of emotion scoring to always score individual utterances.

  12. Eliminated rare edge case decode failures.

  13. Setting punctuate=false now generates English output in lowercase.
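
Several of the changes above affect request options: realtime now has to be set explicitly, uttmaxtime is capped at 150 seconds, and emotion=xa is opt-in. As a rough sketch, a client could normalize these options into string fields before sending a request. The helper encode_options is hypothetical, the lowercase true/false encoding follows the values shown in these notes, and how the fields are actually transmitted is left to the V‑Blaze REST API documentation.

```python
from typing import Dict, Union

MAX_UTTMAXTIME = 150  # seconds; maximum stated in the 7.1 release notes


def encode_options(**options: Union[bool, int, float, str]) -> Dict[str, str]:
    """Convert Python values to string request fields.

    Booleans become "true"/"false"; everything else is passed through str().
    Raises ValueError if uttmaxtime exceeds the 7.1 maximum.
    """
    fields: Dict[str, str] = {}
    for name, value in options.items():
        if name == "uttmaxtime" and float(value) > MAX_UTTMAXTIME:
            raise ValueError(
                f"uttmaxtime={value} exceeds the maximum of {MAX_UTTMAXTIME} seconds"
            )
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields


# Per-utterance callbacks must now be requested explicitly with realtime=true.
fields = encode_options(realtime=True, punctuate=False, emotion="xa", uttmaxtime=120)
```

Validating uttmaxtime on the client side is a design choice made for this sketch; the notes do not say how the server responds to larger values.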

ASR v7.1+ Fixed Issue

  1. Minor modification of vociserver behavior to allow vociwebapi to recreate required ephemeral ramfs directories after unexpected vociserver service restarts. Previously, rare service interruptions could occur when results were returned as zip files, such as when scrubbing audio. (voci-server-server-sw-7.1.1-3, released November 2020)