V‑Blaze version 7.1 (October 2020)
V‑Blaze version 7.1 is a major release with numerous improvements and bug fixes.
Significant performance and functionality improvements have been made to the V‑Blaze REST API. Refer to V‑Blaze REST API version 2.1.0 (October 2020) for more information on these changes.
Detection of acoustic speaker emotion has been improved using a new emotion scoring subsystem. This optional addition can be enabled with emotion=xa. The full set of possible rawemotion values is now HAPPY, NEUTRAL, and ANGRY. The new HAPPY classification has no effect on standard or combined emotion unless emotion=xa is set. For more information on these parameters, refer to Emotion, Sentiment, and Gender.
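The rawemotion value set above can be sketched as a small consumer of per-utterance output. This is an illustrative sketch only: emotion=xa and the HAPPY/NEUTRAL/ANGRY values come from the release notes, but the sample utterances and the fallback to NEUTRAL for utterances without a label are assumptions, not documented behavior.

```python
# Sketch only: the rawemotion value set comes from the release notes; the
# fallback label for utterances missing rawemotion is an assumption.
RAW_EMOTIONS = {"HAPPY", "NEUTRAL", "ANGRY"}  # full set with emotion=xa

def raw_emotions(utterances):
    """Collect per-utterance rawemotion labels, validating the value set."""
    labels = []
    for utt in utterances:
        label = utt.get("rawemotion", "NEUTRAL")  # assumed fallback
        if label not in RAW_EMOTIONS:
            raise ValueError(f"unexpected rawemotion value: {label}")
        labels.append(label)
    return labels

print(raw_emotions([{"rawemotion": "HAPPY"}, {}]))  # ['HAPPY', 'NEUTRAL']
```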
Language identification (LID) has been enhanced with new option parameters, JSON output elements, and other functionality improvements.
Added new optional parameters for lid to delay LID start and to expand its scope to every utterance.

Table 1. New optional parameters for lid

lidoffset=N (integer)
    Delay the start of LID until the specified (N) seconds into the audio. If there is not enough audio left after the offset, LID processes the preceding utterances in reverse.

lidutt (true, false)
    Run LID on every utterance. The default is to run LID only once per stream or audio channel. This option is only available with V‑Blaze 7.1+.
    Note: This option has a significant performance impact and should only be used when necessary.
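The two new parameters above can be combined in one request; the sketch below just assembles them into request fields. The parameter names lidoffset and lidutt come from Table 1, but the baseline lid=true field and the helper itself are assumptions for illustration, not the documented V‑Blaze REST API client surface.

```python
# Illustrative sketch: lidoffset and lidutt are the new parameters from
# Table 1; the lid=true baseline field is an assumed way to enable LID.
from urllib.parse import urlencode

def build_lid_options(offset_seconds=None, per_utterance=False):
    """Collect the new LID tuning options into a request parameter dict."""
    params = {"lid": "true"}  # enable language identification (assumed form)
    if offset_seconds is not None:
        # Delay the start of LID until offset_seconds into the audio.
        params["lidoffset"] = str(offset_seconds)
    if per_utterance:
        # Run LID on every utterance (significant performance impact).
        params["lidutt"] = "true"
    return params

query = urlencode(build_lid_options(offset_seconds=30, per_utterance=True))
print(query)  # lid=true&lidoffset=30&lidutt=true
```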
Added a new lid parameter value for limiting utterance metadata: lid=language:notext. Use this value to skip decoding when the specified language is detected and report only that it was detected. If this option is specified, the JSON will not contain a model metadata element in each utterance. Specify the language only, without a domain. For example, lid=spa:notext is valid, but lid=eng1:callcenter:notext is not.

Improved logic for LID decisions with low scores.
When LID scoring is below the decision threshold, the ASR engine will transcribe the audio with the language model specified by the model tag (or the default model for the ASR configuration if model is not explicitly provided). The results are indicated by a lidinfo.langfinal element in the JSON output.

Improved startup handling if dependencies are missing.
Made additions to JSON output:

- lidinfo - now provided at the utterance level when lidutt=true.
- langinfo - a breakdown of language information, added when more than one language was detected.
- langfinal - added when the language identified by LID scored below the threshold and is not the default language.
For more information on LID and using these parameters, refer to Receiving Language Identification Information.
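The langfinal fallback described above can be read back out of a transcript. The element names lidinfo and langfinal come from the release notes, but the sample document below is fabricated for illustration and is much smaller than real V‑Blaze output.

```python
import json

# Fabricated sample: real transcripts contain many more fields per utterance.
sample = json.loads("""
{
  "utterances": [
    {"lidinfo": {"lang": "spa", "conf": 0.42, "langfinal": "eng"}},
    {"lidinfo": {"lang": "spa", "conf": 0.97}}
  ]
}
""")

def final_language(utt):
    """Return the language actually used to transcribe an utterance.

    When the LID score was below the decision threshold, langfinal records
    the language the engine fell back to instead of the detected one.
    """
    lid = utt.get("lidinfo", {})
    return lid.get("langfinal", lid.get("lang"))

print([final_language(u) for u in sample["utterances"]])  # ['eng', 'spa']
```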
Added new debugging parameters and JSON elements to assist with improved warnings and logging when using substitutions.
New debugging parameters

Warning: These parameters are intended for debugging purposes only and should not be used in production.

Table 2. New substitution debugging parameters

subst (true, false (default), none)
    Enables or disables automatic system- and model-level substitutions:
    subst=true - enables system- and model-level substitutions.
    subst=false - disables system-level substitutions; model-level substitutions still apply.
    subst=none - disables both system- and model-level substitutions.

substinfo (true, false (default))
    Provides substitution details in JSON transcripts. Set substinfo to true to include a top-level JSON object that indicates the applied substitution rules and a count for each rule. In addition to the top-level object, substinfo adds a JSON object to the metadata that details each substitution's location, the substitution rule applied, and the substitution rule source.

For more details on these and other parameters, refer to the Substitutions section of the V‑Blaze REST API.
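The three subst settings in Table 2 can be assembled into request fields as follows. This is a debug-only sketch (the Warning above applies): the parameter names subst and substinfo come from the release notes, while the helper function and its level names are hypothetical conveniences, not part of the API.

```python
# Debug-only sketch; subst/substinfo values come from Table 2, the level
# names below are hypothetical.
SUBST_LEVELS = {
    "all": "true",          # system- and model-level substitutions enabled
    "model-only": "false",  # system-level off, model-level still applies
    "off": "none",          # both levels disabled
}

def substitution_debug_fields(level="model-only", details=True):
    """Assemble request fields for a substitution debugging run."""
    return {
        "subst": SUBST_LEVELS[level],
        # substinfo=true adds per-rule substitution details to the JSON output.
        "substinfo": "true" if details else "false",
    }

print(substitution_debug_fields("off"))  # {'subst': 'none', 'substinfo': 'true'}
```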
Added a new JSON output element: nsubs, a count of the substitutions applied, reported at both the top level and the utterance level. When substinfo=true, utterance-level nsubs also includes numtrans counts within the substinfo array; top-level nsubs does not include numtrans counts. The nsubs element does not appear if no substitutions were applied.
For more information, refer to the JSON Output Reference.
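Because nsubs is omitted when no substitutions were applied, consumers should treat it as optional at every level. The element name nsubs comes from the release notes; the sample document below is fabricated for illustration.

```python
import json

# Fabricated sample: utterance 2 had no substitutions, so nsubs is absent.
doc = json.loads("""
{
  "nsubs": 3,
  "utterances": [
    {"nsubs": 2},
    {},
    {"nsubs": 1}
  ]
}
""")

def utterance_subs_total(doc):
    """Sum utterance-level substitution counts; nsubs is absent when zero."""
    return sum(u.get("nsubs", 0) for u in doc.get("utterances", []))

print(utterance_subs_total(doc), doc.get("nsubs", 0))  # 3 3
```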
Hinting is now supported for eng1 version 7 models. Hinting for version 5 models is no longer supported.
For more information on hinting support, refer to the English page of the Language Models Reference.
Realtime operation now defaults to off (realtime=false). Requests must specify the realtime=true option to get per-utterance callbacks during processing. For more information, refer to Voice Activity Detection and Utterance Controls.
Made enhancements to ASR licensing selection logic. The ASR engine now more efficiently selects usage licenses that meet minimum requirements for a stream. These enhancements also increased the efficiency of model verification and validation processes.
Made minor enhancements and fixes to speech-to-text output processing (textproc), including:

- The system now preserves timestamps on backrefs in substitutions instead of interpolating.
- Fixed inadvertent uppercasing of English cased backrefs (for example, /\1/).
- Eliminated unexpected behavior of pattern{min,max} when min=0.
- Improved the negative frame/word warning.
- Changed the maximum value for uttmaxtime to 150 seconds.
- Made minor improvements to Spanish time formatting.
Corrected the scope of emotion scoring to always score individual utterances.
Eliminated rare edge case decode failures.
Setting punctuate=false now generates English output in lowercase.
ASR v7.1+ Fixed Issue
Minor modification of vociserver behavior to allow vociwebapi to recreate required ephemeral ramfs directories after unexpected vociserver service restarts. Previously, rare service interruptions could occur when results were returned as zip files, such as when scrubbing audio. (voci-server-server-sw-7.1.1-3, released November 2020)