Processing and performance

Voci's ASR performance, also known as throughput, is measured as the total duration of audio transcribed per hour. The specifications of the system that V‑Blaze runs on have a significant impact on throughput. For example, V‑Blaze achieves higher throughput on a system with a 16-core CPU, an NVIDIA GPU module, and 128 GB of RAM than on a system with an 8-core CPU and no GPU.

Throughput varies based on language models, optional ASR features, and audio characteristics. Concurrent use of multiple models and languages on a single instance is supported, but the model configuration must fit within the available resources (primarily RAM and GPU memory).

V‑Blaze runs on a wide variety of configurations. The primary factors in choosing a deployment configuration are operational cost and availability. To calculate the operational cost per hour of audio, divide the system cost (price per hour) by the observed throughput (hours of audio transcribed per hour).
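The cost calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the hourly prices are placeholder values rather than actual AWS or hardware prices, and the throughput figures are the nominal values from the standard configuration tables, not measured results.

```python
def cost_per_audio_hour(system_cost_per_hour: float, throughput: float) -> float:
    """Return the cost of transcribing one hour of audio.

    system_cost_per_hour: price of running the host for one hour.
    throughput: observed hours of audio transcribed per wall-clock hour.
    """
    if throughput <= 0:
        raise ValueError("throughput must be greater than zero")
    return system_cost_per_hour / throughput


# Placeholder prices (assumptions), nominal throughput from the tables.
gpu_cost = cost_per_audio_hour(system_cost_per_hour=1.20, throughput=200)
cpu_cost = cost_per_audio_hour(system_cost_per_hour=0.40, throughput=50)

print(f"GPU host:     {gpu_cost:.4f} per audio hour")
print(f"Non-GPU host: {cpu_cost:.4f} per audio hour")
```

With these placeholder prices, the GPU host costs more per hour to run but less per hour of audio transcribed. Substitute your own prices and observed throughput before drawing conclusions.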

For more information on processing and performance, refer to V‑Blaze standard system configurations or contact support@vocitec.com to determine the most suitable options for your needs.

Storage

V‑Blaze does not store any request data; therefore, system storage requirements are minimal and limited only to the software and models. System storage space required for V‑Blaze software is less than 10 GB. Additionally, each ASR model can require up to 5 GB of storage. Typical system configurations range from 20 GB to 250 GB. SSD storage is recommended for quicker boot and load times.
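As a rough sizing aid, the figures above can be combined into an upper-bound estimate. The sketch below assumes the stated maximums (10 GB for software, 5 GB per model); the function name is hypothetical, and the result excludes the operating system and any other data on the host.

```python
SOFTWARE_GB = 10   # stated upper bound for V-Blaze software
PER_MODEL_GB = 5   # stated upper bound for a single ASR model


def max_storage_gb(model_count: int) -> int:
    """Upper-bound storage estimate for V-Blaze software plus ASR models."""
    if model_count < 0:
        raise ValueError("model_count cannot be negative")
    return SOFTWARE_GB + PER_MODEL_GB * model_count


# Example: a host with three ASR models installed.
print(max_storage_gb(3))  # 25
```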

Note: In deployments with a large number of ASR instances, Voci recommends storing all models (/opt/voci/models) on a central shared network drive for performance and maintenance. In AWS deployments, this is typically done using a mounted remote EFS (Elastic File System).

Using V‑Blaze with AWS

Voci support recommends using GPU-enhanced instances to deploy V‑Blaze on AWS. The g4dn.4xlarge instance type provides optimal price, performance, and scalability for V‑Blaze; g3.4xlarge is also supported.

For non-GPU instances, optimal sizing depends on other factors like the number of language models installed. Contact support@vocitec.com for recommendations.

V‑Blaze standard system configurations

Table 1. V‑Blaze AWS development system configurations — 12/01/2023

| Configuration | Nominal Throughput (audio processed per hour) | System Type | AWS instance type | Optional Configuration | Notes |
| --- | --- | --- | --- | --- | --- |
| V‑Blaze — AWS standard | 200 hrs | Virtual (GPU) | g3.4xlarge, g4dn.4xlarge, g5.4xlarge | | GPU hosts can process approximately 200 hours of audio per hour. |
| V‑Blaze — AWS minimal development | 50 hrs | Virtual (no GPU) | m4.2xlarge, m5.2xlarge, m6.2xlarge, m7.2xlarge | | Non-GPU hosts can process approximately 50 hours of audio per hour. This is a cost-effective host for development with customer data. |

Table 2. V‑Blaze standard system configurations — 01/01/2021

| Configuration | Nominal Throughput (audio processed per hour) | System Type | Hardware Specifications | Optional Configuration | Notes |
| --- | --- | --- | --- | --- | --- |
| V‑Blaze — High Volume | 1000+ hrs | 1U Server | CPU: 2x Intel Xeon Gold 6248R; GPU: 1x NVIDIA A100 (40 GB); RAM: 384 GB; Storage: 2x 250 GB SSD in RAID1 | 2x 2 TB HDD | Additional storage is necessary if a single ASR host is used for Direct-to-Transcript, V‑Spark Analytics, or audio storage. |
| V‑Blaze — Low Volume | 350 hrs | 1U Server | CPU: 2x Intel Xeon Gold 6226R; GPU: 1x NVIDIA T4 (16 GB); RAM: 192 GB; Storage: 2x 250 GB SSD in RAID1 | 2x 2 TB HDD | Additional storage is necessary if a single ASR host is used for Direct-to-Transcript, V‑Spark Analytics, or audio storage. |
| V‑Blaze — AWS Standard | 200 hrs | EC2 g4dn.4xlarge | CPU: 16x vCPU (Intel Xeon E5-2686 v4 Cascade Lake); GPU: 1x NVIDIA T4 (16 GB); RAM: 64 GB; Storage: 20 GB EBS gp2 | | Alternate AWS instances include g3.4xlarge or any larger g3/g4 instance. Voci recommends .4xl instances for the best value and scaling. |
| V‑Blaze — Minimal Virtual (no GPU) | 50 hrs | EC2 m5.2xlarge or equivalent VM instance | CPU: 8x vCPU (Intel Xeon Platinum 8259CL); GPU: N/A; RAM: 32 GB; Storage: 20 GB EBS gp2 | | Voci does not recommend this configuration for production. However, a nominal throughput of up to 50 hours of audio per hour is possible. |
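
The nominal throughput figures in these tables can be used for a first-pass estimate of how many hosts a workload needs. The sketch below is an illustration under stated assumptions, not a Voci sizing tool: the function name, the 10,000-hour workload, and the 80% headroom factor are all hypothetical, and actual throughput depends on models, ASR features, and audio characteristics.

```python
import math


def required_hosts(audio_hours: float, window_hours: float,
                   nominal_throughput: float, headroom: float = 0.8) -> int:
    """Estimate hosts needed to transcribe audio_hours within window_hours.

    nominal_throughput: hours of audio one host processes per wall-clock hour.
    headroom: fraction of nominal throughput to plan on (an assumption).
    """
    if min(audio_hours, window_hours, nominal_throughput, headroom) <= 0:
        raise ValueError("all arguments must be greater than zero")
    capacity_per_host = nominal_throughput * window_hours * headroom
    return math.ceil(audio_hours / capacity_per_host)


# Hypothetical workload: 10,000 hours of audio to process within 24 hours.
print(required_hosts(10_000, 24, nominal_throughput=200))  # 3 GPU hosts
print(required_hosts(10_000, 24, nominal_throughput=50))   # 11 non-GPU hosts
```

Treat the result as a starting point and confirm it against throughput observed with your own audio.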