V-Spark Online Help

Using the /transcribe API with AWS S3

Amazon Web Services (AWS) Simple Storage Service (S3) is a common location for archiving audio files and metadata together in zip files. If you already have such files stored in S3, you can use the /transcribe API's support for S3 to process them directly from that location. This can save upload time, because V‑Spark typically must upload your files to S3 for processing. To use the /transcribe API with S3 from the command line, pass your AWS_ACCESS_KEY (referred to in the /transcribe API as aws_id) and AWS_SECRET_KEY (referred to as aws_secret) as form fields by using the curl command's -F option.

The following is the general format of a curl command that calls the /transcribe API to transcribe a file or directory that is stored in S3:

curl -F token=AUTH_TOKEN \
     -F aws_id=AWS_ACCESS_KEY \
     -F aws_secret=AWS_SECRET_KEY \
     -F s3key=s3://BUCKET/path/to/file/or/directory \
     -F region=S3_REGION \
     -X POST http://SERVER/transcribe/ORG_SHORT/FOLDER

The user-specific fields that you need to provide are the following:

AUTH_TOKEN

The authorization token for the company associated with the folder that you are uploading to. For information about locating an authorization token, see V‑Spark API Permission Requirements.

AWS_ACCESS_KEY

The AWS access key (passed as aws_id) for the bucket in which the file that you want to transcribe is stored.

AWS_SECRET_KEY

The AWS secret key (passed as aws_secret) for the bucket in which the file that you want to transcribe is stored.

BUCKET

The Amazon S3 bucket in which the file that you want to transcribe is stored.

path/to/file/or/directory

The path to one of the following: a file that you want to process; a zip file that contains the audio file that you want to process (and an optional metadata file); or a directory that contains a hierarchy of files that you want to process. If you specify a directory, all of the files under that directory are queued for transcription. Files that are submitted in a format that V‑Spark does not support are not processed and are listed in the V‑Spark folder's process log as UNSUPPORTED.

S3_REGION

The Amazon S3 region of the S3 bucket, for example us-east-1. This field is required. It specifies which regional endpoint to use for the request, which reduces request latency.

SERVER

The name or IP address of the computer system on which V‑Spark is installed.

ORG_SHORT

The short name of the organization that you are using. For information about finding the short name, see /transcribe API Reference.

FOLDER

The V‑Spark folder in which you want the transcript and audio output that is produced by V‑Spark to be stored.
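Two of the values in the general command are assembled from several of the fields described above: the s3key value (from BUCKET and the path) and the request URL (from SERVER, ORG_SHORT, and FOLDER). The following is a minimal sketch of how a script might compose them. The helper names s3key_for and transcribe_url are hypothetical and are not part of V‑Spark; the sample values are the ones used elsewhere in this topic.

```shell
# Hypothetical helpers (not part of V-Spark) that assemble the two composite
# values in the general command from their user-specific fields.

# s3key_for BUCKET PATH  ->  s3://BUCKET/PATH
s3key_for() {
  # Strip one leading slash from the path so that the result has a single separator.
  printf 's3://%s/%s\n' "$1" "${2#/}"
}

# transcribe_url SERVER ORG_SHORT FOLDER  ->  http://SERVER/transcribe/ORG_SHORT/FOLDER
transcribe_url() {
  printf 'http://%s/transcribe/%s/%s\n' "$1" "$2" "$3"
}

# Print the two values for the sample bucket, organization, and folder.
s3key_for example.company.com documentation-TEST.zip
transcribe_url example.company.com Test-Testing Test01
```

The two printed values can then be supplied to curl as the s3key form field and as the URL that follows -X POST.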

The following is a specific example of calling the /transcribe API to transcribe a zip file that is stored in S3:

curl -F token=0123456789abcde0123456789abcde01 \
     -F aws_id=012345678901234567890 \
     -F aws_secret=01234567890123456789012345678901234567890 \
     -F s3key=s3://example.company.com/documentation-TEST.zip \
     -F region=us-east-1 \
     -X POST http://example.company.com/transcribe/Test-Testing/Test01

9700fc31-f608-48b5-aaaf-bd264e811d9a

This example transcribes the audio in the zip file named documentation-TEST.zip in the bucket example.company.com and puts the results of that transcription in the Test01 folder of the organization Test-Testing. As with other calls to the /transcribe API, it returns the request ID for your transcription request, which you can subsequently use with the /request API, as discussed in Reference for the /request API.
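Because the call returns only the request ID, a script may want to confirm that the response looks like a request ID before saving it for later use. The following sketch shows one way to do that. It assumes that request IDs always have the same 8-4-4-4-12 hexadecimal form as the one in the example above, which this topic does not guarantee, and looks_like_request_id is a hypothetical helper name.

```shell
# looks_like_request_id is a hypothetical helper. It assumes that request IDs
# always have the 8-4-4-4-12 hexadecimal form shown in the example response.
looks_like_request_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# In a real script, the value would be captured from the curl call, for example:
#   request_id=$(curl -s -F token=AUTH_TOKEN ... -X POST http://SERVER/transcribe/ORG_SHORT/FOLDER)
# Here the sample response from this topic is used instead.
request_id=9700fc31-f608-48b5-aaaf-bd264e811d9a

if looks_like_request_id "$request_id"; then
  echo "Request ID: $request_id"
else
  # Anything else is likely an error message rather than a request ID.
  echo "Unexpected response: $request_id" >&2
fi
```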

By default, a zip file in S3 that you submit for transcription through the /transcribe API remains in S3 after its contents have been transcribed. If you do not need to keep such files after transcription, the /transcribe API includes a delete=true option that deletes the file after its contents have been transcribed. In an application, pass this as an additional parameter to the /transcribe API call. In a curl command, add -F delete=true to your command line.
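As an illustration, the following sketch wraps the general command in a shell function and adds the delete=true field. The function name transcribe_s3_delete and its argument order are hypothetical choices for this example; the form fields are the ones described in this topic.

```shell
# transcribe_s3_delete is a hypothetical wrapper around the general command.
# Arguments: $1 AUTH_TOKEN, $2 AWS_ACCESS_KEY, $3 AWS_SECRET_KEY,
#            $4 s3key (s3://BUCKET/path), $5 S3_REGION,
#            $6 request URL (http://SERVER/transcribe/ORG_SHORT/FOLDER)
transcribe_s3_delete() {
  curl -F "token=$1" \
       -F "aws_id=$2" \
       -F "aws_secret=$3" \
       -F "s3key=$4" \
       -F "region=$5" \
       -F "delete=true" \
       -X POST "$6"
}

# Example call, using placeholders for the token and keys:
#   transcribe_s3_delete AUTH_TOKEN AWS_ACCESS_KEY AWS_SECRET_KEY \
#       s3://example.company.com/documentation-TEST.zip us-east-1 \
#       http://example.company.com/transcribe/Test-Testing/Test01
```

Because delete=true removes the source file from S3, it is worth confirming that you have another copy of the file, or no longer need it, before using this option.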