Speech Transcription

A basic speech transcription feature is available as part of the Redactor API. Adding the SPEECH_TRANSCRIPTION feature to a request, either on its own or along with other features, analyzes the video and attempts to detect the spoken words. The resulting transcript is available in the Redactor UI for use in the editor, and can also be output as a JSON file through the API.

To output the transcript as a speech.json file, set outputContext.speechTranscriptionData to true in the request body. The file is saved along with the other standard redaction state data. Set outputUri to a folder path, and make sure the path ends with a trailing slash (/).

curl --location --request POST 'http://localhost:9000/api/v1/videos:process' --header 'Content-Type: application/json'  --data-raw '{
    "inputUri": "https://example.com/path/to/input.mp4",
    "features": ["HEAD_DETECTION", "SPEECH_TRANSCRIPTION", "MEDIA_RENDERING"],
    "outputUri": "file:///path/to/output/folder/",
    "outputContext": {
        "speechTranscriptionData": true
    }
}'
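For scripted use, the same request can be built and sent from Python. This is a minimal sketch using only the standard library, under these assumptions:

- build_speech_request and submit are hypothetical helper names, not part of the Redactor API.
- The endpoint is the localhost URL from the curl example.
- The response is returned as raw text, because this section does not describe its format.

The trailing-slash check enforces the outputUri requirement described above.

```python
import json
import urllib.request

# Assumption: a Redactor instance is listening at the URL used in the curl example.
API_URL = "http://localhost:9000/api/v1/videos:process"


def build_speech_request(input_uri: str, output_uri: str) -> dict:
    """Build a body that adds speech transcription to head detection and rendering.

    Hypothetical helper: the Redactor API only sees the resulting JSON.
    """
    # outputUri must be a folder path that ends with a trailing slash.
    if not output_uri.endswith("/"):
        raise ValueError("outputUri must be a folder path ending with '/'")
    return {
        "inputUri": input_uri,
        "features": ["HEAD_DETECTION", "SPEECH_TRANSCRIPTION", "MEDIA_RENDERING"],
        "outputUri": output_uri,
        # Ask the API to write speech.json alongside the redaction state data.
        "outputContext": {"speechTranscriptionData": True},
    }


def submit(body: dict) -> str:
    """POST the body as JSON and return the raw response text (format not assumed)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, submit(build_speech_request("https://example.com/path/to/input.mp4", "file:///path/to/output/folder/")) sends the same body as the curl command above.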

By default, transcription assumes English. If the media contains a different spoken language, set that language's code in videoContext.speechTranscriptionConfig.languageCode. The currently available language codes are:

en-us - English (default if none specified)
cn - 中文 (Chinese)
de - Deutsch (German)
es - Español (Spanish)
fr - Français (French)
it - Italiano (Italian)
ja - 日本語 (Japanese)
nl - Nederlands (Dutch)
pt - Português (Portuguese)
ru - Русский (Russian)

curl --location --request POST 'http://localhost:9000/api/v1/videos:process' --header 'Content-Type: application/json'  --data-raw '{
    "inputUri": "https://example.com/path/to/input.mp4",
    "features": ["HEAD_DETECTION", "SPEECH_TRANSCRIPTION", "MEDIA_RENDERING"],
    "outputUri": "file:///path/to/output/folder/",
    "outputContext": {
        "speechTranscriptionData": true
    },
    "videoContext": {
        "speechTranscriptionConfig": {
            "languageCode": "es"
        }
    }
}'
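If requests are generated programmatically, a small helper can attach the language setting and reject unsupported codes before the request is sent. This is a minimal Python sketch under these assumptions:

- with_language is a hypothetical helper.
- The allowed set is copied from the list above.
- Matching is exact and case-sensitive, because this section does not say whether the API accepts other spellings.

```python
import copy

# Codes copied from the list in this section; update if your Redactor version differs.
SUPPORTED_LANGUAGE_CODES = {"en-us", "cn", "de", "es", "fr", "it", "ja", "nl", "pt", "ru"}


def with_language(body: dict, language_code: str) -> dict:
    """Return a copy of body with videoContext.speechTranscriptionConfig.languageCode set.

    Hypothetical helper; the original body is left unchanged.
    """
    # Exact, case-sensitive match against the documented codes (an assumption).
    if language_code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unsupported languageCode: {language_code!r}")
    updated = copy.deepcopy(body)
    video_context = updated.setdefault("videoContext", {})
    config = video_context.setdefault("speechTranscriptionConfig", {})
    config["languageCode"] = language_code
    return updated
```

Applying with_language(body, "es") to the body from the first example produces the same JSON as the Spanish curl request above.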

If you only need a transcript, you can obtain it without performing any redactions on the media. The following example runs only the SPEECH_TRANSCRIPTION feature. It configures outputContext to save the transcription data (speechTranscriptionData: true) and to skip saving the full redaction state (state: false). The speech.json file is saved to the output location along with some minimal metadata. Note that future releases may no longer output this additional metadata by default.

curl --location --request POST 'http://localhost:9000/api/v1/videos:process' --header 'Content-Type: application/json'  --data-raw '{
    "inputUri": "https://example.com/path/to/input.mp4",
    "features": ["SPEECH_TRANSCRIPTION"],
    "outputUri": "file:///path/to/output/folder/",
    "outputContext": {
        "speechTranscriptionData": true,
        "state": false
    },
    "videoContext": {
        "speechTranscriptionConfig": {
            "languageCode": "en-us"
        }
    }
}'
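The transcript-only request and the follow-up read of speech.json can also be sketched in Python. None of the following assumptions is confirmed by this section:

- build_transcript_only_request and load_transcript are hypothetical helpers.
- The output file is named speech.json and sits directly inside the outputUri folder.
- The job has already finished when load_transcript runs. Waiting for completion is not shown.

The structure of speech.json is not documented here, so the parsed JSON is returned unchanged.

```python
import json
from pathlib import Path
from typing import Any
from urllib.parse import urlparse
from urllib.request import url2pathname


def build_transcript_only_request(input_uri: str, output_uri: str, language_code: str = "en-us") -> dict:
    """Build a body that runs only SPEECH_TRANSCRIPTION and skips the full redaction state."""
    if not output_uri.endswith("/"):
        raise ValueError("outputUri must be a folder path ending with '/'")
    return {
        "inputUri": input_uri,
        "features": ["SPEECH_TRANSCRIPTION"],
        "outputUri": output_uri,
        # Save the transcript, but not the full redaction state.
        "outputContext": {"speechTranscriptionData": True, "state": False},
        "videoContext": {"speechTranscriptionConfig": {"languageCode": language_code}},
    }


def load_transcript(output_uri: str) -> Any:
    """Parse speech.json from a file:// output folder (file name and location are assumptions)."""
    parsed = urlparse(output_uri)
    if parsed.scheme != "file":
        raise ValueError("this sketch only handles file:// output URIs")
    # Convert the URL path (e.g. /path/to/output/folder/) to a local filesystem path.
    folder = Path(url2pathname(parsed.path))
    with open(folder / "speech.json", encoding="utf-8") as handle:
        return json.load(handle)
```

With the outputUri from the example, file:///path/to/output/folder/, load_transcript reads /path/to/output/folder/speech.json.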