Speech Transcription
A basic speech transcription feature is available as part of the Redactor API. Adding the SPEECH_TRANSCRIPTION feature to the request, either by itself or along with other features, will analyze the video and attempt to detect the words spoken. The speech transcript will be available in the Redactor UI for use in the editor and is also available as a JSON file through the API.
To output the speech.json file, set outputContext.speechTranscriptionData: true in your request body, and it will be saved along with the other standard redaction state data. Be sure to provide a path to a folder in the outputUri, and ensure the path ends in a trailing slash.
curl --location --request POST 'http://localhost:9000/api/v1/videos:process' --header 'Content-Type: application/json' --data-raw '{
"inputUri": "https://example.com/path/to/input.mp4",
"features": ["HEAD_DETECTION", "SPEECH_TRANSCRIPTION", "MEDIA_RENDERING"],
"outputUri": "file:///path/to/output/folder/",
"outputContext": {
"speechTranscriptionData": true
}
}'
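The same request body can also be assembled in code. The following is a minimal Python sketch rather than part of the Redactor API: the helper name build_transcription_request is hypothetical, and it uses only the field names shown in the curl example above. It adds the trailing slash to outputUri when it is missing.

```python
import json


def build_transcription_request(input_uri, output_uri, features=None):
    """Build a request body that asks for speech.json to be saved.

    Hypothetical helper for illustration; the field names mirror the
    curl example above.
    """
    if features is None:
        features = ["HEAD_DETECTION", "SPEECH_TRANSCRIPTION", "MEDIA_RENDERING"]
    if "SPEECH_TRANSCRIPTION" not in features:
        raise ValueError("features must include SPEECH_TRANSCRIPTION")
    # The output location must be a folder path ending in a trailing slash.
    if not output_uri.endswith("/"):
        output_uri += "/"
    return {
        "inputUri": input_uri,
        "features": list(features),
        "outputUri": output_uri,
        "outputContext": {"speechTranscriptionData": True},
    }


if __name__ == "__main__":
    body = build_transcription_request(
        "https://example.com/path/to/input.mp4",
        "file:///path/to/output/folder",
    )
    # Print the JSON that would be sent as the POST body.
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with --data-raw, or sent with any HTTP client, as the body of the POST request.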
By default, English is used for the transcription. If a different language is spoken in the media, specify its language code in videoContext.speechTranscriptionConfig.languageCode. The currently available language codes are:
en-us - English (default if none specified)
cn - 中文 (Chinese)
de - Deutsch (German)
es - Español (Spanish)
fr - Français (French)
it - Italiano (Italian)
ja - 日本語 (Japanese)
nl - Nederlands (Dutch)
pt - Português (Portuguese)
ru - Русский (Russian)
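A client can check the language code before sending a request. The sketch below is a hypothetical Python helper, not something the API provides: the tuple of codes is copied from the list above, and the exact-match check is a client-side assumption, since how the server treats unlisted codes is not described here.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGE_CODES = (
    "en-us", "cn", "de", "es", "fr", "it", "ja", "nl", "pt", "ru",
)


def speech_video_context(language_code="en-us"):
    """Return a videoContext object for the given language code.

    Hypothetical helper; raises ValueError for codes not in the list.
    """
    if language_code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(
            f"unsupported language code {language_code!r}; "
            f"expected one of {', '.join(SUPPORTED_LANGUAGE_CODES)}"
        )
    return {"speechTranscriptionConfig": {"languageCode": language_code}}
```

The returned dictionary is meant to be placed under the videoContext key of the request body.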
curl --location --request POST 'http://localhost:9000/api/v1/videos:process' --header 'Content-Type: application/json' --data-raw '{
"inputUri": "https://example.com/path/to/input.mp4",
"features": ["HEAD_DETECTION", "SPEECH_TRANSCRIPTION", "MEDIA_RENDERING"],
"outputUri": "file:///path/to/output/folder/",
"outputContext": {
"speechTranscriptionData": true
},
"videoContext": {
"speechTranscriptionConfig": {
"languageCode": "es"
}
}
}'
If a speech transcript is all that's needed, it can be obtained without performing any redactions on the media. In the following example, we run only the SPEECH_TRANSCRIPTION feature and configure the outputContext to save the transcription data and to skip saving the full redaction state. The speech.json file will be saved to the output location along with some minimal metadata. Note that this additional metadata may not be output by default in future releases.
curl --location --request POST 'http://localhost:9000/api/v1/videos:process' --header 'Content-Type: application/json' --data-raw '{
"inputUri": "https://example.com/path/to/input.mp4",
"features": ["SPEECH_TRANSCRIPTION"],
"outputUri": "file:///path/to/output/folder/",
"outputContext": {
"speechTranscriptionData": true,
"state": false
},
"videoContext": {
"speechTranscriptionConfig": {
"languageCode": "en-us"
}
}
}'
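Once processing has finished, the transcript can be read back from the output folder. The following Python sketch assumes the outputUri is a local file:// folder and that speech.json sits directly in that folder. The helper name load_speech_transcript is hypothetical, and nothing is assumed about the structure of the JSON, which is not described here.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def load_speech_transcript(output_uri):
    """Load speech.json from a local file:// output folder.

    Hypothetical helper; returns the parsed JSON without interpreting it.
    """
    parsed = urlparse(output_uri)
    if parsed.scheme != "file":
        raise ValueError("this sketch only handles file:// output locations")
    # Convert the URI path to a filesystem path for the current platform.
    folder = Path(url2pathname(parsed.path))
    with open(folder / "speech.json", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, load_speech_transcript("file:///path/to/output/folder/") would return the parsed contents of speech.json from that folder.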