I would like to use the Azure Speech Services Batch Transcription API to transcribe an audio file. I've already had success with the Speech SDK for Node.js, but I wanted to try one of the newer features available in the v3.1 preview version of the API (displayFormWordLevelTimestampsEnabled), so I figured I had to use the REST API for that.

My problem is that whatever input I feed the Create Transcript API for contentUrls, I always end up with the same error:

"error": {
   "code": "InvalidData",
   "message": "The recordings URI contains invalid data."
}

After a little digging, I found some tips in the Azure portal about using sox to transcode the audio file into the required format.

The portal documentation says: If you are using REST API, make sure that it uses one of the formats in this table:

Format   Codec   Bit rate   Sample rate
WAV      PCM     256 kbps   16 kHz, mono
OGG      OPUS    256 kbps   16 kHz, mono

The sox commands it lists are:

Activity                                                    SoX command
Check the audio file format.                                sox --i
Convert the audio file to single channel, 16-bit, 16 kHz.   sox -b 16 -e signed-integer -c 1 -r 16k -t wav .wav
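
For anyone scripting this step, here is a minimal sketch that builds the conversion command from the flags quoted above. The helper names and the file names passed in are placeholders of mine, not something from the portal documentation, and the sketch assumes sox is installed and on the PATH.

```python
import subprocess


def sox_convert_args(src: str, dst: str) -> list[str]:
    """Build the SoX argument list for a mono, 16-bit, 16 kHz WAV conversion.

    src and dst are hypothetical file names chosen by the caller.
    Format options placed after the input file apply to the output file.
    """
    return [
        "sox", src,
        "-b", "16",              # 16 bits per sample
        "-e", "signed-integer",  # signed PCM encoding
        "-c", "1",               # single channel (mono)
        "-r", "16k",             # 16 kHz sample rate
        "-t", "wav",             # WAV container
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if sox exits non-zero."""
    subprocess.run(sox_convert_args(src, dst), check=True)
```

Calling convert("in.mp3", "out.wav") would run the command; decoding mp3 input additionally depends on how the local sox build was compiled.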

I ran my mp3 through the second command and verified the result with the first. The file info looks like this:

Input File     : 'out5.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:30.09 = 481488 samples ~ 2256.97 CDDA sectors
File Size      : 963k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
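
The same properties can be cross-checked without sox. This is a rough sketch using Python's standard wave module, which reads PCM WAV headers; the function names are my own. For the values above, 1 channel x 16000 Hz x 16 bits gives the 256k bit rate, and 481488 samples at 16000 Hz is about 30.09 seconds.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the header of a PCM WAV file and return its basic properties."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate": rate,
        "bits_per_sample": bits,
        "duration_s": frames / rate,
        "bit_rate": channels * rate * bits,  # bits per second, uncompressed PCM
    }


def matches_portal_format(info: dict) -> bool:
    """True if the file is mono, 16 kHz, 16-bit, as the portal table asks."""
    return (
        info["channels"] == 1
        and info["sample_rate"] == 16000
        and info["bits_per_sample"] == 16
    )
```

Running describe_wav("out5.wav") on the converted file should report the same channel count, sample rate and precision as the sox output.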

Finally, I uploaded the file to a public S3 bucket to use as the content URL for my request:

POST https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions

{
  "contentUrls": [
        "https://s3.us-west-1.amazonaws.com/xxxx/out5.wav"
  ],
  "locale": "en-US",
  "displayName": "Test"
}
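
For reference, this is roughly how the request above can be built in code. It is a sketch only: the function name, the region/key/api_version parameters and the optional flag are my own, the key is sent in the Ocp-Apim-Subscription-Key header, and placing displayFormWordLevelTimestampsEnabled under "properties" in the request is an assumption based on where it shows up in the service's responses.

```python
import json
import urllib.request


def build_create_request(region: str, key: str, content_url: str,
                         api_version: str = "v3.0",
                         display_form_timestamps: bool = False) -> urllib.request.Request:
    """Build (but do not send) a Create Transcription POST request."""
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/speechtotext/{api_version}/transcriptions")
    body = {
        "contentUrls": [content_url],
        "locale": "en-US",
        "displayName": "Test",
    }
    if display_form_timestamps:
        # Assumption: the preview feature is requested via "properties".
        body["properties"] = {"displayFormWordLevelTimestampsEnabled": True}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; building it separately makes it easy to print the URL and body and confirm which API version is actually being called.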

Still, it failed with the same error that I posted above. Any insights into what might be wrong? Thanks!

Update:

The answer below mentions that a report.json file should be referenced in the results of the Get Transcript/Create Transcript API calls.

When I call the Create Transcript API, the response payload is:

{
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
    "model": {
        "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
    },
    "links": {
        "files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
    },
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": false,
        "displayFormWordLevelTimestampsEnabled": false,
        "channels": [
            0,
            1
        ],
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked"
    },
    "lastActionDateTime": "2022-09-13T23:37:09Z",
    "status": "NotStarted",
    "createdDateTime": "2022-09-13T23:37:09Z",
    "locale": "en-US",
    "displayName": "Test"
}

Calling Get Transcript, I see:

{
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
    "model": {
        "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
    },
    "links": {
        "files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
    },
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": false,
        "displayFormWordLevelTimestampsEnabled": false,
        "channels": [
            0,
            1
        ],
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked",
        "error": {
            "code": "InvalidData",
            "message": "The recordings URI contains invalid data."
        }
    },
    "lastActionDateTime": "2022-09-13T23:37:22Z",
    "status": "Failed",
    "createdDateTime": "2022-09-13T23:37:09Z",
    "locale": "en-US",
    "displayName": "Test"
}
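
In case it helps anyone reproduce this, the failure can be pulled out of that response programmatically. A small sketch, assuming only the field layout shown in the payload above; the function name is mine.

```python
from typing import Optional, Tuple


def transcription_failure(transcription: dict) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (code, message) for a failed transcription, or None otherwise.

    Expects the parsed JSON body of a Get Transcript response.
    """
    if transcription.get("status") != "Failed":
        return None
    error = transcription.get("properties", {}).get("error", {})
    return error.get("code"), error.get("message")
```

For the response above this returns the InvalidData code together with its message, while the earlier NotStarted payload yields None.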

And finally, looking at the transcript files, I get an empty list:

{
    "values": []
}

I see no reference to a report.json, or any data populated here at all.

shanewwarren
  • Hi Shane, thank you for posting. I've reached out to the appropriate speech service team to find the right person to answer your question. – Darren Cohen Sep 13 '22 at 16:30

1 Answer

In many cases you can get detailed error information by doing a GET on https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/<transcription_id>/files and looking at the report.json that is referenced there.

If that doesn't help, you could post the transcription ID(s) of the failed transcription(s) so that someone from the team (I am one of them) can look at the service logs.
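
As a rough illustration of that lookup, the sketch below picks the report out of a parsed files listing. The entry layout it relies on (a "kind" of "TranscriptionReport" and a "links.contentUrl" field) is an assumption that should be checked against a real response, and the function names are hypothetical.

```python
def files_url(transcription: dict) -> str:
    """Return the files listing URL from a parsed transcription response."""
    return transcription["links"]["files"]


def report_urls(files_listing: dict) -> list[str]:
    """Collect the download URLs of report files from a files listing.

    Assumes each entry in "values" carries "kind" and "links.contentUrl".
    """
    urls = []
    for entry in files_listing.get("values", []):
        if entry.get("kind") == "TranscriptionReport":
            url = entry.get("links", {}).get("contentUrl")
            if url:
                urls.append(url)
    return urls
```

An empty result, as with a listing of { "values": [] }, means the service has not produced a report for that transcription, so the transcription ID is the only thing left to go on.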

chlandsi
  • Thank you. I saw references to `report.json` in the documentation, but each time I followed the `files` link (`https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/xxxx/files`), I only ever got the following payload: `{ "values": [] }`. I will update my question. – shanewwarren Sep 13 '22 at 20:18
  • I updated my question with the results of `Create Transcript`, `Get Transcript` and `Get Transcript Files`. – shanewwarren Sep 13 '22 at 23:39
  • @shanewarren There seems to be an issue with the speech resource you are using. I've asked the corresponding team to look into that. There was a change to the resource on 9/14 after you tried starting the transcriptions; for the moment you could either try with the resource again or create a new (paid/S0) resource and try with that. There is no issue with the input files; the error message is unfortunately misleading. – chlandsi Sep 15 '22 at 10:10
  • Thank you @chlandsi, I will try again later tonight given your recommended solutions. – shanewwarren Sep 15 '22 at 19:26