I have a large MP3 file (about 1.8 GB) that I have to transcribe using wit.ai. Since I work with WAV files a lot, I converted it to WAV.

Since wit.ai's speech API can't take audio longer than 10 s, I am planning to stream the file in chunks. But somehow I only get a 400 (Bad Request) response, and I can't figure out what I am sending wrong. Here are the details:

import json
import requests

headers = {'authorization': 'Bearer ' + wit_access_token,
           'Content-Type': 'audio/wav', 'Transfer-encoding': 'chunked'}
with open('meeting-record.wav', 'rb') as f:
    audio = f.read(2048)  # arbitrary chunk size
resp = requests.post(API_ENDPOINT, headers=headers,
                     data=audio)
print(resp)
data = json.loads(resp.content)
text = data['_text']
print(text)

I am getting the following output:

<Response [400]>
Traceback (most recent call last):
  File ".\sound-record.py", line 61, in <module>
    text = data['_text']
KeyError: '_text'

Can someone give me some pointers on where it's going wrong?

Surjya Narayana Padhi

2 Answers

I haven't used the wit.ai API before, but the Bing Speech API appears to require the data in a similar fashion. I'm not sure whether your code is what causes the error, but to chunk and stream the file properly, you could add a generator function like this:

def stream_audio_file(speech_file, chunk_size=1024):
    # Yield the audio file piece by piece, chunk_size bytes at a time
    with open(speech_file, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                # End of file reached
                break
            yield data

With that function defined somewhere in your file to chunk and stream the data for you, you can go back to your initial approach:

import json
import requests

headers = {
    'Accept': 'application/json',
    'Transfer-Encoding': 'chunked',
    'Content-Type': 'audio/wav',
    'Authorization': 'Bearer {0}'.format(YOUR_AUTH_TOKEN)
}

# Passing a generator as `data` makes requests send the body in chunks
data = stream_audio_file(YOUR_AUDIO_FILE)

r = requests.post(url, headers=headers, data=data)

results = json.loads(r.content)

print(results)
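
As the traceback in the question shows, a 400 response has no `_text` key, so it may be worth checking the status before indexing into the result. Here is a minimal sketch of that check. `extract_text` is a hypothetical helper name of my own, and the `_text` key is simply taken from the question's code, not verified against the wit.ai documentation:

```python
import json


def extract_text(status_code, body):
    """Return the transcript from a response body, or raise with the server's reply."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Body was not valid JSON (for example an HTML error page)
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # '_text' is the key used in the question; treat it as an assumption
    if status_code != 200 or '_text' not in payload:
        raise RuntimeError(
            'request failed ({0}): {1!r}'.format(status_code, payload or body))
    return payload['_text']
```

With the request above, this would be called as `extract_text(r.status_code, r.content)`, so a bad request surfaces the server's error message instead of a `KeyError`.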

Side Note: You mentioned you wanted something on your own server. There's a nice module called pocketsphinx, which is free, runs locally on your machine, and has Python bindings. It pairs really well with the SpeechRecognition module, which provides a decent layer on top so you don't have to spend as much time formatting your requests.

ERIC BARANOWSKI
  • I added a more thorough answer once I realized I didn't answer his question with that first comment. – ERIC BARANOWSKI Aug 22 '18 at 08:30
  • I couldn't get the chunked type requests to work - was getting 400 Bad request error. What worked for me was saving every chunk of data into a temporary file, and then making the request. – Loner Dec 24 '21 at 18:54
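
The temporary-file approach from the comment above could be sketched with the standard-library `wave` module, so that every piece is a complete WAV file with its own header and not a raw byte slice. This is only a sketch under assumptions: `split_wav`, `out_prefix`, and the 10-second default are hypothetical names and values chosen for illustration (the 10 s figure comes from the question), and each resulting file would still have to be posted in its own request.

```python
import wave


def split_wav(src_path, out_prefix, seconds=10):
    """Split src_path into consecutive WAV files of at most `seconds` each.

    Returns the list of segment paths that were written.
    """
    paths = []
    with wave.open(src_path, 'rb') as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                # No audio left in the source file
                break
            path = '{0}-{1:04d}.wav'.format(out_prefix, index)
            with wave.open(path, 'wb') as dst:
                # Copy channels, sample width and rate; the frame count
                # in the header is corrected when the data is written
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

For example, `split_wav('meeting-record.wav', 'segment')` would write `segment-0000.wav`, `segment-0001.wav`, and so on, each of which can then be read and sent separately.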

Wit.ai is not meant for transcribing long files; it is a system for recognizing short commands. You would be better off using a service designed for long-form transcription:

And many others

Nikolay Shmyrev