How to send chunked audio data for speech recognition in wit.ai?

Question

I have a large MP3 file (about 1.8 GB) that I need to transcribe using wit.ai. Since I work with WAV files a lot, I converted it to a WAV file.

Since wit.ai's speech API can't take audio longer than 10 s, I am planning to stream the file in chunks. But somehow I only get a 400 (Bad Request) response, and I can't figure out what I am sending wrong. Here are the details:

headers = {'authorization': 'Bearer ' + wit_access_token,
         'Content-Type': 'audio/wav','Transfer-encoding': 'chunked'}
with open('meeting-record.wav', 'rb') as f:
    audio = f.read(2048)  # taken it any number
resp = requests.post(API_ENDPOINT, headers = headers,
                 data = audio)
print(resp) 
data = json.loads(resp.content)
text = data['_text']
print(text)
f.close()

I am getting the following output

<Response [400]>
Traceback (most recent call last):
  File ".\sound-record.py", line 61, in <module>
    text = data['_text']
KeyError: '_text'

Can someone give me some pointers on where it's going wrong?

ERIC BARANOWSKI · Answer 1 · 2018-08-22T08:23:53.290

I haven't used the wit.ai API before, but the Bing Speech API appears to require the data in a similar fashion. I'm not sure if you were getting the error because of your code, but in order to properly chunk and stream the file, you could add another function in there like this:

def stream_audio_file(speech_file, chunk_size=1024):
    # Read the audio file and yield it one chunk at a time
    with open(speech_file, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                # End of file reached
                break
            yield data
Now as long as you have that function somewhere in your file to stream and chunk the data for you, you can go back to your initial method:

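import json
import requests

headers = {
    'Accept': 'application/json',
    'Transfer-Encoding': 'chunked',
    'Content-type': 'audio/wav',
    'Authorization': 'Bearer {0}'.format(YOUR_AUTH_TOKEN)
}

# Passing a generator as `data` makes requests stream the body in chunks
data = stream_audio_file(YOUR_AUDIO_FILE)

# url is the speech endpoint (API_ENDPOINT in the question)
r = requests.post(url, headers=headers, data=data)

results = json.loads(r.content)

print(results)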
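
One possible reason for the 400 in the question's code (an assumption, not verified against wit.ai): the Transfer-Encoding: chunked header is set by hand, but data is a plain bytes object, so the body is never actually framed as chunks. When requests receives a generator as data, it applies that framing itself. The sketch below only illustrates what the framing looks like on the wire; read_in_chunks and frame_chunked are hypothetical helper names, the 2500-byte buffer is a stand-in for the audio file, and nothing is sent over the network.

```python
import io


def read_in_chunks(fileobj, chunk_size=1024):
    """Yield successive blocks from a binary file object until EOF."""
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            break
        yield block


def frame_chunked(blocks):
    """Frame blocks the way HTTP/1.1 chunked transfer encoding does."""
    for block in blocks:
        # Each chunk: payload size in hexadecimal, CRLF, payload, CRLF.
        yield b"%x\r\n" % len(block) + block + b"\r\n"
    # A zero-length chunk terminates the body.
    yield b"0\r\n\r\n"


# Stand-in for the WAV file (assumption: any binary file object works here).
fake_audio = io.BytesIO(b"a" * 2500)
wire_body = b"".join(frame_chunked(read_in_chunks(fake_audio, 1024)))

print(wire_body[:6])   # b'400\r\na'  (0x400 == 1024 bytes in the first chunk)
print(wire_body[-5:])  # b'0\r\n\r\n'
```

In practice you would not build this framing yourself; handing the generator to requests.post, as above, is enough.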

Side note: You mentioned you wanted something on your own server. There's a nice module called pocketsphinx, which is free, runs on your own machine, and is usable from Python. It pairs really well with the SpeechRecognition module, which provides a decent layer on top so you don't have to spend as much time formatting your requests.

I added a more thorough answer once I realized I didn't answer his question with that first comment. — ERIC BARANOWSKI, Aug 22 '18 at 08:30
I couldn't get the chunked type requests to work - was getting 400 Bad request error. What worked for me was saving every chunk of data into a temporary file, and then making the request. — Loner, Dec 24 '21 at 18:54
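
The temporary-file workaround from the comment above could look roughly like the following sketch, which uses the standard-library wave module to cut a WAV file into short, self-contained segments, each with its own valid header. split_wav is a hypothetical helper, the 10-second default comes from the limit mentioned in the question, and the segments are kept in memory rather than written to temporary files; the silent 25-second clip is only test input.

```python
import io
import wave


def split_wav(src, seconds_per_segment=10):
    """Yield complete WAV files (as bytes), each at most the given length.

    src may be a file name or a binary file object.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_segment = reader.getframerate() * seconds_per_segment
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as writer:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            yield buffer.getvalue()


# Build a 25-second silent mono clip (8 kHz, 16-bit) as test input.
source = io.BytesIO()
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000 * 25)
source.seek(0)

segments = list(split_wav(source))
print(len(segments))  # 3 (10 s + 10 s + 5 s)
```

Each segment could then be posted as its own request with Content-Type: audio/wav, and the returned texts concatenated. Whether wit.ai accepts them depends on the sample format, which this sketch does not check.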

score 0 · Answer 2 · answered Apr 19 '18 at 06:34

Wit.ai is not meant for transcribing long files; it is a system for recognizing short commands. You'd be better off using a proper service:

And many others

Nikolay Shmyrev

I need a free ASR that I can install on a local network and train for a new language. – Surjya Narayana Padhi Apr 20 '18 at 07:13
You cannot install wit.ai either. If you need a local ASR, you can use Kaldi. – Nikolay Shmyrev Apr 20 '18 at 07:33
