Microsoft Speech API with Python requests?

Question

I'm trying to use the requests package in Python to call the Microsoft Bing Speech Transcription API. The call works in Postman, but there I have to select the audio file manually through Postman's GUI, and I'm not sure how that file selection maps onto the actual HTTP request (and, by extension, onto the Python requests call). Postman can convert its internal queries into code, and according to Postman the HTTP request it's making is:

POST /recognize?scenarios=smd&appid=[REDACTED]&locale=en-US&device.os=wp7&version=3.0&format=json&form=BCSSTT&instanceid=[REDACTED]&requestid=[REDACTED] HTTP/1.1
Host: speech.platform.bing.com
Authorization: [REDACTED]
Content-Type: application/x-www-form-urlencoded
Cache-Control: no-cache
Postman-Token: [REDACTED]

undefined

And the equivalent request if made through the Python requests library would be:

import requests

url = "https://speech.platform.bing.com/recognize"

querystring = {"scenarios":"smd","appid":"[REDACTED]","locale":"en-US","device.os":"wp7","version":"3.0","format":"json","form":"BCSSTT","instanceid":"[REDACTED]","requestid":"[REDACTED]"}

headers = {
    'authorization': "[REDACTED]",
    'content-type': "application/x-www-form-urlencoded",
    'cache-control': "no-cache",
    'postman-token': "[REDACTED]"
}

response = requests.request("POST", url, headers=headers, params=querystring)

print(response.text)

However, in neither case does the generated code actually pass in the audio file to be transcribed (presumably Postman can't display raw audio data), so I'm not sure how to add this crucial information to the request. I assume that in the raw HTTP request the audio stream goes in the spot displayed as "undefined". For the Python requests call, the documentation suggests that the response = requests.request(...) line should be replaced by:

response = requests.request("POST", url, headers=headers, params=querystring, files={'file': open('PATH/TO/AUDIO/FILE', 'rb')})

But when I run this query I get "Request timed out (> 14000 ms)". Any idea how I can successfully call the Microsoft Speech API from Python? Any help would be much appreciated, thanks.

score 1 · Answer 1 · answered May 10 '17 at 03:58

Replace your request call with this:

with open('PATH/TO/WAV/FILE', 'rb') as audio_file:
    r = requests.post(url, headers=headers, params=querystring, data=audio_file.read())

And that should do the trick.

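According to the Microsoft documentation, the binary audio data is the body of the POST request, so it must be sent through the data parameter of requests. The files parameter instead wraps the file in a multipart/form-data body, which is not what the API expects.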
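
To see the difference between data and files without hitting the service, you can prepare both requests locally and inspect them. This is only a sketch: audio_bytes is a made-up stand-in for the contents of a real WAV file, the field name "file" and filename "audio.wav" are illustrative, and nothing is sent over the network. It does not cover which Content-Type header the service expects for audio; check the Microsoft documentation for that.

```python
import requests

url = "https://speech.platform.bing.com/recognize"

# Hypothetical stand-in for the bytes of a WAV file; in practice these
# would come from open('PATH/TO/WAV/FILE', 'rb').read().
audio_bytes = b"RIFF\x00\x00\x00\x00WAVEfmt "

# data=<bytes>: requests uses the bytes as the request body unchanged.
raw = requests.Request("POST", url, data=audio_bytes).prepare()

# files=...: requests wraps the bytes in a multipart/form-data envelope
# (boundary lines plus a Content-Disposition header around the payload).
multipart = requests.Request(
    "POST", url, files={"file": ("audio.wav", audio_bytes)}
).prepare()

print(raw.body == audio_bytes)            # body is exactly the audio bytes
print(multipart.headers["Content-Type"])  # multipart/form-data; boundary=...
print(multipart.body == audio_bytes)      # body is the envelope, not the raw audio
```

With data, the body is exactly the audio bytes. With files, the audio is buried inside a multipart envelope, so a server that expects raw audio as the body will not receive what it is looking for.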
