I'm trying to use wit.ai to extract the intent and entities from a voice command that a user sends to a Telegram bot.

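from wit import Wit

def discover(bot, update, user_data):
    # Download the voice message that Telegram stored for this update
    voice = bot.getFile(update.message.voice.file_id)
    voice.download('file.ogg')

    # wit_access_token is defined elsewhere
    client = Wit(wit_access_token)

    # Send the raw audio to Wit.ai (this is the call that fails)
    with open('file.ogg', 'rb') as f:
        resp = client.speech(f, True, {'Content-Type': 'audio/ogg'})
    print('Yay, got Wit.ai response: ' + str(resp))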

But I'm getting this error from the Wit client:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1599, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:/Users/PAGANEFR/PycharmProjects/MARCoBot/readaudio.py", line 8, in <module>
    resp = client.speech(f, True, {'Content-Type': 'audio/ogg'})
  File "C:\Python27\lib\site-packages\wit\wit.py", line 88, in speech
    data=audio_file, headers=headers)
  File "C:\Python27\lib\site-packages\wit\wit.py", line 41, in req
    ' (' + rsp.reason + ')')
wit.wit.WitError: Wit responded with status: 400 (Bad Request)

I can play the ogg file in VLC, so the file itself seems to be valid. I also tried converting the ogg file to wav with the soundfile library:

import soundfile as sf

data, samplerate = sf.read('file.ogg')
sf.write('file.wav', data, samplerate)

But I'm receiving this error:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1599, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:/Users/PAGANEFR/PycharmProjects/MARCoBot/readaudio.py", line 6, in <module>
    data = sf.read('file.ogg')
  File "C:\Python27\lib\site-packages\soundfile.py", line 257, in read
    subtype, endian, format, closefd) as f:
  File "C:\Python27\lib\site-packages\soundfile.py", line 624, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Python27\lib\site-packages\soundfile.py", line 1179, in _open
    "Error opening {0!r}: ".format(self.name))
  File "C:\Python27\lib\site-packages\soundfile.py", line 1352, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'file.ogg': File contains data in an unimplemented format.

Please help me. Thanks in advance.

Francesco
  • What have you searched for so far? – Sean Wei Apr 15 '18 at 11:07
  • I would like a function that takes an audio file and returns the transcribed text (speech to text) together with the action (intent) and attributes (entities). Example: audio="I want to order a pizza with onion and tuna"; the action is "order" and the attributes are pizza, tuna and onion. – Francesco Apr 15 '18 at 21:34

1 Answer

I read about this on Sungjin Han's blog. His code is written in Go rather than Python, but it may still be helpful: he uses an ffmpeg converter to turn the downloaded .ogg file into .mp3 before querying wit.ai. This is the relevant part, taken from his GitHub:

func speechToText(w *witai.Client, fileUrl string) (text string, err error) {
    var oggFilepath, mp3Filepath string
    // download .ogg,
    if oggFilepath, err = downloadFile(fileUrl); err == nil {
        // .ogg => .mp3,
        if mp3Filepath, err = oggToMp3(oggFilepath); err == nil {
            // .mp3 => text
            if result, err := w.QuerySpeechMp3(mp3Filepath, nil, "", "", 1); err == nil {
                log.Printf("> analyzed speech result: %+v\n", result)
                if result.Text != nil {
                    text = fmt.Sprintf("\"%s\"", *result.Text)
                    /*
                        // traverse for more info
                        sessionId := "01234567890abcdef"
                        if results, err := w.ConverseAll(sessionId, *result.Text, nil); err == nil {
                            for i, r := range results {
                                log.Printf("> converse[%d] result: %v\n", i, r)
                            }
                        } else {
                            log.Printf("failed to converse: %s\n", err)
                        }
                    */
                }
            }
            // delete converted file
            if err = os.Remove(mp3Filepath); err != nil {
                log.Printf("*** failed to delete converted file: %s\n", mp3Filepath)
            }
        } else {
            log.Printf("*** failed to convert .ogg to .mp3: %s\n", err)
        }
        // delete downloaded file
        if err = os.Remove(oggFilepath); err != nil {
            log.Printf("*** failed to delete downloaded file: %s\n", oggFilepath)
        }
    }
    return text, err
}
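
Since the question is about a Python bot, here is a rough Python sketch of the same download, convert, query, clean-up flow. It is not taken from the blog post. The function names (`build_ffmpeg_command`, `ogg_to_mp3`, `speech_to_intent`) are made up for illustration, and it assumes that the `ffmpeg` executable is on the PATH, that the installed pywit `Wit.speech` accepts a `headers` keyword argument, and that `audio/mpeg3` is the content type wit.ai expects for MP3 audio. Telegram voice notes are Opus audio in an OGG container, which may be why neither wit.ai nor soundfile accepted the file directly, but that is a guess.

```python
import os
import subprocess


def build_ffmpeg_command(src_path, dst_path):
    """Return the ffmpeg argument list that transcodes src_path to dst_path."""
    # -y overwrites an existing output file; ffmpeg picks the output
    # format from the destination file extension.
    return ["ffmpeg", "-y", "-i", src_path, dst_path]


def ogg_to_mp3(ogg_path):
    """Convert an .ogg file to an .mp3 next to it and return the new path."""
    mp3_path = os.path.splitext(ogg_path)[0] + ".mp3"
    # Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    subprocess.check_call(build_ffmpeg_command(ogg_path, mp3_path))
    return mp3_path


def speech_to_intent(client, ogg_path):
    """Convert the voice note to MP3, send it to wit.ai, return the response.

    `client` is expected to be a wit.Wit instance (assumption: its speech()
    method accepts a `headers` keyword argument).
    """
    mp3_path = ogg_to_mp3(ogg_path)
    try:
        with open(mp3_path, "rb") as f:
            return client.speech(f, headers={"Content-Type": "audio/mpeg3"})
    finally:
        # Delete the converted file whether or not the request succeeded.
        os.remove(mp3_path)
```

Inside the `discover` handler from the question this would be called as `resp = speech_to_intent(Wit(wit_access_token), 'file.ogg')` right after `voice.download('file.ogg')`.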
Partho63