How to convert a wav file to a JSON file

Question

I have a WAV file called student. I want to convert it to text and download that text as a JSON file.

The WAV file (audio) has the following content "Hello, I'm Michel. I am a student of Georgian college"

The JSON file needs to have the above content as a string.

Basically, convert speech to text.

If you want raw bytes, `file_obj.read()` would do after opening the file in byte mode. — heemayl, Aug 18 '18 at 20:02
@heemayl - The following thread https://stackoverflow.com/questions/35529520/how-to-convert-a-wav-file-bytes-like-object does that. But I don't know how to convert that back to a JSON file — MJP, Aug 18 '18 at 20:05
This question sounds like *"I have a cat named Oscar, now I want to convert this to a dog and raise this dog as a poodle"* to me. How do you intend to convert a WAV file to JSON? Also, *why*? — Aran-Fey, Aug 18 '18 at 20:07
What do you want the JSON file to contain? Just an array of numbers from 0 to 255 for the raw bytes of the file? A dict representing the WAV header and an array of frames each of which is an array of numbers in the appropriate range representing the samples in the file? Something else? — abarnert, Aug 18 '18 at 20:08
@Aran-Fey I have edited the question. Yes, I want to convert speech to text. — MJP, Aug 18 '18 at 20:10

Sushant · Accepted Answer · 2018-08-18T21:11:47.020

Quite a lot of speech recognition software depends on HMMs (Hidden Markov Models). This approach works on the assumption that a speech signal, when viewed on a short enough timescale (say, ten milliseconds), can be reasonably approximated as a stationary process, meaning a process whose statistical properties do not change over time. The speech is divided into 10 ms fragments, each fragment is mapped to a vector of real numbers known as cepstral coefficients, and these vectors are then matched to phonemes. This is a very high-level overview of a typical speech recognition system.
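To make the framing step above a bit more concrete, here is a minimal sketch that cuts a signal into 10 ms frames and computes a plain real cepstrum for each one. It is only an illustration under some assumptions: the function names (`split_into_frames`, `dft`, `real_cepstrum`), the 8 kHz sample rate, the synthetic sine-wave input and the choice of 12 coefficients are all made up for this example, and real recognizers typically use overlapping, windowed frames and mel-scaled features rather than this bare version. The phoneme-matching (HMM) stage is not shown.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut samples into consecutive, non-overlapping frames of frame_ms each."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dft(values):
    """Naive discrete Fourier transform (slow, but fine for tiny frames)."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, n_coeffs=12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # Small offset avoids log(0) for empty frequency bins.
    log_magnitude = [math.log(abs(c) + 1e-10) for c in dft(frame)]
    cepstrum = [sum(log_magnitude[k] * cmath.exp(2j * cmath.pi * k * t / n)
                    for k in range(n)).real / n
                for t in range(n)]
    return cepstrum[:n_coeffs]


# Hypothetical input: 0.1 s of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate)
          for t in range(sample_rate // 10)]

frames = split_into_frames(signal, sample_rate)   # one frame per 10 ms
features = [real_cepstrum(frame) for frame in frames]
print(len(frames), "frames,", len(features[0]), "coefficients each")
```

Each entry of `features` is the kind of "vector of real numbers" the paragraph refers to; a recognizer would then score sequences of such vectors against its phoneme models.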

Now, coming back to your requirement, a little research would have brought you to libraries like SpeechRecognition.

Using SpeechRecognition is as simple as this (taken from its source code and tried on my computer):

import speech_recognition as sr
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

And voila, it works, in ten lines of code, thanks to amazing people developing these :)

Edit - You need to have PocketSphinx set up for this code to work.
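The snippet above only prints the recognized text. To get the JSON file the question asks for, one possible sketch is to write the string out with the standard `json` module. The helper name `save_transcript_as_json`, the key `"text"` and the output filename `student.json` are assumptions made for this example, and the hard-coded `transcript` stands in for whatever `r.recognize_sphinx(audio)` returns.

```python
import json


def save_transcript_as_json(transcript, json_path, key="text"):
    """Write the recognized text to a JSON file as {key: transcript}."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({key: transcript}, f, ensure_ascii=False, indent=2)


# Stand-in for the string returned by r.recognize_sphinx(audio).
transcript = "Hello, I'm Michel. I am a student of Georgian college"
save_transcript_as_json(transcript, "student.json")
```

In the recognition code, the same call could go inside the `try` block, right after the transcript is obtained, so that nothing is written when recognition fails.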

For me, `AUDIO_FILE = path.join(path.dirname(path.realpath('__file__')), "english.wav")` worked. — MJP, Aug 19 '18 at 14:01
