I have a WAV file called student. I want to convert its speech to text and download that text as a JSON file.

The WAV file (audio) has the following content: "Hello, I'm Michel. I am a student of Georgian college"

The JSON file needs to have the above content as a string.

Basically, convert speech to text.

MJP
  • If you want raw bytes, `file_obj.read()` would do after opening the file in byte mode. – heemayl Aug 18 '18 at 20:02
  • @heemayl - The following thread https://stackoverflow.com/questions/35529520/how-to-convert-a-wav-file-bytes-like-object does that. But I don't know how to convert that back to a JSON file – MJP Aug 18 '18 at 20:05
  • This question sounds like *"I have a cat named Oscar, now I want to convert this to a dog and raise this dog as a poodle"* to me. How do you intend to convert a WAV file to JSON? Also, *why*? – Aran-Fey Aug 18 '18 at 20:07
  • What do you want the JSON file to contain? Just an array of numbers from 0 to 255 for the raw bytes of the file? A dict representing the WAV header and an array of frames each of which is an array of numbers in the appropriate range representing the samples in the file? Something else? – abarnert Aug 18 '18 at 20:08
  • Are you asking how to do Speech to Text? – Aran-Fey Aug 18 '18 at 20:10
  • @Aran-Fey I have edited the question. Yes, I want to convert speech to text. – MJP Aug 18 '18 at 20:10
  • @abarnert I have edited the question – MJP Aug 18 '18 at 20:12

1 Answer

Much speech recognition software relies on the Hidden Markov Model (HMM). This approach assumes that a speech signal, viewed on a short enough timescale (say, ten milliseconds), can be reasonably approximated as a stationary process, that is, a process whose statistical properties do not change over time. The speech is divided into 10 ms fragments, each fragment is mapped to a vector of real numbers known as cepstral coefficients, and these vectors are then matched to phonemes. This is a very high-level overview of a typical speech recognition system.
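To make the framing step concrete, here is a minimal sketch in pure Python, not how any particular recognizer implements it. It cuts a signal into 10 ms frames and computes the plain real cepstrum of a frame (inverse DFT of the log magnitude spectrum). Real systems typically use mel-frequency cepstral coefficients and a fast FFT instead; the function names, the 8 kHz sample rate, and the 13-coefficient cutoff are assumptions made for illustration.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut the signal into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def dft(values):
    """Naive O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, n_coeffs=13):
    """Return the first n_coeffs real cepstral coefficients of one frame."""
    # Log magnitude spectrum; the small constant avoids log(0).
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(frame)]
    n = len(log_mag)
    # Inverse DFT of the log magnitude; .real drops floating-point residue.
    cepstrum = [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    return cepstrum[:n_coeffs]


# Assumed example input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
frames = split_into_frames(signal, sample_rate)
features = [real_cepstrum(frame) for frame in frames[:3]]
print(len(frames), "frames,", len(features[0]), "coefficients per frame")
```

In a full recognizer, each of these per-frame vectors would be the observation that the HMM scores against its phoneme states.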

Coming back to your requirement, a little research would have brought you to libraries such as SpeechRecognition.

Using SpeechRecognition is as simple as the following (taken from its source code and tried on my computer):

import speech_recognition as sr
from os import path

# Path to a WAV file located next to this script
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")

r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# Recognize the speech offline with CMU Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

And voila, it works, in ten lines of code, thanks to the amazing people who develop these libraries :)

Edit - You need to have PocketSphinx set up for this code to work.
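To get the recognized text into a JSON file, as the question asks, something along these lines should work. This is only a sketch: the function names, the `"transcript"` key, and the file names in the usage comment are my own choices, not part of SpeechRecognition, and the recognition part assumes the library and PocketSphinx are installed.

```python
import json


def transcribe_wav(wav_path):
    """Return the text Sphinx recognizes in a WAV file, or None if nothing was understood."""
    # Imported here so the JSON helper below can be used without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the entire audio file
    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return None


def save_transcript(text, json_path):
    """Write the transcript to json_path as {"transcript": "<text>"}."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"transcript": text}, f, ensure_ascii=False, indent=2)


# Usage (hypothetical file names):
# save_transcript(transcribe_wav("student.wav"), "student.json")
```

If the JSON file should hold nothing but the bare string, pass `text` itself to `json.dump` instead of the dictionary.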

Sushant
  • For me, `AUDIO_FILE = path.join(path.dirname(path.realpath('__file__')), "english.wav")` worked. – MJP Aug 19 '18 at 14:01