
The app can be viewed on Hugging Face: https://huggingface.co/spaces/rowel/asr

import gradio as gr
from transformers import pipeline


model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")
gr.Interface.from_pipeline(model,
                           title="Automatic Speech Recognition (ASR)",
                           description="Using pipeline with Facebook S2T for ASR.",
                           examples=['data/ljspeech.wav',]
                           ).launch()

I don't know where the transcribed text is stored with those few lines of code. I would like to store the sentence text in a string.

Honestly, I only know basic Python programming. I would just like to store the transcriptions in string variables and do something with them.


1 Answer


You can open up the Interface.from_pipeline abstraction and define your own Gradio interface. You need to define your own inputs, outputs, and prediction function, which gives you access to the model's text prediction. Here is an example.

You can test it here: https://huggingface.co/spaces/radames/Speech-Recognition-Example


import gradio as gr
from transformers import pipeline


model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")


def predict_speech_to_text(audio):
    prediction = model(audio)
    # text variable contains your voice-to-text string
    text = prediction['text']
    return text


gr.Interface(fn=predict_speech_to_text,
             title="Automatic Speech Recognition (ASR)",
             inputs=gr.inputs.Audio(
                 source="microphone", type="filepath", label="Input"),
             outputs=gr.outputs.Textbox(label="Output"),
             description="Using pipeline with F acebook S2T for ASR.",
             examples=['ljspeech.wav'],
             allow_flagging='never'
             ).launch()
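
If you only want the transcription in a plain string variable and don't need the Gradio UI at all, you can call the pipeline directly. A minimal sketch, assuming a local audio file named ljspeech.wav (replace with your own recording):

from transformers import pipeline

model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")

# "ljspeech.wav" is an assumed local file path; swap in your own audio file
result = model("ljspeech.wav")

# the pipeline returns a dict; "text" holds the transcription as a Python string
transcription = result["text"]
print(transcription)

From there you can treat transcription like any other string, e.g. save it to a file or pass it to further processing.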