
I am studying ASR (Automatic Speech Recognition) using Wav2Vec 2.0. When I run Wav2Vec 2.0, the output contains no punctuation such as periods (".") or question marks ("?"), so the result comes out as one unbroken sentence. I know that I stripped the punctuation with a regex while building the tokenizer. Is there any way to convert the output into a properly punctuated sentence?

Original Text from wav file = "So what which one is better?"

Wav2Vec 2.0 Result = "SO WHAT WHICH ONE IS BETTER" (Question mark missing)

Expected Result = "SO WHAT WHICH ONE IS BETTER?"

Giseok Ryu

1 Answer


Most ASR models are trained on open-source datasets from which all kinds of punctuation have been removed. If you would like punctuation in the final output, try passing the ASR output through the following code.

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Pretrained T5 model for fixing punctuation and casing in Wav2Vec2 output
model_name = "flexudy/t5-small-wav2vec2-grammar-fixer"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Raw ASR output: uppercase, no punctuation
sent = """WHEN ARE YOU COMING TOMORROW I AM ASKING BECAUSE OF THE MONEY YOU OWE ME PLEASE GIVE IT TO ME I AM WAITING YOU HAVE BEEN AVOIDING ME SINCE TWO THOUSAND AND THREE"""

# Wrap the text in the prompt format used with this model
input_text = "fix: { " + sent + " } </s>"

# Tokenize; inputs longer than 256 tokens are truncated
input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=256, truncation=True, add_special_tokens=True)

# Generate the corrected text with beam search
outputs = model.generate(
    input_ids=input_ids,
    max_length=256,
    num_beams=4,
    repetition_penalty=1.0,
    length_penalty=1.0,
    early_stopping=True
)

# Decode the best beam back into a string
sentence = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(sentence)

This produces the following output:

When are you coming tomorrow? I am asking because of the money you owe me, please give it to me. I am waiting. You have been avoiding me since 2003.
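
Note that the call above uses max_length=256 with truncation=True, so a long transcript would be cut off before it reaches the model. One possible workaround, shown here only as a rough sketch, is to split the transcript into smaller word chunks and wrap each one in the same "fix: { ... } </s>" prompt. The helper name build_fix_prompts and the max_words value are my own assumptions, not part of the model or the transformers library; a word count is only a crude proxy for the token count, and a chunk boundary may fall in the middle of a sentence.

```python
def build_fix_prompts(transcript, max_words=100):
    """Split an ASR transcript into word chunks and wrap each in the prompt format.

    Hypothetical helper: max_words=100 is an assumed budget, chosen only to
    stay comfortably below the 256-token limit used in the code above.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = transcript.split()
    # Consecutive, non-overlapping chunks of at most max_words words
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    # Same prompt format as in the code above
    return ["fix: { " + chunk + " } </s>" for chunk in chunks]


# Small demonstration with an artificially low budget
for prompt in build_fix_prompts("SO WHAT WHICH ONE IS BETTER", max_words=4):
    print(prompt)
# fix: { SO WHAT WHICH ONE } </s>
# fix: { IS BETTER } </s>
```

Each prompt can then be passed through tokenizer.encode and model.generate exactly as above, and the decoded pieces joined with a space.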

For more details, see the model card on Hugging Face:

https://huggingface.co/flexudy/t5-small-wav2vec2-grammar-fixer

Swapnil Pote