Most ASR models are trained on open-source datasets from which all punctuation has been removed. If you want punctuation in the final output, try passing the ASR output through the following code.
from transformers import T5Tokenizer, T5ForConditionalGeneration
model_name = "flexudy/t5-small-wav2vec2-grammar-fixer"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
sent = """WHEN ARE YOU COMING TOMORROW I AM ASKING BECAUSE OF THE MONEY YOU OWE ME PLEASE GIVE IT TO ME I AM WAITING YOU HAVE BEEN AVOIDING ME SINCE TWO THOUSAND AND THREE"""
# The model expects its input wrapped as: fix: { <text> } </s>
input_text = "fix: { " + sent + " } </s>"
input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=256, truncation=True, add_special_tokens=True)
outputs = model.generate(
    input_ids=input_ids,
    max_length=256,
    num_beams=4,
    repetition_penalty=1.0,
    length_penalty=1.0,
    early_stopping=True,
)
sentence = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(sentence)
This produces the following output:
When are you coming tomorrow? I am asking because of the money you owe me, please give it to me. I am waiting. You have been avoiding me since 2003.
For more details, check out the model page on Hugging Face:
https://huggingface.co/flexudy/t5-small-wav2vec2-grammar-fixer
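
Note that the code above truncates its input at 256 tokens (max_length=256, truncation=True), so a long transcript would be cut off. One possible workaround, sketched here rather than taken from the model page, is to split the transcript into smaller pieces and build one prompt per piece. The helper name build_prompts and the max_words default are hypothetical, and counting words is only a rough stand-in for counting tokens.

```python
from typing import List


def build_prompts(transcript: str, max_words: int = 100) -> List[str]:
    """Split an ASR transcript into word-bounded chunks and wrap each
    chunk in the "fix: { ... } </s>" prompt format used above.

    max_words is an assumed, conservative stand-in for the 256-token
    limit; it is not a value documented for the model.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = transcript.split()
    # Group the words into consecutive chunks of at most max_words each.
    chunks = [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
    return ["fix: { " + chunk + " } </s>" for chunk in chunks]


if __name__ == "__main__":
    for prompt in build_prompts("WHEN ARE YOU COMING TOMORROW I AM ASKING", max_words=5):
        print(prompt)
```

Each prompt can then be passed through tokenizer.encode and model.generate exactly as in the code above, and the decoded pieces joined with spaces. Because the chunks are cut at fixed word counts, a sentence may be split across two chunks, so punctuation near chunk boundaries may be less reliable.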