I am using MS Translator Speech WebSocket API for real-time speech recognition and translation. The problem is that sometimes the recognised text does not have punctuation (commas, full stops, etc.). The transcribed text looks good otherwise. I also receive an MP3 with synthesised translation.
It looks completely random, I can send the same audio multiple times and some responses have punctuation and some do not. I am sending the audio in correct format and in near real-time rate e.g. I send 100ms samples every ~100ms. The recognised language is Spanish.
Is this a common issue or is there some other catch?