This is a problem I ran into using the Google Speech to Text Engine. I am currently streaming 16 bit / 16 kHz audio real time in 32kB chunks. But there is an average 25 second latency between sending audio and receiving transcripts, defeating the purpose of real time transcription.
Why is there such high latency?