I just noticed that the timestamps of SFTranscriptionSegments restart at zero every minute, which makes it impossible to tell where the text is actually located in the audio when there are long pauses. Can this be configured or worked around?
I am using SFSpeechRecognizer to transcribe audio files that are potentially longer than one minute. Chopping them into one-minute chunks myself risks splitting words at the boundaries.
I am running this on macOS Catalina.
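For reference, this is roughly how I read the segments (the file path is a placeholder, and I request the final result only):

```swift
import Speech

// Transcribe a local audio file and print each segment's timing.
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
let request = SFSpeechURLRecognitionRequest(url: URL(fileURLWithPath: "audio.m4a"))
request.shouldReportPartialResults = false

recognizer.recognitionTask(with: request) { result, error in
    guard let result = result, result.isFinal else { return }
    for segment in result.bestTranscription.segments {
        // segment.timestamp appears to reset to 0 at each ~1-minute boundary,
        // so these values are not offsets from the start of the file.
        print(segment.timestamp, segment.duration, segment.substring)
    }
}
```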