I'm currently integrating Vosk speech recognition into an application. Looking specifically at speaker recognition, I've implemented test_speaker.py from the examples and it works. Being new to this, how can I identify and/or create the reference speaker signature? Using the one provided, the list of distances calculated for my audio sample doesn't clearly separate the two speakers involved:
[1.0182311997728735, 0.8679279016022726, 0.8552687907177629, 1.0258941854519696, 0.8666933753723253, 0.9291881495586336, 1.0316585805917928, 1.0227699471036409, 0.8442800102809634, 0.9093189414477789, 0.9153723223264221, 0.9705387223260904, 0.9077720598812595, 0.9524431272217568, 0.9179475137290445]
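For context, my understanding is that each value above is a cosine distance between the reference signature and the speaker vector Vosk returns for one utterance. Below is a minimal sketch of that calculation, using the first utterance as a stand-in reference taken from the audio itself. The toy vectors and the `distances_to_reference` helper are made up for illustration (real speaker vectors are much longer), and picking the first utterance as the reference is only an assumption, not something the Vosk example does.

```python
import math


def cosine_dist(x, y):
    """Cosine distance: near 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def distances_to_reference(reference, utterance_vectors):
    """Hypothetical helper: distance from one reference vector to each utterance vector."""
    return [cosine_dist(reference, v) for v in utterance_vectors]


# Made-up 3-dimensional stand-ins for the per-utterance speaker vectors.
utterance_vectors = [
    [0.9, 0.1, 0.0],  # utterance 1
    [0.1, 0.9, 0.1],  # utterance 2, pointing in a different direction
    [0.8, 0.2, 0.1],  # utterance 3, close to utterance 1
]

# Assumption: treat the first utterance as the reference speaker.
reference = utterance_vectors[0]
distances = distances_to_reference(reference, utterance_vectors)
print(distances)
```

With a reference that really belongs to one of the two speakers, I would expect the distances to fall into a low group and a high group, which is not what I see with the provided signature.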
If there is no effective way to derive a reference speaker from the audio under analysis, do you know of another solution that can be used with Vosk to identify speakers in an audio file? If not, what other speech-to-text option would you suggest? (I've already tried Google's.)
Thanks in advance