I am thinking about how to make this code more efficient. I am using Discord JDA and the Microsoft Azure Speech service. Is it possible to recognize speech directly from bytes rather than from a file? That is, skipping the step of writing the bytes to a temporary file and then recognizing that file. Or is there some other, better way to do this? The current approach seems wrong to me.
My AudioReceiveHandler:
@Override
public void handleUserAudio(@NotNull UserAudio userAudio) {
    // Buffer each user's 20ms audio packets (JDA delivers 48kHz, 16-bit, stereo, big-endian PCM)
    User user = userAudio.getUser();
    BYTES.computeIfAbsent(user, key -> new ArrayList<>())
            .add(userAudio.getAudioData(1));
}
Converting speech to text:
private void read(ArrayList<byte[]> userBytes) {
    // Concatenate the buffered packets into one contiguous PCM byte array
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    for (byte[] bytes : userBytes) {
        buffer.write(bytes, 0, bytes.length);
    }
    byte[] decodedData = buffer.toByteArray();

    // JDA's receive format: 48kHz, 16-bit, stereo, signed, big-endian
    AudioFormat format = new AudioFormat(48000f, 16, 2, true, true);
    String filePath = "[...]/temp.wav";
    try {
        // Note: the last AudioInputStream argument is the length in frames,
        // not bytes (one frame = 2 channels * 2 bytes = 4 bytes here)
        AudioSystem.write(new AudioInputStream(new ByteArrayInputStream(decodedData),
                format, decodedData.length / format.getFrameSize()),
                AudioFileFormat.Type.WAVE,
                new File(filePath));
    } catch (IOException exception) {
        exception.printStackTrace();
    }

    SpeechConfig speechConfig = SpeechConfig.fromSubscription("-", "-");
    speechConfig.setSpeechRecognitionLanguage("pl-PL");
    AudioConfig audioConfig = AudioConfig.fromWavFileInput(filePath);
    SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
    try {
        SpeechRecognitionResult result = task.get();
        Logger.info("RECOGNIZED: " + result.getText());
    } catch (Exception exception) {
        exception.printStackTrace();
    } finally {
        // The SDK objects hold native resources and should be closed
        recognizer.close();
        audioConfig.close();
        speechConfig.close();
    }
}
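For context, the stream-based approach I am asking about might look something like the sketch below, assuming the Speech SDK's push-stream API (AudioStreamFormat.getWaveFormatPCM, AudioInputStream.createPushStream, AudioConfig.fromStreamInput) accepts raw PCM this way. The byte swap is my assumption that the service wants little-endian samples while JDA delivers big-endian; I have not verified this end to end.

import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechRecognitionResult;
import com.microsoft.cognitiveservices.speech.SpeechRecognizer;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.audio.AudioInputStream;
import com.microsoft.cognitiveservices.speech.audio.AudioStreamFormat;
import com.microsoft.cognitiveservices.speech.audio.PushAudioInputStream;

public class StreamRecognitionSketch {

    public static String recognize(byte[] decodedData) throws Exception {
        // Assumption: the service expects little-endian PCM, but JDA delivers
        // big-endian samples, so swap the two bytes of each 16-bit sample
        for (int i = 0; i + 1 < decodedData.length; i += 2) {
            byte tmp = decodedData[i];
            decodedData[i] = decodedData[i + 1];
            decodedData[i + 1] = tmp;
        }

        SpeechConfig speechConfig = SpeechConfig.fromSubscription("-", "-");
        speechConfig.setSpeechRecognitionLanguage("pl-PL");

        // Describe the raw PCM so no WAV header or temp file is needed
        AudioStreamFormat format =
                AudioStreamFormat.getWaveFormatPCM(48000L, (short) 16, (short) 2);
        PushAudioInputStream pushStream = AudioInputStream.createPushStream(format);
        AudioConfig audioConfig = AudioConfig.fromStreamInput(pushStream);

        pushStream.write(decodedData);
        pushStream.close(); // signals end of audio to the recognizer

        try (SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig)) {
            SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
            return result.getText();
        }
    }
}

If something like this works, the whole write-to-disk step in read() above could be dropped.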