
I am currently developing a speaker recognition program that should recognize the speaker by listening to the microphone. I'm a newbie at audio processing and machine learning, but I trained a neural network classifier for this project, which is currently based on only 3 different recordings.

The recordings I trained the model on were made with different microphones, so the model gets confused when predicting the speaker. Is there any way I can prevent this, perhaps by preprocessing the data? Right now, I just removed the silent parts of the audio recordings and trained the model with those audio files.
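
For reference, the silence removal I did is along these lines (a simplified sketch, not my exact code, assuming librosa and soundfile are installed; the file names are placeholders):

    import librosa
    import numpy as np
    import soundfile as sf

    # Load at a fixed sample rate so all recordings are comparable.
    y, sr = librosa.load("speaker1_take1.wav", sr=16000)

    # Find non-silent intervals (top_db sets how quiet counts as "silence")
    # and keep only the voiced parts.
    intervals = librosa.effects.split(y, top_db=30)
    y_voiced = np.concatenate([y[start:end] for start, end in intervals])

    sf.write("speaker1_take1_trimmed.wav", y_voiced, sr)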

Thanks for all replies.

Arjein

1 Answer


As background, as a former Dolby engineer I can tell you that you need vast amounts of data.

Having just three recordings is not enough. There simply aren't enough data points for training.

There are several things you should consider and research: normalizing the audio, applying filters, and extracting features. That means looking for the frequency characteristics that identify a speaker; these frequencies are key. Your training should be done with as many known clean/studio inputs as possible. That will help your model identify those features when background noise is introduced.
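
As a rough illustration of the normalize-then-extract-features idea (a minimal sketch assuming librosa and numpy; the file name, MFCC count, and the mean/variance normalization step are my own illustrative choices, not a fixed recipe):

    import librosa
    import numpy as np

    # Load and resample to a common rate so all recordings are comparable.
    y, sr = librosa.load("command.wav", sr=16000)

    # Peak-normalize so level differences between microphones matter less.
    y = librosa.util.normalize(y)

    # MFCCs summarize the spectral envelope, a common feature set for
    # speaker recognition (the "frequency keys" mentioned above).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

    # Cepstral mean/variance normalization per coefficient helps cancel a
    # fixed microphone/channel response.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)

    # One simple fixed-length input for a classifier: average over time.
    feature_vector = mfcc.mean(axis=1)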

Christophermp
  • First of all, thanks for the reply @Christophermp. I have a question: I want to use this as a personal assistant that only accepts my commands. Is it important to make the recordings with the microphone I will usually use for speech commands? And what would be an approximate amount of training data? Would researching the topics you mentioned be enough? Thanks again! – Arjein Mar 14 '23 at 00:40
  • No problem @Arjein. Hard to say; first establish a baseline. Then you should record your commands in several settings (see the augmentation sketch below for one way to approximate this). You may sound the same to yourself every day, but to a computer you don't, and that throws off your training. – Christophermp Mar 14 '23 at 00:44
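
One way to approximate recording "in several settings" is simple data augmentation of the clips you already have. A minimal sketch, assuming librosa, numpy, and scipy are installed; the gain, noise, and filter ranges are illustrative guesses, not tuned values:

    import librosa
    import numpy as np
    from scipy.signal import butter, lfilter

    rng = np.random.default_rng(0)
    y, sr = librosa.load("command.wav", sr=16000)

    def augment(y, sr):
        """Return a perturbed copy that mimics a different recording setting."""
        out = y * rng.uniform(0.5, 1.0)                     # random gain
        out = out + rng.normal(0.0, 0.005, size=out.shape)  # light background noise
        # Random low-pass filter to imitate a microphone with a duller
        # high-frequency response.
        cutoff = rng.uniform(3000, 7000) / (sr / 2)
        b, a = butter(2, cutoff, btype="low")
        return lfilter(b, a, out)

    # Ten synthetic "settings" per original command.
    variants = [augment(y, sr) for _ in range(10)]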