I am currently developing a speaker recognition program which should recognize the speaker by listening to the microphone. I'm a newbie at audio processing and machine learning, but I trained a neural network classifier for this project, and it is trained on only 3 different recordings right now.
The recordings I trained the model on were made with different microphones, so the model gets confused when predicting the speaker. Is there any way I can prevent this? Perhaps by preprocessing the data somehow? Right now, I just removed the silent parts of the audio recordings and trained the model on those files, roughly as sketched below.
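For context, this is approximately what my current silence-removal step looks like. It's a minimal sketch using librosa and soundfile as examples; the file names, sample rate, and the top_db threshold are placeholders, not my exact setup.

```python
import librosa
import numpy as np
import soundfile as sf

def strip_silence(in_path, out_path, top_db=30):
    # Load the recording as mono at a fixed sample rate (16 kHz here as an example).
    y, sr = librosa.load(in_path, sr=16000, mono=True)
    # Find non-silent intervals relative to the peak level and keep only those samples.
    intervals = librosa.effects.split(y, top_db=top_db)
    voiced = np.concatenate([y[start:end] for start, end in intervals])
    # Write the trimmed audio back out; these files are what I train the model on.
    sf.write(out_path, voiced, sr)

# Hypothetical file names for illustration.
strip_silence("speaker1_raw.wav", "speaker1_trimmed.wav")
```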
Thanks in advance for any replies.