I am currently trying to train a DNN for speech recognition with a large number of speakers, each having their own label (output_classes = total number of speakers). My database currently has 300 speakers, and the Keras model summary is as follows:
1240 (input size) --> 256 hidden --> 256 hidden --> 256 hidden (0.5 dropout) --> 256 (0.5 dropout) --> Softmax (300)
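For reference, the summary above corresponds to something like the following minimal sketch (an assumption based on the description: a Keras `Sequential` model, ReLU activations, with dropout on the last two 256-unit layers as stated):

```python
# Hypothetical reconstruction of the described architecture; activation
# choice (relu) is an assumption, sizes and dropout placement are from
# the summary above.
from tensorflow import keras
from tensorflow.keras import layers

num_speakers = 300  # output classes, one per speaker

model = keras.Sequential([
    layers.Input(shape=(1240,)),           # 1240-dim input features
    layers.Dense(256, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),                   # dropout after 3rd hidden layer
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),                   # dropout after 4th hidden layer
    layers.Dense(num_speakers, activation='softmax'),
])
```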
I am feeding the data in batches (each batch = one speaker's data) and compiling the model as follows:
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])
and calling fit_generator as follows:
model.fit_generator(
    trainGen(i=0),
    steps_per_epoch=num_class,
    epochs=500,
    validation_data=(x_test, y_test))
where trainGen is my generator function.
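To make the batching scheme concrete, here is a hypothetical sketch of what trainGen could look like under the "one speaker = one batch" setup; the names x_data, y_data, the synthetic data, and the cycling logic are all assumptions for illustration, not my actual code:

```python
# Sketch of a per-speaker batch generator (assumed structure).
import numpy as np

num_class = 300
frames_per_speaker = 10  # synthetic stand-in; real counts vary per speaker

# Synthetic stand-in data: per speaker, a (frames, 1240) feature matrix
# and matching one-hot labels where every row is that speaker's class.
x_data = [np.random.randn(frames_per_speaker, 1240).astype('float32')
          for _ in range(num_class)]
y_data = [np.tile(np.eye(num_class)[spk], (frames_per_speaker, 1))
          for spk in range(num_class)]

def trainGen(i=0):
    while True:  # Keras generators must yield indefinitely
        for spk in range(i, num_class):
            # Each yielded batch contains frames from ONE speaker only,
            # so every label in the batch is the same class.
            yield x_data[spk], y_data[spk]
```

Note that with this scheme every batch contains a single class, which is the part of the setup I am unsure about.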
While training, the cross-validation accuracy always settles at 0.0033, i.e. 1/300 (chance level for 300 classes). The training and cross-validation losses do fall after each epoch, though. Any suggestions?