
I am currently trying to train a DNN for speech recognition with a large number of speakers, each having their own label (output_classes = total number of speakers). My database currently has 300 speakers, and the Keras model summary is as follows:

1240 (input size) --> 256 hidden --> 256 hidden --> 256 hidden (0.5 dropout) --> 256 (0.5 dropout) --> Softmax (300)
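For reference, a minimal sketch of that architecture in Keras might look like this (the ReLU activations and the Sequential layout are assumptions on my part; only the layer sizes and dropout rates come from the summary above):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(256, activation='relu', input_dim=1240),  # 1240-dim input features
    Dense(256, activation='relu'),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(300, activation='softmax'),               # one output class per speaker
])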

I am feeding the data in batches (each speaker's data = one batch) and compiling the model as follows:

model.compile(
    loss='categorical_crossentropy',
    optimizer='Adam',
    metrics=['accuracy'])

and calling fit_generator like this:

model.fit_generator(
    trainGen(i=0),
    steps_per_epoch=num_class,
    epochs=500,
    validation_data=(x_test, y_test))

where trainGen is my generator function.
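For context, trainGen yields one speaker's frames as a single batch. A rough sketch of what it does (load_speaker_frames here is a placeholder for my actual feature-loading code, not something I've posted):

from keras.utils import to_categorical

def trainGen(i=0, num_class=300):
    # Yield all frames of one speaker as a single batch, cycling forever.
    # load_speaker_frames(i) is a placeholder for the real feature loading.
    while True:
        x = load_speaker_frames(i)                   # shape: (num_frames, 1240)
        y = to_categorical([i] * len(x), num_class)  # one-hot speaker label
        yield x, y
        i = (i + 1) % num_class                      # move on to the next speaker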

While training, the cross-validation accuracy always settles at 0.0033, i.e. 1/300 (chance level). The training and cross-validation losses are falling after each epoch, though. Any suggestions?

  • You are most probably overfitting. How many datapoints do you have? – Marcin Możejko Nov 11 '17 at 21:34
  • I am feeding frame-level log filter-bank features to the network. On average there are about 2000 feature frames per speaker. I suspect overfitting too, but regularization has not helped much. – user8917127 Nov 12 '17 at 19:00

1 Answer


So, it turns out my network was overfitting massively because my database was too small. Adding more data and regularization finally helped in getting a decent accuracy.
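For example, an L2 weight penalty can be added per layer on top of dropout. A minimal sketch (the 1e-4 value is illustrative, not my exact setting):

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2

model = Sequential([
    Dense(256, activation='relu', input_dim=1240,
          kernel_regularizer=l2(1e-4)),  # L2 weight decay on the hidden layer
    Dropout(0.5),
    Dense(300, activation='softmax'),
])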