When I use model.predict_generator() on my test set (images) I get one set of predictions, and when I use model.predict() on the same test set I get a different set of predictions.

For using model.predict_generator I followed the below steps to create a generator:

  1. Created an ImageDataGenerator with no arguments and called flow_from_directory with shuffle=False.
  2. Used no augmentation or preprocessing of images (normalization, zero-centering, etc.) while training the model.

I am working on a binary classification problem involving dogs and cats (from Kaggle). My test set has 1000 cat images. With model.predict_generator() I get 87% accuracy, i.e. 870 images are classified correctly, but with model.predict() I get 83% accuracy.

This is confusing because both should give identical results, right? Thanks in advance :)

Tejas Thakar
Abhijit Balaji
  • Are you using the same model in both cases? Can you share your code as well? – Sargam Modak Sep 13 '17 at 06:10
  • Have you made sure that predict_generator() yields exactly one epoch? Since Keras 2 the generators are step-based (see fchollet's comment here https://github.com/fchollet/keras/issues/5818) so you might have a different number of samples in your predictions. You can also reset generators to make sure you always start with sample #0. – petezurich Sep 13 '17 at 06:33
  • @petezurich I don't quite understand what you mean. Could you please provide some sample code? – Abhijit Balaji Sep 13 '17 at 08:10
  • @AbhijitBalaji I think it would be easier if you provided your code. :0) Right now we can only guess what's wrong. Apart from that: you can reset a generator with `your_image_generator.reset()` before you start to predict. – petezurich Sep 13 '17 at 13:22

1 Answer

@petezurich Thanks for your comment. Calling generator.reset() before model.predict_generator() and making sure shuffle is turned off in the generator passed to it fixed the problem.
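To illustrate why the reset matters, here is a minimal sketch using a toy iterator. `ToyBatchIterator` and `predict_with_generator` are hypothetical stand-ins written for this example, not Keras classes or functions; the only assumption is that the real generator keeps an internal position between calls and wraps around at the end of the data. The "prediction" is just the sample index, so any misalignment is easy to see.

```python
class ToyBatchIterator:
    """Hypothetical stand-in for a directory iterator (not the Keras class)."""

    def __init__(self, samples, batch_size):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.position = 0  # index of the next sample to yield

    def reset(self):
        # Rewind so the next batch starts at sample #0.
        self.position = 0

    def __next__(self):
        # Wrap around at the end of the data, like an endless generator.
        if self.position >= len(self.samples):
            self.position = 0
        batch = self.samples[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch


def predict_with_generator(iterator, steps):
    """Collect `steps` batches; the 'prediction' is simply the sample itself."""
    outputs = []
    for _ in range(steps):
        outputs.extend(next(iterator))
    return outputs


samples = list(range(10))
iterator = ToyBatchIterator(samples, batch_size=4)

next(iterator)  # one batch consumed earlier, e.g. while inspecting the data
stale = predict_with_generator(iterator, steps=3)  # starts mid-epoch

iterator.reset()
fresh = predict_with_generator(iterator, steps=3)  # starts at sample #0
```

In this toy run, `stale` contains the same samples as `fresh` but in a rotated order, so comparing it position by position against the labels would give a wrong accuracy even though every individual prediction is unchanged.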

Abhijit Balaji
  • Great. I am happy that you could solve your problem. – petezurich Sep 21 '17 at 05:39
  • @petezurich Hey, there seems to be a different problem with generators now. Suppose you have 7427 test images and use a batch size of 16 with model.predict_generator(). The output is a vector of length 7424, not 7427. So for it to work, the batch size must divide the number of samples exactly: for an input array of length 7427, you should use a batch size of 7 to get an output of length 7427 from predict_generator(). – Abhijit Balaji Oct 06 '17 at 09:34
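
The 7424 versus 7427 gap in the last comment is consistent with the step count being rounded down (7427 // 16 = 464 steps, and 464 × 16 = 7424). The sketch below only reproduces that arithmetic in plain Python; it does not call Keras. `predictions_returned` is a hypothetical helper, and it assumes the generator yields a shorter final batch at the end of each pass over the data, which is worth checking against your own generator. Under that assumption, rounding the step count up is an alternative to picking a batch size that divides the sample count.

```python
import math


def predictions_returned(num_samples, batch_size, steps):
    """Count samples yielded in `steps` batches.

    Assumes (hypothetically) that the last batch of each pass over the
    data is short when batch_size does not divide num_samples.
    """
    full_batches, remainder = divmod(num_samples, batch_size)
    batches_per_epoch = full_batches + (1 if remainder else 0)
    total = 0
    for step in range(steps):
        is_last_in_epoch = (step % batches_per_epoch) == batches_per_epoch - 1
        total += remainder if (is_last_in_epoch and remainder) else batch_size
    return total


num_samples, batch_size = 7427, 16
floor_steps = num_samples // batch_size            # rounds down
ceil_steps = math.ceil(num_samples / batch_size)   # rounds up

print(predictions_returned(num_samples, batch_size, floor_steps))  # 7424
print(predictions_returned(num_samples, batch_size, ceil_steps))   # 7427
```

With a batch size of 7, as suggested in the comment, both roundings give 1061 steps and all 7427 samples are covered, since 7 × 1061 = 7427.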