
I currently have reproducibility issues, although I set the seeds. I know that the model is initialized the same way (checked via inspection of model.save("initial.h5") with h5dump and meld).

The next thing for me to check is whether the training samples are used in the same order. Hence I would like to log them.

I train via

model.fit(dataset['train']['X'],
          dataset['train']['y'],
          epochs=cfg['model']['nb_epochs'],
          batch_size=cfg['model']['batch_size'],
          validation_split=cfg['model']['validation_split'],
          callbacks=[checkpoint], class_weight=cw)

I can also add dataset['train']['id']. I would like to get a txt file that lists the IDs in the order they are used. For example, with a batch size of 32, a training set of 765 samples and 5 epochs, I would expect 765 * 5 = 3825 lines in the txt file, where each ID appears roughly 5 times and the first 32 lines are the IDs from the first batch.
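To make the expected file concrete, here is a minimal sketch of the log format. It does not inspect what `model.fit` does internally; it assumes the shuffling is done explicitly with a seeded NumPy permutation, so the order is known before a batch is used. The function name `iter_logged_batches`, the `seed` argument and the file name are hypothetical, chosen only for illustration.

```python
import numpy as np


def iter_logged_batches(ids, batch_size, epochs, log_path, seed=0):
    """Yield one index array per batch and write the matching IDs to log_path.

    Hypothetical helper: one ID per line, in the order the samples are used.
    """
    rng = np.random.RandomState(seed)  # fixed seed -> same order on every run
    ids = np.asarray(ids)
    with open(log_path, "w") as log:
        for _ in range(epochs):
            order = rng.permutation(len(ids))  # fresh shuffle for each epoch
            for start in range(0, len(ids), batch_size):
                batch_idx = order[start:start + batch_size]
                log.writelines(f"{i}\n" for i in ids[batch_idx])
                yield batch_idx


if __name__ == "__main__":
    # 765 samples, batch size 32, 5 epochs, as in the example above
    n_batches = sum(1 for _ in iter_logged_batches(np.arange(765), 32, 5, "used_ids.txt"))
    with open("used_ids.txt") as f:
        n_lines = sum(1 for _ in f)
    print(n_batches, n_lines)
```

Each yielded index array could then be used to slice the inputs and labels for a manual training step, e.g. with Keras' `train_on_batch`. In that case the validation split and the checkpointing that `fit` provides would have to be handled separately.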

Is that possible?

Martin Thoma
  • What data structure is your `dataset`? Is it a defaultdict, a pandas dataframe or something else? – sdcbr Aug 07 '18 at 13:36
  • You can train with `shuffle=False`? – Daniel Möller Aug 07 '18 at 13:37
  • dataset is a dict: `dataset = {'train': {'X': ndarray, 'y': ndarray}, 'test': {'X': ndarray, 'y': ndarray}}` - why is that important? – Martin Thoma Aug 07 '18 at 13:42
  • Even with `shuffle=False` I get differences – Martin Thoma Aug 07 '18 at 13:45
  • "Differences"? If you can't check, how can you tell there are differences? – Daniel Möller Aug 07 '18 at 13:54
  • Ah, I've just seen that 'X' is a pandas dataframe and `y` is a numpy array. Does that matter? – Martin Thoma Aug 07 '18 at 13:54
  • @DanielMöller Different models as a result. I don't know if the order of training samples is different. But as the set of training samples is the same and the initial model is the same, and the training algorithm is the same, I think it has to be the order of training samples, right? With this question I only want to see it so that I can make sure I'm at the right point – Martin Thoma Aug 07 '18 at 13:56
  • Ok, I found out how to make results reproducible in my case: I need to create a tensorflow session and tell Keras about it: https://stackoverflow.com/a/51715574/562769 - but this question is still open – Martin Thoma Aug 07 '18 at 14:17

0 Answers