
I have a model with 536 training samples, and I would like to run through all of the samples each epoch. The batch size is 32 and the number of epochs is 50. Here is the code and its error:

results = model.fit(train_X, train_y, batch_size = 32, epochs = 50, validation_data=(val_X, val_y), callbacks=callbacks)

The dataset you passed contains 837 batches, but you passed epochs=50 and steps_per_epoch=17, which is a total of 850 steps. We cannot draw that many steps from this dataset. We suggest to set steps_per_epoch=16.

Total number of samples / batch size = steps per epoch = 536/32 = 16.75. model.fit would work if I set steps per epoch = 16, but doesn't that mean I'm discarding 24 samples (0.75 * 32) each epoch?

If yes, how can I avoid discarding these samples? One way would be to adjust the batch size so that there is no remainder when dividing the number of samples by it.

If there are other ways, please enlighten me.
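For reference, here is a quick sketch of the arithmetic in plain Python (the variable names are only illustrative and no Keras is needed):

import math

num_samples = 536
batch_size = 32

full_batches = num_samples // batch_size          # 16 full batches of 32 samples
leftover = num_samples % batch_size               # 24 samples left over
steps_ceil = math.ceil(num_samples / batch_size)  # 17 steps if the partial batch is kept

print(full_batches, leftover, steps_ceil)         # 16 24 17

So flooring steps_per_epoch to 16 skips 24 samples every epoch, while running 17 steps means the last batch simply contains 24 samples instead of 32.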

  • Do you need to use `Dataset`? Are you doing any on-the-fly changes to your training data? Are you using augmentation? If no to all of that, you don't need to set `steps_per_epoch`. You can always just exclude that parameter and see how it works. It'll probably work fine. Not setting it will at least make sure all input data is used. – Djinn Jul 22 '22 at 02:26
  • @Djinn, model.fit outputs the error above even if I exclude the steps_per_epoch parameter, so it won't work without explicitly flooring steps_per_epoch. For this case, is there any other way to force batches of 32 for 16 steps per epoch and a batch of 24 for the remaining samples? – Kay Jul 22 '22 at 05:48
  • Set `steps_per_epoch` to 1. – Djinn Jul 22 '22 at 07:54
  • @Djinn, won't setting steps_per_epoch to 1 force the batch size up to 536 because # of samples / steps per epoch = 536 which is the batch size? Please enlighten me if I'm misunderstanding – Kay Jul 22 '22 at 10:38
  • Yes. Or you can try setting it to number of samples. There's no set rule, you can play around with it. If you're using arrays or load everything in memory, you can avoid all of this really. – Djinn Jul 22 '22 at 12:45
  • @Djinn. I see. Please let me check if I've got this correct: given 536 samples, training 512 of them with batch size 32 and the remaining 24 with batch size 24 is not possible, so I have to discard those 24 samples. If I don't want to discard any samples, I either have to choose a batch size such that 536 % batch_size == 0, or set the batch size to 536 (equivalently, steps per epoch = 1). Correct? – Kay Jul 23 '22 at 03:22
  • You should also be able to set steps to total batch number. Either one will go through all the records. One does one sample at a time, the other loads them all at once. Training is *supposed* to do your 32 samples, then when it needs to do the rest of the 24, it'd just do those at the end. The formula to determine steps is more of a guideline on efficiency (and also a hyperparameter) than a set rule. – Djinn Jul 23 '22 at 03:26
  • Your code suggests you're already using arrays (or data structure that's loaded in memory)? Since you're passing `x` and `y`. If so, remove `steps_per_epoch`. It's not used with those. Sorry, I thought you were using `Dataset`. You don't need that parameter. – Djinn Jul 23 '22 at 03:33
  • @Djinn. Aha I see. Thank you very much for all your comments. Now crystal clear. I thank you very much! – Kay Jul 23 '22 at 03:36
  • If excluding the parameter doesn't work (cause you mentioned that before), you'll need to set the steps to be the total sample size or 1. But if you're getting that error, are you using arrays or tensors? – Djinn Jul 23 '22 at 03:53
  • @Djinn. Yes, I'm using np.ndarrays for train_X and train_y. Excluding the param doesn't work because, for a batch size of 32, model.fit() automatically allocates 17 steps per epoch. For a batch size of 32, neither setting the steps to the total sample size nor setting them to 1 works. However, setting the batch size to 536 without explicitly stating the steps does work. If you don't mind, I'm curious why excluding the param won't automatically compute the proper steps per epoch for training. By the way, without allocating a TPU, excluding the param works perfectly for any batch size. – Kay Jul 23 '22 at 05:47
  • Excluding it and setting batch size to whatever you want should work with arrays. I think there's a bug somewhere. – Djinn Jul 23 '22 at 15:16

1 Answer


If you do not want to discard any samples during training, it is better not to set steps_per_epoch once batch_size is defined, because the model can calculate steps_per_epoch itself from the defined batch_size and the number of training samples provided.
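For example, a minimal self-contained sketch along those lines (the toy data and model here are only illustrative, sized like the question: 536 samples, batch size 32):

import numpy as np
import tensorflow as tf

# Toy data shaped like the question: 536 samples, 10 features.
X = np.random.rand(536, 10).astype("float32")
y = np.random.rand(536, 1).astype("float32")

toy_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
toy_model.compile(optimizer="adam", loss="mse")

# No steps_per_epoch: Keras runs ceil(536 / 32) = 17 steps per epoch,
# and the final step uses the remaining 24 samples, so nothing is discarded.
toy_model.fit(X, y, batch_size=32, epochs=2)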

Please go through the attached gist, where I have tried to elaborate on these terms in more detail.
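If steps_per_epoch does have to be supplied (for example on the TPU setup mentioned in the comments), one alternative already raised in the question is to choose a batch size that divides 536 with no remainder, so flooring the step count discards nothing. A small sketch:

num_samples = 536
# Batch sizes that divide 536 evenly; any of them yields a whole number of steps.
even_batch_sizes = [b for b in range(1, num_samples + 1) if num_samples % b == 0]
print(even_batch_sizes)  # [1, 2, 4, 8, 67, 134, 268, 536]
# e.g. batch_size = 8  -> steps_per_epoch = 536 // 8  = 67
#      batch_size = 67 -> steps_per_epoch = 536 // 67 = 8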