I have a huge dataset that I need to provide to Keras in the form of a generator because it does not fit into memory. However, using fit_generator, I cannot replicate the results I get during usual training with model.fit. Also, each epoch lasts considerably longer.
I implemented a minimal example. Maybe someone can show me where the problem is.
import random
import numpy
from keras.layers import Dense
from keras.models import Sequential

random.seed(23465298)
numpy.random.seed(23465298)

no_features = 5
no_examples = 1000


def get_model():
    network = Sequential()
    network.add(Dense(8, input_dim=no_features, activation='relu'))
    network.add(Dense(1, activation='sigmoid'))
    network.compile(loss='binary_crossentropy', optimizer='adam')
    return network


def get_data():
    example_input = [[float(f_i == e_i % no_features) for f_i in range(no_features)] for e_i in range(no_examples)]
    example_target = [[float(t_i % 2)] for t_i in range(no_examples)]
    return example_input, example_target


def data_gen(all_inputs, all_targets, batch_size=10):
    input_batch = numpy.zeros((batch_size, no_features))
    target_batch = numpy.zeros((batch_size, 1))
    while True:
        for example_index, each_example in enumerate(zip(all_inputs, all_targets)):
            each_input, each_target = each_example
            wrapped = example_index % batch_size
            input_batch[wrapped] = each_input
            target_batch[wrapped] = each_target
            if wrapped == batch_size - 1:
                yield input_batch, target_batch


if __name__ == "__main__":
    input_data, target_data = get_data()
    g = data_gen(input_data, target_data, batch_size=10)
    model = get_model()
    model.fit(input_data, target_data, epochs=15, batch_size=10)  # 15 * (1000 / 10) * 10
    # model.fit_generator(g, no_examples // 10, epochs=15)  # 15 * (1000 / 10) * 10
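For completeness, I also tried a variant of the generator that yields copies of the batch buffers instead of the shared arrays, in case reusing the same numpy arrays across yields matters (this is just a sketch, I am not sure it makes a difference):

```python
import numpy

no_features = 5


def data_gen_copy(all_inputs, all_targets, batch_size=10):
    # Same batching logic as above, but each yield hands out fresh
    # arrays, so later batches cannot overwrite earlier ones.
    input_batch = numpy.zeros((batch_size, no_features))
    target_batch = numpy.zeros((batch_size, 1))
    while True:
        for example_index, (each_input, each_target) in enumerate(zip(all_inputs, all_targets)):
            wrapped = example_index % batch_size
            input_batch[wrapped] = each_input
            target_batch[wrapped] = each_target
            if wrapped == batch_size - 1:
                yield input_batch.copy(), target_batch.copy()
```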
On my computer, model.fit always finishes the 10th epoch with a loss of 0.6939, after roughly 2-3 seconds. model.fit_generator, however, runs considerably longer and finishes the last epoch with a different loss (0.6931).
In general, I don't understand why the results of the two approaches differ. The difference might not look like much, but I need to be sure that the same data with the same network produce the same result, regardless of whether I train conventionally or via the generator.
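To convince myself that the generator at least reproduces the dataset in the right order, I use a small check like the following (a sketch with hypothetical helper names; it collects one full pass from a generator and compares it with the original arrays):

```python
import numpy


def check_generator(gen, all_inputs, all_targets, batch_size=10):
    """Collect one full pass from the generator and verify it matches
    the original data, batch by batch."""
    steps = len(all_inputs) // batch_size
    seen_inputs, seen_targets = [], []
    for _ in range(steps):
        x, y = next(gen)
        seen_inputs.append(numpy.array(x))    # copy, in case buffers are reused
        seen_targets.append(numpy.array(y))
    seen_inputs = numpy.concatenate(seen_inputs)
    seen_targets = numpy.concatenate(seen_targets)
    return (numpy.array_equal(seen_inputs, numpy.array(all_inputs))
            and numpy.array_equal(seen_targets, numpy.array(all_targets)))
```

This at least rules out the generator silently reordering or dropping examples.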
Update: @Alex R. provided an answer to part of the original problem (some of the performance issues, as well as results changing between runs). Since the core problem remains, however, I have merely adjusted the question and title accordingly.