
I want to train a stateful LSTM network using the functional API in Keras.

The fit method is fit_generator.

I am able to train it using batch_size = 1.

My Input layer is:

Input(shape=(n_history, n_cols),batch_shape=(batch_size, n_history, n_cols), 
    dtype='float32', name='daily_input')

The generator is as follows:

def training_data():
    while 1:       
        for i in range(0, pdf_daily_data.shape[0] - n_history, 1):
            x = f(i)  # pseudocode: f(i) returns an array of shape (1, n_history, n_cols)
            y = g(i)  # pseudocode: g(i) returns the corresponding target
            yield (x,y)

And then the fit is:

model.fit_generator(training_data(),
                    steps_per_epoch=pdf_daily_data.shape[0]//batch_size,...

This works and trains well; however, it is very slow and performs a gradient update at every time step, since batch_size = 1.

How, within this configuration, can I set a batch_size > 1? Remember: the LSTM layer has stateful = True.


1 Answer


You will have to modify your generator to yield the number of elements you want each batch to have.

Currently you are iterating over your data element by element (the step of your range() call is 1), obtaining a single x and y each time, and yielding that single sample. Because each yield returns one element, you are effectively training with batch_size=1, and fit_generator performs a gradient update after every element.

Say you want your batch size to be 10. You will then have to slice your data into segments of 10 elements each and yield those slices instead of single elements, as in the sketch below. Just be sure to reflect that change in the shape of your Input layer by passing the corresponding batch_size.
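For example, here is a minimal sketch of such a generator, assuming batch_size = 10 and reusing the pseudocode helpers f(i) and g(i) from the question (the exact slicing will depend on how your data is actually stored):

import numpy as np
from keras.layers import Input

batch_size = 10
n_samples = pdf_daily_data.shape[0] - n_history  # total usable samples

def training_data():
    while 1:
        for i in range(0, n_samples - batch_size + 1, batch_size):
            # stack batch_size consecutive samples into one batch of shape (batch_size, n_history, n_cols)
            x = np.concatenate([f(i + j) for j in range(batch_size)], axis=0)
            # stack the corresponding targets into shape (batch_size, ...)
            y = np.stack([g(i + j) for j in range(batch_size)], axis=0)
            yield (x, y)

# the Input layer must declare the same fixed batch size
daily_input = Input(batch_shape=(batch_size, n_history, n_cols),
                    dtype='float32', name='daily_input')

model.fit_generator(training_data(),
                    steps_per_epoch=n_samples // batch_size)

Note that with stateful=True Keras expects a fixed batch size, so the number of samples per epoch should be a multiple of batch_size, and later predictions have to use that same batch size (a common workaround is to copy the trained weights into an identical model built with batch_size=1 for inference).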

DarkCygnus
  • Thanks, but there is something I don't understand. When I add batch_shape=(batch_size, n_history, n_cols) with batch_size > 1, the network grows and has more parameters. Why is this? – Jose Antonio Martin H Jan 22 '18 at 20:23
  • @JoseAntonioMartinH I think that is a different question from the one this post asks... However, how do you know your network grows? Without knowing your actual situation, I doubt your network should be increasing. It will now just do a gradient update per batch instead of per input. However, your LSTM layer may indeed change, as it now has a different shape (because of the batch_size increase). Perhaps this is what you are seeing? – DarkCygnus Jan 22 '18 at 20:37
  • Thanks @DarkCygnus, it seems there are many things about the way stateful LSTMs are implemented that I don't understand. I changed the generator to a normal dataset by generating it in a for loop. Now when I tried to do a normal fit, Keras told me that the batch_size should be a divisor of the dataset length (think of a dataset whose length is a prime number or has only a few small integer divisors). That does not happen with other networks. It seems that stateful LSTMs are hard to use. But I am advancing. Thanks! – Jose Antonio Martin H Jan 22 '18 at 23:00
  • @JoseAntonioMartinH yeah, perhaps you are correct, as Recurrent NNs are more complex than other variants. Glad I could help. Have you tried calling fit instead? (That is without generator) – DarkCygnus Jan 22 '18 at 23:13
  • Yes, I created a dataset and I am using fit now. But have to set batch_size to a divisor of dataset length which is ugly. And now to make a prediction I have to predict on a dataset which is a multiple of batch_size, which is not the case. There should be a solution to this thing with stateful RNNs. – Jose Antonio Martin H Jan 22 '18 at 23:42