
I tried to implement a model in Keras with GRUs and with LSTMs. The model architecture is the same for both implementations. As I have read in many blog posts, inference time for a GRU should be faster than for an LSTM. But in my case the GRU is not faster; it is in fact slower compared to the LSTM. Can anybody find a reason for this? Is it something to do with GRUs in Keras, or am I going wrong somewhere?

Any help is highly appreciated.

Thanks in advance.

venkatesh
  • Could you share a concrete example of how you instantiate the models? GRU is theoretically faster, but a mis-configuration can give the opposite result. You can always check the RNN layer implementations in the Keras code to better understand: https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L239 – Mohamed Ali JAMAOUI Jan 27 '20 at 14:45
  • What exactly do you mean by mis-configuration? Can you please explain it in more detail? Even though I went through the Keras source code, I haven't understood what is going wrong here. – venkatesh Jan 28 '20 at 05:20

2 Answers


I would first check whether the LSTM that you use is CuDNNLSTM or plain LSTM. The former is a GPU-accelerated variant and runs much faster than the plain LSTM, even though training runs on the GPU in both cases.

Yes, the papers do not lie; in fact, a GRU cell requires fewer computations than an LSTM cell.

Ensure that you do not compare a plain GRU with CuDNN-LSTM.

For a fair benchmark, compare LSTM with GRU, and CuDNNLSTM with CuDNNGRU.
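A related pitfall worth checking (this is an assumption about the asker's setup, not something stated in the question): in TensorFlow 2.x Keras, `LSTM` and `GRU` dispatch to the fast cuDNN kernel on GPU only when the layer arguments satisfy its constraints; the defaults do, but any deviation silently falls back to the much slower generic implementation. Notably, `GRU` has an extra constraint (`reset_after=True`) that `LSTM` does not, so a mis-configured GRU can easily end up slower than a well-configured LSTM:

```python
import tensorflow as tf

# Defaults satisfy the cuDNN constraints (tanh activation, sigmoid
# recurrent_activation, no recurrent dropout, use_bias=True, ...).
fast_gru = tf.keras.layers.GRU(64)

# reset_after=False disqualifies the GRU from the cuDNN kernel,
# forcing the slow generic implementation.
slow_gru = tf.keras.layers.GRU(64, reset_after=False)

# Non-zero recurrent_dropout disqualifies any RNN layer from cuDNN.
slow_lstm = tf.keras.layers.LSTM(64, recurrent_dropout=0.2)
```

If the GRU in the question was built with non-default arguments while the LSTM kept its defaults, this alone could explain the reversed timings.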

Timbus Calin
  • Though I am using plain LSTM and GRU (**not any CUDA variants**), the computation time is slower for GRUs when compared with LSTMs. – venkatesh Jan 28 '20 at 05:18

LSTM (Long Short-Term Memory): an LSTM has three gates (input, output, and forget).

GRU (Gated Recurrent Unit): a GRU has two gates (reset and update).

GRUs use fewer trainable parameters and therefore use less memory, execute faster, and train faster than LSTMs, whereas LSTMs are more accurate on datasets with longer sequences. In short, if the sequence is long or accuracy is critical, go for LSTM; for less memory consumption and faster operation, go for GRU. It all comes down to the trade-off between training time and accuracy.
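The gate counts above translate directly into parameter counts. As a minimal sketch using the Keras layer conventions (the GRU formula assumes the default `reset_after=True`, which keeps separate input and recurrent bias vectors):

```python
def lstm_params(input_dim, units):
    # 4 gate blocks (input, forget, cell, output), each with an input
    # kernel, a recurrent kernel, and a bias vector.
    return 4 * (input_dim * units + units * units + units)

def gru_params(input_dim, units):
    # 3 gate blocks (reset, update, candidate); Keras' reset_after=True
    # variant carries two bias vectors per gate, hence 2 * units.
    return 3 * (input_dim * units + units * units + 2 * units)

print(lstm_params(16, 32))  # 6272
print(gru_params(16, 32))   # 4800
```

For the same input dimension and hidden size, the GRU always has roughly three-quarters of the LSTM's weights, which is why it should, all else being equal, run faster.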

If in your case both architectures are the same, there might be an issue with the batch size for the two models. Make sure that the batch size and sequence length are also the same for both models.
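As a minimal benchmarking sketch (hypothetical sizes, assuming TensorFlow 2.x), feed both layers exactly the same batch and sequence length, and warm each layer up once so graph tracing is not counted in the timing:

```python
import time
import numpy as np
import tensorflow as tf

# Hypothetical dimensions; keep them identical for both layers.
BATCH, TIMESTEPS, FEATURES, UNITS = 32, 100, 16, 64
x = np.random.rand(BATCH, TIMESTEPS, FEATURES).astype("float32")

for name, layer_cls in [("LSTM", tf.keras.layers.LSTM),
                        ("GRU", tf.keras.layers.GRU)]:
    layer = layer_cls(UNITS)
    layer(x)                          # warm-up: builds weights, traces graph
    start = time.perf_counter()
    for _ in range(10):
        out = layer(x)
    print(f"{name}: {time.perf_counter() - start:.4f}s, output {out.shape}")
```

Timing only the post-warm-up calls avoids attributing one-off setup cost to whichever layer happens to run first.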

Mousam Singh