Why does CuDNNLSTM have more parameters than LSTM in Keras?

Question

I have been trying to compute the number of parameters in an LSTM cell in Keras. I created two models, one with LSTM and the other with CuDNNLSTM.

Partial summaries of the two models:

CuDNNLSTM Model:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param # 
    =================================================================
    embedding (Embedding)        (None, None, 300)         192000
    _________________________________________________________________
    bidirectional (Bidirectional (None, None, 600)         1444800

LSTM Model:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding_1 (Embedding)      (None, None, 300)         192000    
    _________________________________________________________________  
    bidirectional (Bidirectional (None, None, 600)         1442400

The number of parameters in the LSTM model matches the standard LSTM parameter formula found all over the internet. However, the CuDNNLSTM model has 2400 extra parameters.

What is the cause of these extra parameters?

Code:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    from tensorflow.compat.v1.keras.models import Sequential
    from tensorflow.compat.v1.keras.layers import CuDNNLSTM, Bidirectional, Embedding, LSTM

    model = Sequential()
    model.add(Embedding(640, 300))
    # Replace <LSTM type> with LSTM or CuDNNLSTM to build each of the two models.
    model.add(Bidirectional(<LSTM type>(300, return_sequences=True)))

score 1 · Answer 1 · answered Feb 24 '20 at 21:52

LSTM parameters can be grouped into three categories: input weight matrices (W), recurrent weight matrices (R), and biases (b). For each of its four gates, part of the LSTM cell's computation is W*x + b_i + R*h + b_r, where b_i is the input bias and b_r is the recurrent bias.

If you let b = b_i + b_r, you can rewrite the above expression as W*x + R*h + b. This removes the need to keep two separate bias vectors (b_i and b_r); you only need to store one vector (b).

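cuDNN sticks with the original mathematical formulation and stores b_i and b_r separately. Keras does not; it only stores b. That's why cuDNN's LSTM has more parameters than Keras's LSTM. The extra bias vector has one entry per unit for each of the 4 gates, so with 300 units and 2 directions it accounts for 4 * 300 * 2 = 2400 parameters, exactly the difference in the question.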
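As a quick sanity check, the counts from the question can be reproduced with plain arithmetic. This is only a sketch: the helper `lstm_param_count` and its `bias_vectors_per_gate` argument are made-up names for illustration, not part of Keras or cuDNN, and it assumes the usual 4-gate LSTM with the input dimension (300) and unit count (300) taken from the question's model.

```python
def lstm_param_count(input_dim, units, bias_vectors_per_gate):
    """Parameter count for one LSTM direction (hypothetical helper, 4 gates assumed)."""
    gates = 4
    input_weights = input_dim * units          # W, per gate
    recurrent_weights = units * units          # R, per gate
    biases = bias_vectors_per_gate * units     # b alone, or b_i and b_r, per gate
    return gates * (input_weights + recurrent_weights + biases)


# Values from the question: Embedding output of 300, 300 units, Bidirectional wrapper.
input_dim, units, directions = 300, 300, 2

# One merged bias vector per gate (Keras LSTM) vs. two separate ones (cuDNN).
keras_lstm = directions * lstm_param_count(input_dim, units, bias_vectors_per_gate=1)
cudnn_lstm = directions * lstm_param_count(input_dim, units, bias_vectors_per_gate=2)

print(keras_lstm)               # 1442400
print(cudnn_lstm)               # 1444800
print(cudnn_lstm - keras_lstm)  # 2400
```

Both totals match the `Param #` column of the two summaries, and the difference is exactly the second set of bias vectors.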
