
I have a BiLSTM model, as the following:

tf.keras.models.Sequential([
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(A, return_sequences=True),
                                  input_shape=x),
    tf.keras.layers.Dense(B, activation='tanh'),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(A)),
    tf.keras.layers.Dense(B, activation='tanh'),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(output),
])

If the total number of parameters should be about 1 million, what values should A and B be? And how many hidden layers should I add so that the model trains properly?

I tried the following:

A = 265

B = 64

I used three dense layers, but the forecasting is still weak!
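
As a sanity check on the total, the model can be built with concrete values and inspected with model.count_params(). The input shape and output size below are hypothetical placeholders; note that the sequence length does not affect the parameter count.

import tensorflow as tf

A = 265       # LSTM units per direction (candidate value)
B = 64        # Dense units (candidate value)
x = (48, 1)   # hypothetical input shape: 48 timesteps, 1 feature
output = 7    # hypothetical output size

model = tf.keras.models.Sequential([
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(A, return_sequences=True),
                                  input_shape=x),
    tf.keras.layers.Dense(B, activation='tanh'),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(A)),
    tf.keras.layers.Dense(B, activation='tanh'),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(output),
])

print(model.count_params())   # compare against the 1 million target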


1 Answer


The LSTM layer is a long short-term memory layer. It processes its input as a sequence, so you do not need to chop the input into small pieces.
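
For instance, a single LSTM layer consumes a whole (batch, timesteps, features) series in one call; the shapes below are just a hypothetical illustration.

import tensorflow as tf

# hypothetical batch of 8 series, each 1344 timesteps (4 weeks at 30-minute steps), 1 feature
series = tf.random.normal((8, 1344, 1))
lstm = tf.keras.layers.LSTM(32)   # 32 units; returns only the last hidden state
print(lstm(series).shape)         # (8, 32)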

Sample: a custom LSTM layer applied to a simple single-feature input. You can wrap it in Bidirectional as well (see the sketch after the shape examples below). I use this example as a single pass because of its small dimensions.

import tensorflow as tf

class MyLSTMLayer( tf.keras.layers.LSTM ):
    def __init__(self, units, return_sequences, return_state):
        super(MyLSTMLayer, self).__init__( units, return_sequences=True, return_state=False )
        self.num_units = units

    def build(self, input_shape):
        # create a kernel weight so the layer's input/output shape can be inspected
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_units])

    def call(self, inputs):
        # delegate the actual sequence processing to a plain LSTM layer
        lstm = tf.keras.layers.LSTM(self.num_units)
        return lstm(inputs)


start = 3
limit = 93
delta = 3
sample = tf.range(start, limit, delta)
sample = tf.cast( sample, dtype=tf.float32 )
sample = tf.constant( sample, shape=( 30, 1, 1 ) )
layer = MyLSTMLayer(10, True, True)
layer_2 = MyLSTMLayer(20, True, False)

temp = layer(sample)
print( temp )
temp = tf.expand_dims(temp, -1)
temp = layer_2(temp)
print( temp )

Operation: ( 10, 1, 1 ) x ( 10, 1, 1 )

layer = MyLSTMLayer(10, True, True)
sample = tf.constant( sample, shape=( 10, 1, 1 ) )

Output: (10, 10)

...
  1, 1, 1, 1]], shape=(10, 10), dtype=float32)

Operation: ( 20, 1, 1 ) x ( 10, 1, 1 )

layer = MyLSTMLayer(20, True, True)
sample = tf.constant( sample, shape=( 10, 1, 1 ) )

Output: (20, 10)

...
 1, 1, 1, 1, 1, 1]], shape=(20, 10), dtype=float32)

Operation: ( 30, 1, 1 ) x ( 10, 1, 1 )

layer = MyLSTMLayer(30, True, True)
sample = tf.constant( sample, shape=( 10, 1, 1 ) )

Output: (30, 10)

...
 1, 1, 1, 1, 1, 1]], shape=(30, 10), dtype=float32)

Operation: ( 30, 1, 1 ) x ( 10, 1, 1 )

layer = MyLSTMLayer(10, True, True)
layer_2 = MyLSTMLayer(20, True, False)
sample = tf.constant( sample, shape=( 30, 1, 1 ) )

Output: (30, 20)

...
 1, 1, 1, 1]]], shape=(30, 20), dtype=float32)
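
As mentioned above, the layer can also be wrapped in tf.keras.layers.Bidirectional. A minimal sketch with a plain LSTM and the same (30, 1, 1) sample, purely for illustration:

import tensorflow as tf

# the same sample sequence as above: batch of 30, 1 timestep, 1 feature
sample = tf.range(3, 93, 3)
sample = tf.cast(sample, dtype=tf.float32)
sample = tf.reshape(sample, (30, 1, 1))

# forward and backward outputs are concatenated: 10 units per direction -> 20 features
bi_layer = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(10))
temp = bi_layer(sample)
print(temp.shape)   # (30, 20)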

Sample: implementation with a discrete sequence output

import tensorflow as tf

class MyLSTMLayer( tf.keras.layers.LSTM ):
    def __init__(self, units, return_sequences, return_state):
        super(MyLSTMLayer, self).__init__( units, return_sequences=True, return_state=False )
        self.num_units = units

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
        shape=[int(input_shape[-1]),
        self.num_units])

    def call(self, inputs):
        lstm = tf.keras.layers.LSTM(self.num_units)
        temp = lstm(inputs)
        temp = tf.nn.softmax(temp)
        # argmax (along axis 0 by default) turns the softmax outputs into discrete indices
        temp = tf.math.argmax(temp).numpy()
        return temp

sample = tf.constant( [1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], shape=( 10, 1, 1 ) )
layer = MyLSTMLayer(10, True, False)
temp = layer(sample)
print( temp )

Output: As a sequence

[1 0 1 1 1 0 0 0 1 0]
  • I don't get your point of view, and I don't know how to relate your clarification to my question. Thanks. – Ahmad Aburoman Dec 05 '22 at 14:27
  • 1. You don't need millions of parameters for LSTM-type cells; they are a sequence-processing method. 2. I examined the layer's underlying weight matrices that you added as a series into the model; they are just the sequences' numbers and shapes. The results depend on the type of your input and classification. – Jirayu Kaewprateep Dec 06 '22 at 00:56
  • I understand, but the input is huge: 4 weeks of input to predict 1 week of output. It is a natural phenomenon that I need to predict, with a long time series of timesteps every 30 minutes. – Ahmad Aburoman Dec 07 '22 at 08:52
  • Hi, the LSTM and bidirectional LSTM you are using in the model are different from Dense layers or convolutions. They have their own forget, memory, and gate logic working on an input series, which means you do not need millions of LSTM parameters; it uses a lot of computation but is effective with time-series input. – Jirayu Kaewprateep Dec 08 '22 at 03:53
  • What is "long-short-term memory"? It seems to be a contradiction in terms. – Peter Mortensen Dec 31 '22 at 22:07
  • @PeterMortensen That's actually just what LSTM stands for. Unfortunately, after that, the post/code just hijacks the question. Note the total lack of `Bidirectional` anywhere in the code. It is only referenced in passing as something the OP "can" do, but never actually shown. – General Grievance Jan 09 '23 at 21:21
  • @GeneralGrievance, this code is straightforward and clear. I got it from the official TF website. I'm not posting rubbish. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional – Ahmad Aburoman Jan 19 '23 at 12:35
  • @AhmadAburoman Sorry, I didn't mean that *you* were. My comment was meant to address the answer, not your question. Or are you saying that this answer is straightforward and clear? Because based on your comment on your question above, it would seem that this answer is not. – General Grievance Jan 19 '23 at 13:20
  • @GeneralGrievance, I'm sorry for misunderstanding you! Yes, I haven't gotten a satisfactory answer yet. My question was clear, but the number of stacked layers that I want to know is not one of the hyperparameters that I can investigate and tune, such as the number of neurons, batch size, etc. My apologies! – Ahmad Aburoman Jan 19 '23 at 15:51