
I’m currently learning to use nn.LSTM with PyTorch and have a question about how it works.

Basically, I’m trying to feed my dataset matrix (M x N) into the network. Since the dataset is a matrix, I wanted to feed it recursively (as timesteps) into the LSTM network with a DataLoader (utils.data.Dataset).

The point where I got confused was the expected input size: (seq_len, batch, input_size).

Let’s say I’m creating my data_loader with batch_size=10. In order to generate the train_loader in the right form, I had to reshape the original (M x N) matrix into a shape that includes the sequence length, which can simply be transformed to (M/seq_len, seq_len, N).

Then the input size for my nn.LSTM would be something like (M/seq_len/batch_size, seq_len, N).
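To make this concrete, here is a minimal sketch of the setup I mean (the sizes and the hidden size are just placeholders, not my real data, and I transpose into the default (seq_len, batch, input_size) layout):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder sizes, just for illustration
M, N = 1000, 8                  # rows (timesteps) and columns (features) of the dataset matrix
seq_len, batch_size = 10, 10

data = torch.randn(M, N)                          # stand-in for the (M x N) dataset matrix
sequences = data.view(M // seq_len, seq_len, N)   # (M/seq_len, seq_len, N)

train_loader = DataLoader(TensorDataset(sequences), batch_size=batch_size)
lstm = torch.nn.LSTM(input_size=N, hidden_size=32)   # expects (seq_len, batch, input_size) by default

for (batch,) in train_loader:
    x = batch.transpose(0, 1)        # (batch_size, seq_len, N) -> (seq_len, batch_size, N)
    out, (h_n, c_n) = lstm(x)
    print(out.shape)                 # (seq_len, batch_size, hidden_size)
    break
```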

So, here are my main questions:

  1. If I feed data of this size into the LSTM model nn.LSTM(N, hidden_size), is the LSTM model already doing the recursive feed-forward over the whole batch?

  2. I'm also confused about seq_len: when seq_len > 1, the output gets a seq_len dimension. Does that mean the output contains the results of the recursive operations over the sequence?

I’m not sure I made the questions clear, but my understanding is getting quite mixed up. I hope somebody can help me sort out the right understanding.

  • Can you expand on your explanation why you are dividing by the sequence length? This part is absolutely not clear at the moment, but quite crucial for the answer, I believe – dennlinger Sep 27 '19 at 12:53
  • I'm kind of trying to imitate the behavior of a seq2seq LSTM network with my dataset (features along rows, timesteps along columns), dividing the whole matrix by the sequence length and stacking those sequences up to the batch size (in each iteration of the train_loader). – jinujanu Sep 28 '19 at 13:11

1 Answer

  1. Yes, provided each sample's sequence length is the same (which seems to be the case here). If not, you have to pad with torch.nn.utils.rnn.pad_sequence for example.

  2. Yes, the LSTM is unrolled over each timestep and there is already an output for each timestep, so you don't have to apply it to each element separately.
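A minimal sketch to illustrate point 2 (the shapes below are arbitrary toy values):

```python
import torch

# Toy shapes, purely for illustration
seq_len, batch, input_size, hidden_size = 5, 10, 8, 32

lstm = torch.nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch, input_size)

# A single call unrolls the LSTM over all timesteps of the sequence
out, (h_n, c_n) = lstm(x)
print(out.shape)    # (seq_len, batch, hidden_size): one output per timestep
print(h_n.shape)    # (num_layers, batch, hidden_size): hidden state after the last timestep
print(torch.allclose(out[-1], h_n[-1]))   # True for a single-layer, unidirectional LSTM
```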

Szymon Maszke
    I'm so sorry for tons of notifications... :( trying to comment with the right form and it's not somehow working. I'll be posting in a minute after getting it right. – jinujanu Sep 28 '19 at 13:32
  • That's cool, it's better to remove instead of making additional noise + I see you are new around here so take your time. :) – Szymon Maszke Sep 28 '19 at 13:36
  • So disappointed in myself for not being able to sort out this inline code comment stuff. Anyhow, would these two approaches have the same behavior? `for t in range(self.seq_len): out_seq = self.lstm(input[t].view(1,1,-1), hidd)` with hidd.shape=[1,batch_size,hidden_size] ... and ... `out, hidd = self.lstm(input, hidd)` with hidd.shape=[seq_len, batch_size, hidden_size] – jinujanu Sep 28 '19 at 14:01
  • Yes, they should. In the first case it would still be `out_seq, hidden = self.lstm(input[t].view(1,1,-1), hidden)` though, I think. And you shouldn't use it that way; the second one is correct (you almost never have to loop in `pytorch` if it's done correctly). Furthermore, the second argument to `self.lstm` (`hidd` in your case) is usually not necessary, as it's implicitly filled with zeros and the argument is optional (and you probably don't want to touch it in your case). – Szymon Maszke Sep 28 '19 at 14:07
  • Thank you for the reply :) That sort of settles the next step of attaching the fully connected layer after the LSTM. But still, if I want to run the training for several epochs, don't I need `hidd` to be returned outside the Module? – jinujanu Sep 28 '19 at 19:50
  • No, you don't, weights are updated normally even if you pass zeros as starting vector. If samples between consecutive calls are related, you have to pass last `hidden` to the next forward call. But it's rarely the case tbh. – Szymon Maszke Sep 28 '19 at 20:00
  • Oh, yeah, there was no difference between passing `hidden` recursively and leaving it out... One last question: if I want to use `nn.Linear` after `nn.LSTM`, it's the first returned output that should be fed into the `nn.Linear`, isn't it? – jinujanu Oct 01 '19 at 11:55
  • It depends, you may want to transform them otherwise, but usually yes. Basically you would have `_, (last_hidden, _) = self.lstm(input)` and, assuming it's not bidirectional (that would require an additional step), you want to get the __last layer__ (no matter how many layers you use). So for your linear layer it would be: `output = self.linear(last_hidden[-1])` – Szymon Maszke Oct 01 '19 at 22:26
  • I thought that if I want the whole sequence information from the `nn.LSTM`, I should use the first output (the first `_` in your explanation)? If not, what's the purpose of the first output of `nn.LSTM`? – jinujanu Oct 02 '19 at 10:10
  • Please see the [documentation](https://pytorch.org/docs/stable/nn.html#lstm). If you want the whole sequence then it's the first output. If you want only the output from the last timestep (as is usually done for basic classification with RNNs), you can use the second return value as I described (or the last element of the first output, but the intention is not as clear in that case); see the sketch after this thread. – Szymon Maszke Oct 02 '19 at 10:17
  • Exactly, for seq2seq learning I should use the first output, otherwise the second (hidden) one. Thanks for your time :) I think I have a better understanding now :) – jinujanu Oct 02 '19 at 11:28
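Here is a minimal sketch pulling the thread above together (the module name `LSTMHead` and all sizes are made up for illustration): it shows both options discussed, projecting every timestep's output for seq2seq-style use, and projecting only the last layer's final hidden state for classification-style use.

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    """Toy module: LSTM followed by a Linear layer, as discussed in the comments."""
    def __init__(self, input_size, hidden_size, out_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)   # initial hidden state defaults to zeros
        self.linear = nn.Linear(hidden_size, out_size)

    def forward(self, x):                  # x: (seq_len, batch, input_size)
        out, (h_n, _) = self.lstm(x)
        # Option A (seq2seq-style): project every timestep of the sequence
        per_step = self.linear(out)            # (seq_len, batch, out_size)
        # Option B (classification-style): project only the last layer's final hidden state
        last_only = self.linear(h_n[-1])       # (batch, out_size)
        return per_step, last_only

model = LSTMHead(input_size=8, hidden_size=32, out_size=4)
x = torch.randn(5, 10, 8)                  # (seq_len, batch, input_size)
per_step, last_only = model(x)
print(per_step.shape, last_only.shape)     # torch.Size([5, 10, 4]) torch.Size([10, 4])
```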