In the PyTorch LSTM documentation it is written:
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
I'm wondering why they chose the batch dimension to be the second one by default rather than the first. For me it is easier to imagine my data as [batch, seq, feature] than as [seq, batch, feature]; the first layout feels intuitive and the second counterintuitive.
I'm asking here to find out whether there is a reason behind this choice, and to get some understanding of it.
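To make the question concrete, here is a minimal sketch (assuming PyTorch is installed) showing how the same LSTM expects its input in the two layouts; the sizes are arbitrary example values:

```python
import torch
import torch.nn as nn

seq_len, batch, feat, hidden = 5, 3, 10, 20

# Default (batch_first=False): input is (seq, batch, feature)
lstm = nn.LSTM(input_size=feat, hidden_size=hidden)
out, _ = lstm(torch.randn(seq_len, batch, feat))
print(out.shape)  # torch.Size([5, 3, 20]) -> (seq, batch, hidden)

# batch_first=True: input is (batch, seq, feature)
lstm_bf = nn.LSTM(input_size=feat, hidden_size=hidden, batch_first=True)
out_bf, _ = lstm_bf(torch.randn(batch, seq_len, feat))
print(out_bf.shape)  # torch.Size([3, 5, 20]) -> (batch, seq, hidden)
```

So functionally both work; my question is only about why (seq, batch, feature) is the default.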