In the PyTorch LSTM documentation it is written:
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
I'm wondering why they chose the batch dimension to be the second one by default rather than the first. For me it is easier to imagine my data as [batch, seq, feature] than as [seq, batch, feature]; the first layout feels intuitive and the second counterintuitive.
I'm asking here to find out whether there is a reason behind this choice, and to get some understanding of it.
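To make the question concrete, here is a minimal sketch (assuming PyTorch is installed) showing how the same LSTM expects its input in the two layouts; the sizes are arbitrary example values:

```python
import torch
import torch.nn as nn

seq_len, batch, feat, hidden = 5, 3, 10, 20

# Default (batch_first=False): input is (seq, batch, feature)
lstm = nn.LSTM(input_size=feat, hidden_size=hidden)
out, _ = lstm(torch.randn(seq_len, batch, feat))
print(out.shape)  # torch.Size([5, 3, 20]) -> (seq, batch, hidden)

# batch_first=True: input is (batch, seq, feature)
lstm_bf = nn.LSTM(input_size=feat, hidden_size=hidden, batch_first=True)
out_bf, _ = lstm_bf(torch.randn(batch, seq_len, feat))
print(out_bf.shape)  # torch.Size([3, 5, 20]) -> (batch, seq, hidden)
```

So functionally both work; my question is only about why (seq, batch, feature) is the default.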