Let's say I'm predicting weather, and I want to use 7 days of weather data (5 parameters per day) to predict the next day's temperature. Each sequence in a training batch then has 7 timesteps (7 days of weather data), so the X data has the shape:
[batch_size, 7, 5]
For the Y data, I assume I provide just 1 value (the 8th day's temperature) for each sequence in the batch, i.e. the shape:
[batch_size, 1]
Is that correct?
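
To make the shapes concrete, here is a minimal sketch of how I imagine building the windows. It assumes NumPy, random numbers standing in for real weather data, and that temperature is stored in column 0; the helper name `make_windows` and the 100-day length are made up for illustration.

```python
import numpy as np

def make_windows(data, window=7, target_col=0):
    """Slice a [num_days, num_features] array into (X, y) training pairs."""
    X, y = [], []
    for start in range(len(data) - window):
        # Input: `window` consecutive days, all features.
        X.append(data[start:start + window])
        # Target: the temperature on the day right after the window.
        # target_col=0 is an assumption about where temperature is stored.
        y.append([data[start + window, target_col]])
    return np.array(X), np.array(y)

# Placeholder data: 100 days, 5 parameters per day.
rng = np.random.default_rng(0)
weather = rng.normal(size=(100, 5))

X, y = make_windows(weather)
print(X.shape)  # (93, 7, 5)
print(y.shape)  # (93, 1)
```

Here the first dimension (93) is the total number of sequences; a batch would be any `batch_size` of them taken together, giving [batch_size, 7, 5] for X and [batch_size, 1] for Y.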