
I'm trying to implement a neural network that takes musical notes as input, with the pitch of each note on one axis and its octave on the other.

The input should first pass through a convolution layer (Conv2DLayer), and the convolution outputs should then be fed into an LSTM layer.

Input -> Convolution and pooling layers -> LSTM layers -> Output

The problem is that LSTM layers and convolution layers each expect a specific input shape:

Conv2DLayer expected input shape: (batch_size, num_channels, rows, columns)

LSTMLayer expected input shape: (batch_size, sequence_len, num_inputs)
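One common way to reconcile these two shapes (an assumption on my part, not something stated in the question) is to fold the time axis into the batch axis for the convolution stage and unfold it again before the LSTM. The sketch below only does the shape arithmetic for that idea, assuming unpadded stride-1 convolution, non-overlapping pooling, and hypothetical sizes (12 pitch rows, 8 octave columns, 16 filters of 3x3, 2x2 pooling). The helper names `conv_out`, `pool_out` and `pipeline_shapes` are made up for illustration.

```python
# Hypothetical sizes: 12 pitch rows x 8 octave columns, 1 channel.
def conv_out(size, kernel, stride=1, pad=0):
    """Length of one spatial axis after a 2D convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, pool):
    """Length of one axis after non-overlapping pooling (border dropped)."""
    return size // pool


def pipeline_shapes(batch, seq_len, channels, rows, cols,
                    num_filters=16, kernel=3, pool=2):
    # Fold time into the batch axis so the conv layer sees 4D images.
    conv_in = (batch * seq_len, channels, rows, cols)
    # Unpadded stride-1 convolution, then pooling, on each spatial axis.
    r = pool_out(conv_out(rows, kernel), pool)
    c = pool_out(conv_out(cols, kernel), pool)
    conv_result = (batch * seq_len, num_filters, r, c)
    # Unfold time again and flatten each step's feature maps for the LSTM.
    lstm_in = (batch, seq_len, num_filters * r * c)
    return conv_in, conv_result, lstm_in


shapes = pipeline_shapes(batch=4, seq_len=32, channels=1, rows=12, cols=8)
for name, shape in zip(("Conv2DLayer input", "conv/pool output", "LSTMLayer input"), shapes):
    print(name, shape)
# Conv2DLayer input (128, 1, 12, 8)
# conv/pool output (128, 16, 5, 3)
# LSTMLayer input (4, 32, 240)
```

Under these assumptions, num_inputs for the LSTM is simply the flattened size of one frame's feature maps (16 * 5 * 3 = 240).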

How can I take an input of shape (batch_size, sequence_len, num_channels, rows, columns), or something similar, and build such a network? If I reshape the input to remove sequence_len, it seems that either rows or columns would have to change, and the shape would be distorted.
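The distortion should not happen if sequence_len is merged with batch_size rather than with rows or columns: a C-order reshape of the two leading axes leaves every (rows, columns) frame untouched. A minimal NumPy sketch, assuming hypothetical sizes and using a 2x2 max-pool (`pool_2x2`, a made-up stand-in) in place of the real Conv2DLayer and pooling stack:

```python
import numpy as np

# Hypothetical sizes: 12 pitch rows x 8 octave columns, 1 channel.
batch, seq_len, channels, rows, cols = 2, 5, 1, 12, 8
rng = np.random.default_rng(0)
x = rng.random((batch, seq_len, channels, rows, cols))

# Merge batch and time: each time step becomes its own 4D "image".
frames = x.reshape(batch * seq_len, channels, rows, cols)


def pool_2x2(imgs):
    """Hypothetical stand-in for the conv/pool stack: 2x2 max pooling."""
    n, c, h, w = imgs.shape
    return imgs.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))


features = pool_2x2(frames)                     # (10, 1, 6, 4)

# Split time back out and flatten each step for the LSTM.
lstm_in = features.reshape(batch, seq_len, -1)  # (2, 5, 24)

# Row b*seq_len + t of the merged array is exactly x[b, t]:
# the rows and columns of every frame are left intact.
assert np.array_equal(frames[1 * seq_len + 3], x[1, 3])
print(lstm_in.shape)  # (2, 5, 24)
```

Because the same reshape is undone after the per-frame layers, each LSTM time step receives the features computed from exactly one input frame.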

Saif Ur Rehman
  • How do you expect an LSTM to analyze 2D input? Figure out what you want to achieve, and then the code will follow naturally. There are many things you might be trying to achieve here, and thus many possible modifications – lejlot Jun 09 '16 at 22:03
  • @lejlot I'm planning to downsample it to a small size and then pass it to the LSTM to get some long-term memory. I think it should be able to learn from that; what do you think? Anyway, I found the answer: use the cuDNN-specific 3D convolution layer in Lasagne – Saif Ur Rehman Jun 09 '16 at 23:11
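For the 3D-convolution route, the time axis plays the role of depth. Assuming the Lasagne cuDNN layer follows the usual (batch, channels, depth, rows, columns) layout (worth checking against the Lasagne docs for `lasagne.layers.dnn.Conv3DDNNLayer`), the input only needs an axis swap, and the output needs swapping back and flattening before the LSTM. The sketch below is shape bookkeeping only: the convolution itself is replaced by a placeholder array of the size an unpadded, stride-1 3D convolution would produce, and all sizes and the helper `valid_out` are hypothetical.

```python
import numpy as np


def valid_out(size, k):
    # Output length of an unpadded, stride-1 convolution along one axis.
    return size - k + 1


batch, seq_len, channels, rows, cols = 2, 16, 1, 12, 8  # hypothetical input
num_filters, kt, kr, kc = 8, 3, 3, 3                    # hypothetical filters

x = np.zeros((batch, seq_len, channels, rows, cols), dtype=np.float32)

# 3D conv input layout (batch, channels, depth, rows, cols): time becomes depth.
x3d = x.transpose(0, 2, 1, 3, 4)                        # (2, 1, 16, 12, 8)

# Placeholder for the layer's output, with the shape it would have.
conv_shape = (batch, num_filters, valid_out(seq_len, kt),
              valid_out(rows, kr), valid_out(cols, kc))  # (2, 8, 14, 10, 6)
conv_result = np.zeros(conv_shape, dtype=np.float32)

# Move time back to axis 1 and flatten each step for the LSTM.
lstm_in = conv_result.transpose(0, 2, 1, 3, 4).reshape(batch, conv_shape[2], -1)
print(lstm_in.shape)  # (2, 14, 480)
```

Note that without padding along the depth axis, the temporal kernel shortens the sequence (16 steps become 14 here), which the LSTM's sequence length has to account for.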
