
I am trying to understand LSTMs in Deeplearning4j. I am examining the source code for the example, but I can't understand this:

    //Allocate space:
    //Note the order here:
    // dimension 0 = number of examples in minibatch
    // dimension 1 = size of each vector (i.e., number of characters)
    // dimension 2 = length of each time series/example
    INDArray input = Nd4j.zeros(currMinibatchSize, validCharacters.length, exampleLength);
    INDArray labels = Nd4j.zeros(currMinibatchSize, validCharacters.length, exampleLength);

Why do we store a 3D array, and what does it mean?

Nueral
  • What is the name of the sample file from which you took your code? – Yuriy Zaletskyy May 16 '16 at 08:50
  • https://github.com/deeplearning4j/dl4j-0.4-examples/blob/master/src/main/java/org/deeplearning4j/examples/recurrent/character/CharacterIterator.java, look at the next() method – Nueral May 16 '16 at 09:33
  • Nueral, please join the Deeplearning4j community on Gitter, where they'll answer your question: https://gitter.im/deeplearning4j/deeplearning4j – racknuf May 18 '16 at 05:50
  • @Nueral have you seen my answer? Can you please mark it as the answer, or comment if you need something more? – Yuriy Zaletskyy May 19 '16 at 15:00

1 Answer


Good question, but it has nothing to do with how an LSTM functions; it has to do with the task itself. The task is to forecast the next character. Forecasting the next character has two facets: classification and approximation. If we were dealing with approximation only, a one-dimensional array would suffice. But since we deal with approximation and classification simultaneously, we can't just feed the network a normalized ASCII representation of each character. We need to transform each character into an array.

For example, a (lowercase a) will be represented this way:

1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

b (lowercase) will be represented as:

0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

c will be represented as:

0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Z (capital Z!) will be represented as:

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
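To make the encoding concrete, here is a minimal Java sketch of one-hot encoding a single character. The alphabet below is a hypothetical stand-in; the real example constructs validCharacters differently, so the indices are illustrative only:

    import java.util.Arrays;

    public class OneHotSketch {
        // Hypothetical alphabet; the real CharacterIterator builds
        // validCharacters from several character ranges.
        static final char[] VALID_CHARACTERS =
                "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();

        // Returns a vector of zeros with a single 1.0 at the character's index.
        static double[] oneHot(char c) {
            double[] vector = new double[VALID_CHARACTERS.length];
            for (int i = 0; i < VALID_CHARACTERS.length; i++) {
                if (VALID_CHARACTERS[i] == c) {
                    vector[i] = 1.0;
                    break;
                }
            }
            return vector;
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(oneHot('a'))); // 1.0 first, rest 0.0
        }
    }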

So each character gives us a one-dimensional one-hot vector. One example, a sequence of characters, stacks those vectors into a two-dimensional array, and a minibatch of such examples adds a third dimension, which is why the array above is 3D. The code comment explains how those dimensions are ordered:

    // dimension 0 = number of examples in minibatch
    // dimension 1 = size of each vector (i.e., number of characters)
    // dimension 2 = length of each time series/example
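As a rough sketch of how those three indices work together when the minibatch is filled (the method name, the indexOf lookup, and the minibatchText parameter are my own illustration, not the exact code; see the real next() method in CharacterIterator):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Illustrative only: one string per example is an assumption of this
    // sketch, not the example's actual data structure.
    static INDArray encodeMinibatch(String[] minibatchText, char[] validCharacters, int exampleLength) {
        int currMinibatchSize = minibatchText.length;
        INDArray input = Nd4j.zeros(currMinibatchSize, validCharacters.length, exampleLength);
        String alphabet = new String(validCharacters);
        for (int example = 0; example < currMinibatchSize; example++) {           // dimension 0: example
            for (int t = 0; t < exampleLength; t++) {                             // dimension 2: time step
                int charIdx = alphabet.indexOf(minibatchText[example].charAt(t)); // dimension 1: one-hot index
                input.putScalar(new int[]{example, charIdx, t}, 1.0);             // set the one-hot bit
            }
        }
        return input;
    }

Each input slice [example, :, t] is then exactly one of the one-hot vectors shown above.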

I sincerely commend you for your effort in understanding how an LSTM works, but the code you pointed to applies to all kinds of neural networks: it shows how to feed text data into a network, not how an LSTM itself works. For that you need to look at another part of the source code.

Yuriy Zaletskyy