
This question is about the answer provided by @Shai to LSTM module for Caffe, where caffe.NetSpec() is used to explicitly unroll LSTM units in time for training.

Using this implementation, why does the "DummyData" layer (or any data layer used instead as input X) appear at the end of the t0 time step, just before "t1/lstm/Mx", in the generated prototxt file? I don't get it...

A manual cut-and-paste in the prototxt is therefore needed.

  • BTW there were some typos in the code I posted. I corrected them, you can check it out again now. – Shai Apr 21 '16 at 04:36
  • I did not intend to offend you, I am sorry if it came across that way! On the contrary, I thank you for implementing the LSTM, which works pretty well! I used it and obtained good results; I also stacked LSTMs using these functions. – Florian Mutel Apr 21 '16 at 07:21

1 Answer


Shai's NetSpec implementation of LSTM unrolls the net in time: for every time step there is a copy of the LSTM unit, with weights shared across time steps.
The "bottom" of each unit in time (e.g. "t1/lstm/Mx") is a different temporal slice of the input X.

By the way, I suggest you use the draw_net.py Caffe utility to draw the resulting prototxt and see the flow of data and the temporal repetition of the unrolled LSTM unit.
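For reference, here is a small sketch of that drawing step done directly from Python (the file names are placeholders; draw_net.py under caffe/python wraps the same caffe.draw call):

```python
# Render the generated prototxt as an image so the unrolled time steps are visible.
# Roughly equivalent to: python caffe/python/draw_net.py net.prototxt net.png
from google.protobuf import text_format
from caffe.proto import caffe_pb2
import caffe.draw

net_param = caffe_pb2.NetParameter()
with open('lstm_unrolled.prototxt') as f:          # placeholder: prototxt printed by NetSpec
    text_format.Merge(f.read(), net_param)
caffe.draw.draw_net_to_file(net_param, 'lstm_unrolled.png', rankdir='LR')
```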

Here's what the unrolled net looks like:

[Figure: the unrolled net as rendered by draw_net.py]

You can see the components of the three LSTM cells, and the different temporal slices of X going into each unrolled LSTM unit.

  • My concern is about the generated .prototxt file: even though the MemoryData layer is created before the unrolling phase, it appears after the t0 time step, just before "t1/lstm/Mx". – Florian Mutel Apr 21 '16 at 07:34
  • @FlorianMutel I assumed the data `X` contains all time steps in it, and therefore there is a temporal `"Slice"` layer before the different temporal slices are fed into the LSTM. You can ignore `X` and make `DummyData` for each time step separately. – Shai Apr 21 '16 at 07:39
  • You can think of `Slice1` `Slice2` and `Slice3` as `t0/x`, `t1/x` and `t2/x` respectively. – Shai Apr 21 '16 at 07:47
  • 1
    I was using "bottom = 'nameOfLayer' " and switching to 'ns.layer' it corrected the placement of my dataLayer. Thank you for your time and the draw_net trick – Florian Mutel Apr 21 '16 at 08:07
  • @FlorianMutel glad I could help, and I am happy to learn that this LSTM implementation worked for you. – Shai Apr 21 '16 at 08:08
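Following up on the comments above, here is a small hedged sketch (illustrative names only, not the original code) of the ordering point. NetSpec prints layers roughly in the order they are attached to the `ns` object, with each layer's bottoms emitted before it, so giving every time step its own `DummyData` top and wiring bottoms through ns tops (rather than bare layer-name strings) keeps each data layer next to its own time step:

```python
# Illustrative only: each time step gets its own DummyData input, and bottoms
# are passed as NetSpec tops, so the printed prototxt interleaves data layers
# with their time steps instead of pushing the data layer after the t0 block.
import caffe
from caffe import layers as L

ns = caffe.NetSpec()
for t in range(3):
    setattr(ns, 't%d/x' % t, L.DummyData(shape={'dim': [1, 64]}))
    setattr(ns, 't%d/lstm/Mx' % t,
            L.InnerProduct(getattr(ns, 't%d/x' % t), num_output=64,
                           param=[{'name': 'Mx_w'}, {'name': 'Mx_b'}]))
print(ns.to_proto())   # order: t0/x, t0/lstm/Mx, t1/x, t1/lstm/Mx, ...
```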