I'd like to reproduce a recurrent neural network in which each time layer is followed by a dropout layer, and all of these dropout layers share the same mask. This structure is described in, among others, "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks" (Gal and Ghahramani).
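To make the intended structure concrete, here is a minimal NumPy sketch (not MXNet code) of a plain tanh RNN in which one dropout mask on the recurrent state is sampled once per sequence and reused at every time step. The function name rnn_with_shared_dropout and its arguments are hypothetical, chosen only for illustration, and the sketch assumes inverted dropout (kept units scaled by 1 / (1 - p)).

```python
import numpy as np

def rnn_with_shared_dropout(inputs, W_x, W_h, p, rng):
    """Run a simple tanh RNN over `inputs` of shape (T, batch, n_in).

    Hypothetical illustration: a single dropout mask for the recurrent
    connection is sampled once per sequence and reused at every time step,
    so the same hidden units are dropped at all steps.
    """
    T, batch, _ = inputs.shape
    n_hidden = W_h.shape[0]
    h = np.zeros((batch, n_hidden))
    # Inverted dropout: keep each unit with probability 1 - p and
    # scale kept units by 1 / (1 - p). Sampled ONCE, outside the loop.
    mask = (rng.random((batch, n_hidden)) >= p) / (1.0 - p)
    outputs, dropped_states = [], []
    for t in range(T):
        h_dropped = h * mask  # the same mask is applied at every step
        dropped_states.append(h_dropped)
        h = np.tanh(inputs[t] @ W_x + h_dropped @ W_h)
        outputs.append(h)
    return np.stack(outputs), np.stack(dropped_states), mask
```

Calling an ordinary dropout operation inside the loop instead would draw a fresh mask at every step, which is exactly the behavior this structure is meant to avoid.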
As far as I understand the code, the recurrent network models implemented in MXNet do not apply any dropout layers between time layers; the dropout parameter of functions such as lstm (R API, Python API) actually defines dropout on the input. Therefore I'd need to reimplement these functions from scratch.
However, the Dropout layer does not seem to accept a mask variable as a parameter.
Is it possible to create multiple dropout layers at different places in the computation graph that nonetheless share the same mask?