
So I'm trying to use a modified version of this code

(https://github.com/snf/keras-fractalnet)

that uses Dense layers of size 512, 128, 32, and 8 instead of the convolutional layers. I believe the original uses (64, 3, 3), (128, 3, 3), (256, 3, 3), (512, 3, 3), (512, 3, 3). Since 4 blocks are being used with a width of 3, my version has a total of 4536 nodes, if I'm counting correctly. Also, all layers use tanh activations except for the last row of the last block.
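For reference, here is a minimal sketch (not my actual code) of how one such block could look, using the usual FractalNet recursion from keras-fractalnet but with Dense layers swapped in for the convolutions. The sizes 512/128/32/8, the width of 3, and the tanh activations come from the description above; everything else (the function names, the mean join, the input size, and applying tanh everywhere including the last layer) is an illustrative assumption:

    import tensorflow as tf
    from tensorflow.keras import layers

    def fractal_dense_block(x, units, c=3, activation="tanh"):
        """One fractal block of width c built from Dense layers (illustrative only)."""
        if c == 1:
            # Rightmost column: a single Dense layer.
            return layers.Dense(units, activation=activation)(x)
        # Shallow branch: one Dense layer; deep branch: two fractals of width c - 1.
        shallow = layers.Dense(units, activation=activation)(x)
        deep = fractal_dense_block(x, units, c - 1, activation)
        deep = fractal_dense_block(deep, units, c - 1, activation)
        # Join the branches with an element-wise mean, as FractalNet does.
        return layers.Average()([shallow, deep])

    inputs = tf.keras.Input(shape=(29,))            # e.g. the Ant observation vector
    x = inputs
    for units in (512, 128, 32, 8):                 # four blocks, width 3
        x = fractal_dense_block(x, units)
    model = tf.keras.Model(inputs, x)
    model.summary()                                 # prints layer and parameter counts

With width 3 this recursion yields 7 Dense layers per block, i.e. 28 in total across the four blocks.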

I'm running on Ubuntu 18.04 with 16 GB of RAM (Keras on TF2), and I've noticed that after train_on_batch executes, the available memory reported by free -m drops from 13232 MB down to 9718 MB, and it keeps dropping by similar amounts on the first train_on_batch call after every 20 episodes.
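For reference, a quick way to log the same number from inside the script (this uses psutil, which is not part of the original code, just a convenient stand-in for watching free -m):

    import psutil

    def log_available_mb(tag):
        # Roughly what `free -m` reports as "available", in MiB.
        avail_mb = psutil.virtual_memory().available / (1024 ** 2)
        print(f"[{tag}] available memory: {avail_mb:.0f} MiB")

    log_available_mb("before train_on_batch")
    # loss = model.train_on_batch(observes, actions)   # hypothetical call from the training loop
    log_available_mb("after train_on_batch")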

Now, I'm pretty new to TensorFlow, but dropping around 3 GB (if I'm reading that correctly) every time train_on_batch is called seems a bit extreme to me, so I was hoping someone could tell me whether my number of nodes seems excessive, or could point me in the right direction as to what to look for. I can post my code if you want, but it is a modified version of pat-coady's TRPO code that uses PyBullet to build the NN, so it is quite lengthy; if need be, I can at least share it on GitHub.

Update: Here is a histogram plot of what the input data looks like: [histogram of input data]

Update 2:

Thanks to prouast, I have been pointed in the right direction, but I'm still a little confused. While trying to switch from float32 to float16, I found out that thousands of new Dense layers were being created on every single fractal_net call. I was only able to see this through warnings that started showing up about float32 and float16 values being used at the same time. So I changed the code to initialize 20 Dense layers once and then reuse them every time fractal_net is called (roughly as in the sketch below). This seems to mostly work, since losing 2+ GB of RAM on train_on_batch calls has become less frequent, but it still happens here and there.
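Roughly, the change was from constructing Dense layers inside call() to constructing them once up front. A simplified sketch of the idea (the class name FractalPolicy and the layer list here are placeholders, not my actual code):

    import tensorflow as tf
    from tensorflow.keras import layers

    class FractalPolicy(tf.keras.Model):
        def __init__(self, sizes=(512, 128, 32, 8)):
            super().__init__()
            # Created once; every call() reuses these same layer objects.
            self.dense_layers = [layers.Dense(u, activation="tanh") for u in sizes]

        def call(self, x):
            # Before the fix, layers.Dense(...) was effectively being constructed
            # in here, so every forward pass added brand-new layers and variables.
            for layer in self.dense_layers:
                x = layer(x)
            return x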

So my next question is: is there any way to have the subclassed model report how many Dense layers currently exist and are taking up RAM? I'm going to try to recreate the float16 vs. float32 warnings, because I forgot how they were triggered, but I'd prefer a more direct way to see the size of the model.
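(For anyone with the same question: one approach that seems to work for a subclassed tf.keras model is to walk model.submodules and count the Dense instances, for example:)

    import tensorflow as tf

    def report_dense_layers(model):
        # Count every Dense layer the (built) subclassed model currently holds.
        dense = [m for m in model.submodules if isinstance(m, tf.keras.layers.Dense)]
        print(f"{len(dense)} Dense layers, {model.count_params()} parameters in total")

    # model.summary() also works once the model has been called on real data.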

I have checked the weights before and after train_on_batch is called, and the weights are not updating, as I was afraid.

2 Answers


Dense layers have many more parameters than convolutional layers, because each unit is connected to every unit in the previous layer, whereas convolutional layers have sparse connectivity.

If you want to reduce the amount of memory used during training, you could try

  • Reducing the batch size
  • Reducing the number of units in dense layers or the number of dense layers
  • Switching back to using convolutional layers
  • Using lower floating-point precision (e.g., fp16 instead of fp32), though this takes more effort than the other options; see the sketch after this list
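For that last point, a minimal sketch of enabling fp16 compute with the Keras mixed-precision API (in TF 2.4+ this is tf.keras.mixed_precision.set_global_policy; in earlier 2.x releases the same functionality lives under tf.keras.mixed_precision.experimental):

    import tensorflow as tf

    # Layers created after this compute in float16 but keep float32 variables.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # The final layer should usually produce float32 output for numerical stability, e.g.:
    # outputs = tf.keras.layers.Activation("linear", dtype="float32")(x)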
prouast
  • Hey, thanks for the quick reply! I actually tested the code by replacing the fractal-net stuff with 28 dense layers (512->128->32->8), which should be the same number of parameters (I think) as the fractalnet code with a width of 3, but it used considerably less memory. So there has got to be something in the fractalnet code such that, when the NN gets called numerous times, thousands of extra values start being generated somehow. And I think the join layers are the only things that differ from the 28 dense layers I created, so could they be the source of the issue? – Ryan Maxwell Apr 26 '20 at 05:39
  • What exactly is your question? Also, it might be easier to answer if you made your code available. – prouast Apr 26 '20 at 11:10
  • Ok I've uploaded the appropriate train, policy, and fractalNN code on this github site: https://github.com/ryanmaxwell96/trpo_fractal1NN_3 The policy code is structured as having classes Policy->TRPO->PolicyNN, LogProb, KLEntropy and the PolicyNN class is the one I have plugged the fractal_net code into. My main question is, why does self.trpo.train_on_batch take so much memory? – Ryan Maxwell Apr 26 '20 at 20:07
  • The fractal code uses "layersizes" to build the fractal structure starting top-down, left-right with 4 Dense layers on the leftmost column, 2 Dense layers on the middle column, and 1 on the rightmost column. – Ryan Maxwell Apr 26 '20 at 20:16
  • Sorry, one more comment. Fractalnet is being called thousands of times, and I know that trajectories in train.py do get huge, but that was in pat-coady's code and seemed to work fine, even with 28 layers (512, 128, 32, 8) as I already mentioned. This is his code, for which I modified policy.py: https://github.com/pat-coady/trpo – Ryan Maxwell Apr 26 '20 at 20:21
  • The difference between his code and yours is that you changed the layers from convolutional to dense, correct? – prouast Apr 26 '20 at 22:42
  • Yes, that is correct. I'm not sure whether it is even possible to use convolutional layers when you are not training on images? – Ryan Maxwell Apr 26 '20 at 23:36
  • As I said in my answer, dense layers have many more parameters than convolutional layers and hence need more RAM for each training step. You can use convolutional layers on any data type. What type of data are you using? That would determine whether it is a good idea. – prouast Apr 27 '20 at 04:53
  • I'm using OpenAI Gym-style data, which I believe (for the Ant environment at least) is a (1, 29) vector for the input and a (1, 8) action output space. – Ryan Maxwell Apr 27 '20 at 07:24
  • I will see tomorrow if the convolutional layers can be adapted to this type of data (i.e., (512, 1, 1), (128, 1, 1), etc.). – Ryan Maxwell Apr 27 '20 at 07:26
  • Ok I'm having issues implementing the CNN on vector data. At any rate, I've plotted the input data as seen in the update above. Do you still think a CNN would work? – Ryan Maxwell Apr 28 '20 at 21:05
  • Based on what you shared about your data dimensions, you would want to use Conv1D layers. However, I am not familiar with reinforcement learning data and whether this would work well. CNNs are normally used on data that have a spatial structure (images, sound, etc.) – prouast Apr 29 '20 at 00:39
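As a rough illustration of the Conv1D suggestion in the last comment, with the 29-dimensional observation treated as a length-29 sequence with one channel (all layer sizes here are arbitrary examples, not anything from the original code):

    import tensorflow as tf

    inputs = tf.keras.Input(shape=(29, 1))                       # 29 features, 1 "channel"
    x = tf.keras.layers.Conv1D(32, kernel_size=3, activation="tanh")(inputs)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(8)(x)                        # 8-dimensional action output
    model = tf.keras.Model(inputs, outputs)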

I have finally found the culprit for why memory was being used so drastically. It turns out the issue was this line of code: tf.random.shuffle(arr, seed)

I'm still a little uncertain why this causes so many problems, but I have a hypothesis. My guess is that since the rest of the code uses the Keras backend while this part calls TensorFlow directly rather than going through Keras, mixing the two causes lots of wacky issues. Maybe someone else has a better explanation of what happens when you use TensorFlow directly alongside Keras.
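One way to sidestep that call entirely (shown here as an illustrative sketch rather than the exact fix; arr and seed stand in for whatever was actually being shuffled) is to do the shuffle in NumPy, outside the graph:

    import numpy as np

    seed = 0
    arr = np.arange(10)          # stand-in for the array that was being shuffled

    # Shuffling with NumPy instead of tf.random.shuffle keeps the operation out of
    # the TensorFlow graph, so repeated calls do not keep allocating new TF ops/tensors.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(arr)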

Update: It seems that sometimes using the Keras backend also incurs large memory-usage penalties, but only for certain commands. For instance, in my code, K.switch, K.not_equal, K.equal, and K.random_binomial seemed to (less drastically) increase memory usage over time. That stopped when I replaced these parts with plain NumPy commands (memory did drop more at the beginning, but then it stopped being eaten up).
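The NumPy stand-ins were along these lines (a simplified sketch; the actual shapes, probabilities, and branch values depend on how the original code used these calls):

    import numpy as np

    rng = np.random.default_rng()

    mask = rng.binomial(n=1, p=0.5, size=(3,)).astype("float32")   # ~ K.random_binomial
    keep = np.not_equal(mask, 0)                                   # ~ K.not_equal
    out = np.where(keep, mask, np.zeros_like(mask))                # ~ K.switch(cond, then, else)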

It is strange, though, because not all of the backend commands leaked memory. For example, K.in_train_phase did not seem to affect it much.

One last comment: I'm not sure whether this had something to do with it or not, but initializing another class each time might also have been a contributing cause, which is why I removed the unnecessary classes that were getting called thousands of times.