I'm trying to use a modified version of this code (https://github.com/snf/keras-fractalnet) that uses Dense layers of sizes 512, 128, 32, and 8 instead of the convolutional layers. The original, I believe, uses (64, 3, 3), (128, 3, 3), (256, 3, 3), (512, 3, 3), (512, 3, 3). Since there are 4 blocks with a width of 3, there should be a total of 4536 nodes in my version, if my math is correct. All layers use tanh activations except for the last row of the last block.
I'm running on Ubuntu 18.04 with 16 GB of RAM (Keras on TF2). I've noticed that after train_on_batch executes, the available memory reported by free -m drops from 13232 to 9718, and it continues to decrease by a similar amount the first time train_on_batch is called after every 20 episodes.
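To pin down the drop more precisely than free -m (which measures the whole system), one option is to log the process's own peak resident memory around each train_on_batch call. A minimal sketch using only the Python standard library; the commented usage around the training step is hypothetical:

```python
import resource

def peak_rss_mb():
    """Peak resident set size of this process in MB (ru_maxrss is in KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

# Hypothetical usage around a training step:
# before = peak_rss_mb()
# model.train_on_batch(x_batch, y_batch)
# print(f"peak RSS grew by {peak_rss_mb() - before:.1f} MB")
print(f"current peak RSS: {peak_rss_mb():.1f} MB")
```

If the per-call growth shows up in the process's own RSS, the leak is inside the Python/TF process rather than elsewhere on the system.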
Now, I'm pretty new to TensorFlow, but dropping around 3 GB (if I'm reading that correctly) every time train_on_batch is called seems a bit extreme, so I was hoping someone could tell me whether my number of nodes is too large, or could point me in the right direction as to what to look for. I can post my code if you want, but it is a modified version of pat-coady's TRPO code that uses PyBullet to build the NN, which makes it quite lengthy; if needed, I can at least share it on GitHub.
Update:
Here is a histogram plot of what the input data looks like.
Update 2:
Thanks to prouast, I have been pointed in the correct direction, but I'm still a little confused. While trying to switch to float16 instead of float32, I found out that thousands of new Dense layers were being created on every single fractal_net call. However, I was only able to see this through warnings that started showing up about float32 and float16 values being used at the same time. So I changed the code to initialize 20 Dense layers once and then reuse them every time fractal_net is called. This somewhat works: it is now less frequent to lose 2+ GB of RAM on the train_on_batch calls, but it still happens here and there.
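For anyone hitting the same thing, the fix described above amounts to moving layer construction out of the call path. In a subclassed Keras model, anything constructed inside call() is a brand-new layer object (with freshly initialized weights) on every invocation, while layers created once in __init__ are tracked and reused. A minimal sketch with illustrative sizes, not the actual fractal_net code:

```python
import tensorflow as tf

class LeakyModel(tf.keras.Model):
    def call(self, x):
        # BUG: a new Dense layer (and new weights) is created on every call,
        # so memory grows and the optimizer never updates the same weights twice
        return tf.keras.layers.Dense(32, activation="tanh")(x)

class FixedModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Built once here; tracked by the model and reused on every call
        self.dense = tf.keras.layers.Dense(32, activation="tanh")

    def call(self, x):
        return self.dense(x)
```

With FixedModel, repeated calls keep reusing the single tracked Dense layer, so both the memory growth and the "weights not updating" symptom disappear for that layer.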
So my next question is: is there any way to have the subclassed model report how many Dense layers currently exist and are taking up RAM? I'm going to try to re-create the float16 vs. float32 warnings because I forgot how they were triggered, but I'd prefer a more direct way to see the size of the model.
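A Keras subclassed model does track layers that are assigned to attributes, so they can be counted without relying on the dtype warnings. A sketch, assuming `model` is the subclassed fractal model:

```python
import tensorflow as tf

def report_dense(model):
    """Count tracked Dense layers and total trainable parameters."""
    dense = [l for l in model.layers if isinstance(l, tf.keras.layers.Dense)]
    n_params = sum(int(tf.size(w)) for w in model.trainable_weights)
    return len(dense), n_params

# model.summary() also prints a per-layer breakdown once the model is built.
```

One caveat: layers created ad hoc inside call() and never assigned to an attribute are not in model.layers, which is exactly why this kind of leak is invisible from the model object itself.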
I have checked the weights before and after train_on_batch is called, and the weights are not updating, as I feared.
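For that weight check, a small helper that compares two get_weights() snapshots makes the before/after comparison explicit. A pure-NumPy sketch; the commented usage around train_on_batch is hypothetical:

```python
import numpy as np

def weights_changed(before, after, atol=1e-8):
    """True if any weight array differs between two get_weights() snapshots."""
    return any(not np.allclose(b, a, atol=atol) for b, a in zip(before, after))

# Hypothetical usage:
# before = [w.copy() for w in model.get_weights()]
# model.train_on_batch(x_batch, y_batch)
# if not weights_changed(before, model.get_weights()):
#     print("weights did not update -- layers recreated in call() are a likely cause")
```

If layers are recreated on every call, the gradients are applied to throwaway weights, so the snapshots from get_weights() will compare equal.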