I am currently training a neural network on simulated data. The baseline architecture of the model is as follows:
- Input layer (5 features)
- Dense hidden layer (64 units)
- Dense hidden layer (32 units)
- Dense output layer (2 units)
To improve the model's performance, I implemented dropout for the Dense layers and tried different dropout rates. In my case, lower dropout rates (close to 10%) worked best.
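For reference, the baseline model looks roughly like this (a sketch: the ReLU activations and the exact placement of the Dropout layers are assumptions on my part, only the layer sizes and the ~10% rate come from the setup above):

```python
import tensorflow as tf

# Baseline: fixed dropout rate supplied as a hyperparameter (0.1 worked best for me).
inputs = tf.keras.Input(shape=(5,))                      # 5 input features
x = tf.keras.layers.Dense(64, activation="relu")(inputs)  # first hidden layer
x = tf.keras.layers.Dropout(0.1)(x)
x = tf.keras.layers.Dense(32, activation="relu")(x)       # second hidden layer
x = tf.keras.layers.Dropout(0.1)(x)
outputs = tf.keras.layers.Dense(2)(x)                     # 2 output units
model = tf.keras.Model(inputs, outputs)
```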
Now, I want to implement a Bayesian NN that learns the dropout rate itself, i.e. Variational Dropout. However, edward2's implementation of the DenseVariationalDropout layer does not model the posterior as a Bernoulli distribution, as in the original dropout method, but (as far as I understand) as a Gaussian distribution N(mu, sd).
I already trained the BNN using the DenseVariationalDropout layer for the hidden layers (layers 2 and 3 above).
However, when I print the learned weights/parameters of, e.g., the first DenseVariationalDropout layer, the learned stddev values are negative.
My question now is: how do I interpret the learned stddev parameters of the layer, and is there a way to reconstruct the dropout rate from them?
In the end, I want to compare the learned dropout rate to my previous experiments with the default Dropout layers, where I set the dropout rate as a hyperparameter.
Below is the weights/parameter output of the second layer (the first hidden layer), showing only the results for the first input node; the other values look similar:
<tf.Variable 'dense_variational_dropout/kernel/mean:0' shape=(5, 64) dtype=float32, numpy=
array([[ 0.706813 , -0.70008826, 0.6587434 , 0.6841269 , 0.7477593 ,
-0.67054933, 0.6094059 , -0.66187423, -0.72358364, -0.71848536,
-0.72730917, -0.7543011 , -0.70441514, -0.6896457 , 0.6773513 ,
-0.7390508 , 0.71809 , 0.70859045, -0.7311058 , -0.6940158 ,
-0.6691812 , 0.7317551 , -0.71782017, -0.7040768 , 0.72268885,
0.686578 , -0.71496195, -0.74330693, 0.68203586, -0.71308035,
-0.7066761 , -0.7129957 , -0.7120187 , -0.7271574 , -0.6683003 ,
-0.69519144, 0.72729176, -0.7753829 , 0.727453 , -0.7636749 ,
-0.63404703, -0.69619894, 0.69890517, 0.69997096, 0.7178513 ,
-0.7171472 , 0.7051604 , 0.72245914, -0.75696194, 0.70270175,
-0.66893655, -0.7003819 , 0.7011036 , -0.7276705 , 0.7002035 ,
-0.7110728 , -0.7156996 , -0.7777113 , -0.7551749 , -0.7739159 ,
0.68988633, -0.6978364 , 0.6694619 , -0.6941327 ], ...
<tf.Variable 'dense_variational_dropout/kernel/stddev:0' shape=(5, 64) dtype=float32, numpy=
array([[-4.3544664, -4.369482 , -4.4166455, -4.3739653, -4.290909 ,
-4.410798 , -4.4918427, -4.408001 , -4.332697 , -4.329703 ,
-4.3302474, -4.2944455, -4.3497696, -4.376223 , -4.3921137,
-4.310383 , -4.3301587, -4.3463225, -4.3161445, -4.371665 ,
-4.3999367, -4.3134384, -4.355694 , -4.3571577, -4.3263173,
-4.374832 , -4.331994 , -4.308331 , -4.3812027, -4.3389053,
-4.3488293, -4.3432593, -4.335625 , -4.3250175, -4.410793 ,
-4.3587666, -4.3182616, -4.269835 , -4.3176365, -4.272719 ,
-4.4506063, -4.3802133, -4.3598847, -4.35998 , -4.33531 ,
-4.330978 , -4.3620105, -4.336859 , -4.2905126, -4.3617153,
-4.403421 , -4.350309 , -4.3518815, -4.3138742, -4.364904 ,
-4.344313 , -4.329627 , -4.2587113, -4.30761 , -4.2633414,
-4.369821 , -4.3550806, -4.4168577, -4.3864827], ...
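For completeness, here is how I would currently try to recover a dropout rate from the printed values. This rests on two assumptions I have not confirmed in the edward2 docs: that the stored stddev is an unconstrained parameter squashed through a softplus (which would explain the negative numbers), and that the Gaussian-dropout correspondence alpha = p/(1-p) from the variational dropout literature applies:

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

# First entries of the printed kernel/mean and kernel/stddev variables above.
mean_raw = 0.706813
stddev_raw = -4.3544664  # negative: presumably the *unconstrained* parameter

sigma = softplus(stddev_raw)      # actual (positive) posterior stddev, if the assumption holds
alpha = (sigma / mean_raw) ** 2   # Gaussian-dropout noise variance: alpha = sigma^2 / mu^2
p = alpha / (1.0 + alpha)         # implied Bernoulli dropout rate, from alpha = p / (1 - p)
print(sigma, alpha, p)
```

With these numbers sigma comes out small but positive, and the implied rate p is far below the 10% that worked best with plain Dropout, which is part of why I am unsure my interpretation is right.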