
I am currently training a neural network on simulated data. The baseline architecture of the model is as follows:

  1. Input layer (5 features)
  2. Dense hidden layer (64 units)
  3. Dense hidden layer (32 units)
  4. Dense output layer (2 units)

To improve the model's performance, I added dropout after the Dense layers and tried different dropout rates. In my case, lower dropout rates (around 10%) worked best.
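
For reference, the baseline setup looks roughly like this (a sketch; the activations, optimizer, and loss are stand-ins for my actual configuration):

    import tensorflow as tf

    # Baseline: 5 input features -> 64 -> 32 -> 2 outputs, with dropout
    # (rate 0.1 worked best for me) after each hidden layer.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(5,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(2),
    ])
    model.compile(optimizer="adam", loss="mse")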

Now I want to implement a Bayesian NN that learns the dropout rate itself, i.e. Variational Dropout. However, edward2's DenseVariationalDropout layer does not implement the approximate posterior as a Bernoulli distribution, as in the original dropout method, but (as far as I understand) as a Gaussian distribution N(mu, sigma).

I already trained the BNN, using DenseVariationalDropout for the two hidden layers (2 and 3) listed above. However, when I print the learned weights/parameters of, e.g., the first DenseVariationalDropout layer, the learned stddev values are negative.
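
Concretely, the BNN looks roughly like this (again a sketch; I rely on edward2's default initializers/regularizers, and the activations are stand-ins):

    import tensorflow as tf
    import edward2 as ed

    # Same architecture, but the two hidden layers are replaced by
    # DenseVariationalDropout, which learns a Gaussian posterior N(mu, sigma)
    # over the kernel instead of using a fixed dropout rate.
    bnn = tf.keras.Sequential([
        tf.keras.Input(shape=(5,)),
        ed.layers.DenseVariationalDropout(64, activation="relu"),
        ed.layers.DenseVariationalDropout(32, activation="relu"),
        tf.keras.layers.Dense(2),
    ])
    # The layers register their KL terms as regularization losses, which
    # Keras adds to the compiled loss during training.
    bnn.compile(optimizer="adam", loss="mse")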

My question is: how do I interpret the learned stddev parameters of the layer, and is there a way to reconstruct the dropout rate from them?

In the end, I want to compare the learned dropout rate to my earlier experiments with the default Dropout layer, where the dropout rate is a hyperparameter.

Below are the weights/parameters of the second layer (the first hidden layer), showing only the row for the first input node; the other rows look very similar:

<tf.Variable 'dense_variational_dropout/kernel/mean:0' shape=(5, 64) dtype=float32, numpy=
array([[ 0.706813  , -0.70008826,  0.6587434 ,  0.6841269 ,  0.7477593 ,
        -0.67054933,  0.6094059 , -0.66187423, -0.72358364, -0.71848536,
        -0.72730917, -0.7543011 , -0.70441514, -0.6896457 ,  0.6773513 ,
        -0.7390508 ,  0.71809   ,  0.70859045, -0.7311058 , -0.6940158 ,
        -0.6691812 ,  0.7317551 , -0.71782017, -0.7040768 ,  0.72268885,
         0.686578  , -0.71496195, -0.74330693,  0.68203586, -0.71308035,
        -0.7066761 , -0.7129957 , -0.7120187 , -0.7271574 , -0.6683003 ,
        -0.69519144,  0.72729176, -0.7753829 ,  0.727453  , -0.7636749 ,
        -0.63404703, -0.69619894,  0.69890517,  0.69997096,  0.7178513 ,
        -0.7171472 ,  0.7051604 ,  0.72245914, -0.75696194,  0.70270175,
        -0.66893655, -0.7003819 ,  0.7011036 , -0.7276705 ,  0.7002035 ,
        -0.7110728 , -0.7156996 , -0.7777113 , -0.7551749 , -0.7739159 ,
         0.68988633, -0.6978364 ,  0.6694619 , -0.6941327 ], ...


<tf.Variable 'dense_variational_dropout/kernel/stddev:0' shape=(5, 64) dtype=float32, numpy=
array([[-4.3544664, -4.369482 , -4.4166455, -4.3739653, -4.290909 ,
        -4.410798 , -4.4918427, -4.408001 , -4.332697 , -4.329703 ,
        -4.3302474, -4.2944455, -4.3497696, -4.376223 , -4.3921137,
        -4.310383 , -4.3301587, -4.3463225, -4.3161445, -4.371665 ,
        -4.3999367, -4.3134384, -4.355694 , -4.3571577, -4.3263173,
        -4.374832 , -4.331994 , -4.308331 , -4.3812027, -4.3389053,
        -4.3488293, -4.3432593, -4.335625 , -4.3250175, -4.410793 ,
        -4.3587666, -4.3182616, -4.269835 , -4.3176365, -4.272719 ,
        -4.4506063, -4.3802133, -4.3598847, -4.35998  , -4.33531  ,
        -4.330978 , -4.3620105, -4.336859 , -4.2905126, -4.3617153,
        -4.403421 , -4.350309 , -4.3518815, -4.3138742, -4.364904 ,
        -4.344313 , -4.329627 , -4.2587113, -4.30761  , -4.2633414,
        -4.369821 , -4.3550806, -4.4168577, -4.3864827], ...
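
For what it's worth, my current attempt at reconstructing a dropout rate from these numbers is below. In Gaussian dropout, multiplicative noise N(1, alpha) with alpha = p/(1-p) mimics Bernoulli dropout with rate p, so I would expect to recover a per-weight rate via alpha = (sigma/mu)^2 and p = alpha/(1+alpha). I am also guessing that the printed stddev is an unconstrained parameter that still has to be passed through a softplus (which would explain the negative values), but I am not sure either assumption is correct:

    import numpy as np
    import edward2 as ed

    # First DenseVariationalDropout layer of the bnn model above.
    vd_layer = next(l for l in bnn.layers
                    if isinstance(l, ed.layers.DenseVariationalDropout))

    # Assuming the first two weights are kernel/mean and kernel/stddev,
    # matching the variable names printed above.
    mean = vd_layer.weights[0].numpy()
    raw_stddev = vd_layer.weights[1].numpy()

    # Guess: the stored stddev is unconstrained and the effective scale is
    # softplus(raw_stddev), so raw values around -4.35 would correspond to
    # sigma around 0.013.
    sigma = np.log1p(np.exp(raw_stddev))

    alpha = (sigma / mean) ** 2   # per-weight noise-to-signal ratio
    p = alpha / (1.0 + alpha)     # implied per-weight dropout rate
    print(p.mean(), p.min(), p.max())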
