I want to implement logistic regression with dropout regularization, but so far the only working example I have is the following:

import torch
import torch.nn as nn


class logit(nn.Module):
    def __init__(self, input_dim=69, output_dim=1):
        super(logit, self).__init__()

        # Linear layers: (input_dim) -> (input_dim) -> (output_dim)
        self.fc1 = nn.Linear(input_dim, input_dim)
        self.fc2 = nn.Linear(input_dim, output_dim)

        self.dp = nn.Dropout(p=0.2)

    # Feed-forward function
    def forward(self, x):
        x = self.fc1(x)
        x = self.dp(x)
        x = torch.sigmoid(self.fc2(x))

        return x

The problem with putting dropout between the layers is that, in the end, I no longer have a logistic regression (correct me if I'm wrong).

What I would like to do is apply dropout at the input level.


1 Answer

Actually, you still have a logistic regression with the dropout as it is.

The dropout between fc1 and fc2 will drop some (with p=0.2) of the input_dim features produced by fc1, forcing fc2 to be robust to their absence. This doesn't change the logit at the output of your model: since there is no nonlinearity between fc1 and fc2, their composition is still a single affine map, so the model remains a logistic regression (just with composed weights). Moreover, remember that dropout is (usually) disabled at test time.
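
For instance, here is a minimal sketch (assuming the logit class from the question and a dummy batch x) showing that, once dropout is disabled in eval mode, the two stacked linear layers collapse into one affine map, i.e. the model is still an ordinary logistic regression:

    import torch

    # assumes the `logit` class from the question is already defined
    model = logit()
    model.eval()                      # disables dropout

    x = torch.randn(8, 69)            # dummy batch of 8 samples

    # compose fc1 and fc2 into a single affine map
    W_eff = model.fc2.weight @ model.fc1.weight                   # shape (1, 69)
    b_eff = model.fc2.weight @ model.fc1.bias + model.fc2.bias    # shape (1,)

    print(torch.allclose(model(x), torch.sigmoid(x @ W_eff.T + b_eff), atol=1e-6))  # True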

Note that you could also apply dropout at the input level:

    def forward(self, x):
        x = self.dp(x)                      # dropout applied to the raw input features
        x = self.fc1(x)
        x = self.dp(x)                      # dropout between fc1 and fc2, as before
        x = torch.sigmoid(self.fc2(x))

        return x

In this case, fc1 would have to be robust to the absence of some of the input features.
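
As a quick sanity check (a hypothetical usage sketch, assuming the logit class now uses the forward above), you can verify that the input dropout is only active in training mode:

    import torch

    # assumes `logit` uses the forward with input-level dropout shown above
    model = logit()
    x = torch.randn(4, 69)

    model.train()                                  # dropout active: ~20% of features zeroed per pass
    print(torch.allclose(model(x), model(x)))      # almost certainly False: different random masks

    model.eval()                                   # dropout disabled: deterministic output
    print(torch.allclose(model(x), model(x)))      # True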

  • Great response! But won't `nn.Linear()` make a full set of connections with the next layer? Let me rephrase: will I be able to write the equation of my model as 1/(1 + exp(-(beta*x))), where my betas are the elements of the last tensor from `model.parameters()`? – Marco Repetto Sep 15 '21 at 12:25
  • @MarcoRepetto yes, you will. What happens is that at training time some of the xs will be zeroed out by the dropout (the equivalent of dropping a neuron from the previous layer), but all the connections still exist because dropout is stochastic rather than a fixed pruning, and all of those weights are still needed at test time (even if you were to apply dropout at test time, which is unusual). – Berriel Sep 15 '21 at 12:32