Exploding gradient problem with an LSTM model built with LSTMCell (PyTorch implementation)

Question

I'm trying to solve a time series prediction problem with multivariate input. The input has 4 variables, and the target is a separate variable.

I've processed the data as follows. Each input sequence covers 4 variables over 60 timesteps, flattened so that each input has shape (1, 240). I want to predict the next n steps of the target. During training n is 60, so the target shape is (1, 60).
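To make the shapes concrete, here is a minimal sketch of one training sample. The data is random placeholder data, and the flattening order (timestep-major) is only an assumption for illustration.

```python
import torch

n_vars, n_steps = 4, 60  # values from the description above

# Placeholder data: one window of 60 timesteps with 4 variables each.
window = torch.randn(n_steps, n_vars)

# Flatten the window into a single row of 240 values.
# Timestep-major order is an assumption made for this sketch.
sample = window.reshape(1, n_steps * n_vars)

# Placeholder target: the next 60 values of the target variable.
target = torch.randn(1, n_steps)

print(sample.shape)  # torch.Size([1, 240])
print(target.shape)  # torch.Size([1, 60])
```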

Here is my LSTMPredictor class.

import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm

class LSTMPredictor(nn.Module):
    def __init__(self,n_feature, n_hidden=51):
        super(LSTMPredictor, self).__init__()
        self.n_hidden = n_hidden
        # lstm1, lstm2, linear
        self.lstm1 = nn.LSTMCell(n_feature, self.n_hidden)
        self.lstm2 = nn.LSTMCell(self.n_hidden, self.n_hidden)
        self.lstm3 = nn.LSTMCell(self.n_hidden, self.n_hidden)
        self.linear = nn.Linear(self.n_hidden, 1)
    
    def forward(self, x, future=0):
        outputs = []
        # lstm1
        h_t = torch.zeros(1, self.n_hidden, dtype=torch.float32).cuda()
        c_t = torch.zeros(1, self.n_hidden, dtype=torch.float32).cuda()
        # lstm2
        h_t2 = torch.zeros(1, self.n_hidden, dtype=torch.float32).cuda()
        c_t2 = torch.zeros(1, self.n_hidden, dtype=torch.float32).cuda()
        # lstm3
        h_t3 = torch.zeros(1, self.n_hidden, dtype=torch.float32).cuda()
        c_t3 = torch.zeros(1, self.n_hidden, dtype=torch.float32).cuda()


        h_t, c_t = self.lstm1(x, (h_t, c_t))
        h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))

        output = None
        for i in range(future):
            if i == 0:
                # first prediction
                output = self.linear(h_t3) # h_t3?
                outputs.append(output)
                continue
            
            h_t3, c_t3 = self.lstm3(h_t3, (h_t3, c_t3))
            output = self.linear(h_t3)
            outputs.append(output)
        
        output = torch.cat(outputs, dim=1)
        return output

Here, lstm1 and lstm2 process the input of shape (1, 240), and lstm3 is then used to generate the predictions for the next n steps one after another. During training, n is 60.
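For reference, this is a small sketch of the shapes involved in a single nn.LSTMCell step with a batch of 1. It uses the sizes from my setup (240 input features, 512 hidden units) with random placeholder input, and it runs on the CPU.

```python
import torch
import torch.nn as nn

n_feature, n_hidden = 240, 512  # sizes used in this question

cell = nn.LSTMCell(n_feature, n_hidden)

x = torch.randn(1, n_feature)   # placeholder input, batch of 1
h0 = torch.zeros(1, n_hidden)   # initial hidden state
c0 = torch.zeros(1, n_hidden)   # initial cell state

# One step: returns the new hidden state and the new cell state.
h1, c1 = cell(x, (h0, c0))

print(h1.shape, c1.shape)  # both torch.Size([1, 512])
```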

However, my model runs into exploding gradients in the very first step.

Model initialization is shown below:
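One way to check that it is really the gradients that blow up is to log the total gradient norm right after loss.backward(). The sketch below is only an illustration: total_grad_norm is a hypothetical helper, and the model is a tiny stand-in, not the LSTMPredictor above.

```python
import torch
import torch.nn as nn

def total_grad_norm(model: nn.Module) -> float:
    """L2 norm over all parameter gradients (hypothetical helper)."""
    norms = [p.grad.detach().norm(2)
             for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.stack(norms).norm(2).item()

# Tiny stand-in model used only to demonstrate the helper.
toy = nn.Linear(3, 1)
loss = toy(torch.ones(1, 3)).pow(2).mean()
loss.backward()

grad_norm = total_grad_norm(toy)
print("gradient norm:", grad_norm)
```

In the training loop, the same call could go inside closure() after loss.backward(). A value that grows by orders of magnitude, or becomes inf or nan, would point to exploding gradients.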


n_hidden = 512
n_feature = 240

model = LSTMPredictor(n_feature, n_hidden).to(device)
criterion = nn.MSELoss().to(device)
optimizer = optim.LBFGS(model.parameters(), lr=0.8)

Training Loop:

n_steps = 1
losses = []
print("--- Training Start ---")
for i in tqdm(range(n_steps)):
    print("Step", i)
    for i, sample_i in enumerate(train_input):
        def closure():
            optimizer.zero_grad()
            
            out = model(sample_i.cuda(),future=60)
       
            loss = criterion(out[0], train_target[i].cuda())
            
            losses.append(loss.item())
            loss.backward()

            return loss

        optimizer.step(closure)

        print("loss", losses[-1])

Is there anything wrong with my implementation?
