Machine Learning Time Series Estimation - Mysterious Phenomenon

Question

I tried fitting an MLP to my time series, which contains lots of missing values. The idea was to interpolate the missing values with the trained model.

This approach works remarkably well, as you can see in the result image.

However, it only works well until around t = 9. From that point on, the model suddenly predicts everything as a straight line.
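
One way to pin down where the straight line begins, rather than reading it off the plot, is to look for the last point at which the slope of the predicted curve changes. The following is only a minimal sketch: `last_bend` is a hypothetical helper, the tolerance is an arbitrary choice, and the demo curve is synthetic (a sine that continues as its tangent line from t = 9), not the data from this question.

```python
import numpy as np

def last_bend(t, y, tol=1e-6):
    """Return the t value of the last slope change in the curve y(t).

    Past this point the curve is a straight line (within tol).
    If the curve never bends, the first t value is returned.
    Assumes t is one-dimensional and strictly increasing.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.diff(y) / np.diff(t)        # slope of each segment
    bends = np.abs(np.diff(slopes)) > tol   # bends[k]: slope changes at t[k + 1]
    if not bends.any():
        return t[0]
    return t[np.nonzero(bends)[0].max() + 1]

# Synthetic demo: a sine wave that continues as its tangent line from t = 9.
t_demo = np.linspace(0.0, 20.0, 201)
y_demo = np.where(t_demo < 9.0, np.sin(t_demo),
                  np.sin(9.0) + np.cos(9.0) * (t_demo - 9.0))
print(last_bend(t_demo, y_demo))
```

Applied to the model's predictions (flattened to a one-dimensional array and evaluated on sorted t values), this would give a single number that can be compared across learning rates and optimizers.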

I am sleepless because I cannot explain this phenomenon!

I tried different learning rates and optimizers. The problem stays the same, although the time from which the model predicts a straight line varies. I also tried changing my model a little, with the same effect.

Can someone explain what is happening here and why? This is not the behaviour I expected. As a next step, any ideas on how I can fix this problem without changing the main approach of fitting an MLP with PyTorch?
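
A possible starting point for an explanation: an MLP with ReLU activations is a piecewise linear function of its input, so it can only bend at a finite number of points and is exactly linear beyond the last one. For the first hidden layer, a unit computing relu(w*t + b) switches at t = -b/w. The sketch below lists those switch points. `first_layer_kinks` is a hypothetical helper and the weights in the demo are made-up numbers, not values from the trained model. Deeper layers can add further bends, so this shows only part of the picture.

```python
import numpy as np

def first_layer_kinks(weight, bias):
    """Return the sorted input values t at which first-layer ReLU units switch.

    weight: array-like of shape (hidden, 1), one weight per hidden unit
    bias:   array-like of shape (hidden,)
    Units with zero weight never switch and are skipped.
    """
    w = np.asarray(weight, dtype=float).reshape(-1)
    b = np.asarray(bias, dtype=float).reshape(-1)
    nonzero = w != 0
    return np.sort(-b[nonzero] / w[nonzero])

# Made-up example values, for illustration only.
demo_weight = [[1.0], [2.0], [0.0], [-0.5]]
demo_bias = [-3.0, 1.0, 0.7, 2.0]
print(first_layer_kinks(demo_weight, demo_bias))
```

For the network in this question, the arguments would presumably be `model.linear1.weight.detach().numpy()` and `model.linear1.bias.detach().numpy()` after training. If the largest value sits near the point where the prediction goes straight, that would be consistent with the network simply having no bends left beyond it.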

Feel free to experiment with my Jupyter notebook at https://github.com/dontknowDS/Time-Series-Estimation

I will also post my code below:

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pandas as pd
import numpy as np


learning_rate = 0.01
epochs = 31
Batch_Size = 1
input_size = 1    # one input feature: the time t (independent of the batch size)
hidden_size = 32
output_size = 1   # one target: the value x

class model(nn.Module):
    
    def __init__(self):
        super(model, self).__init__()
        self.linear1 = nn.Linear(1, 32)
        self.relu = nn.ReLU()  # stateless, so one instance serves both hidden layers
        self.linear2 = nn.Linear(32, 32)
        self.linear3 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        x = self.relu(x)
        x = self.linear3(x)
        return x


class TimeSeriesDataset(torch.utils.data.Dataset):

    def __init__(self, csv_file):
        df = pd.read_csv(csv_file)
        df = df[['t', 'x']]
        # missing values are marked with the string 'x'; keep only the rows that are filled
        df = df.drop(index=df[df.eq('x').any(axis=1)].index)
        df = df.astype('float32')
        self.data = torch.tensor(df.values, dtype=torch.float32)

    def __len__(self):
        return len(self.data)
        

    def __getitem__(self, idx):
        return self.data[idx]


# read data
url = 'https://raw.githubusercontent.com/dontknowDS/Time-Series-Estimation/main/time_series_daten.csv'
dataset = TimeSeriesDataset(url)
trainloader = torch.utils.data.DataLoader(
    dataset, batch_size=Batch_Size, shuffle=True, num_workers=2)


model = model()

optimizer = torch.optim.Adam(model.parameters(), learning_rate)

# Training
for epoch in range(epochs):
    for i, d in enumerate(trainloader, 0):
        # slice (rather than index) column 0 so inputs keep shape (batch, 1), matching labels
        inputs, labels = d[:, :1], d[:, 1:]
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = F.mse_loss(outputs, labels)
        loss.backward()
        optimizer.step()
    print('epoch: ', epoch)
print('done')


# Predict over every time step, including those with missing values
df = pd.read_csv(url)
predictions = []
with torch.no_grad():  # no gradients needed for inference
    for t in df['t']:
        predictions.append(model(torch.tensor([t], dtype=torch.float32)).item())
df = df.replace('x', np.nan)
plt.plot(df['t'], predictions, c='red')
plt.scatter(df['t'], pd.to_numeric(df['x']), s=0.5, c='green')
plt.title('result with learning rate ' + str(learning_rate))
plt.show()

[Image: Result]

