
Q1, I am trying to implement an autoencoder, and I have data like this:

  1. 800 300 1 100000 -0.1
  2. 789 400 1.6 100500 -0.4
  3. 804 360 1.2 100420 -0.2

  4. ....

How am I supposed to normalize this data so that it can be used for training?

Q2, since I don't know how to do the normalization, I skipped it and just fed the raw data to the autoencoder for training, but the gradients became NaN after several iterations. Here is the code.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
from torch.autograd import Variable

BATCH_SIZE = 1
BETA = 3
INPUT = 89
HIDDEN = 64
EPOCHS = 1
LR = 0.01
RHO = 0.1

# Loader and SparseAutoEncoder are defined elsewhere in my project
raw_data = Loader('test.csv')
print(np.shape(raw_data))
raw_data = torch.Tensor(raw_data)

# the autoencoder reconstructs its own input, so data and target are the same tensor
train_dataset = Data.TensorDataset(data_tensor=raw_data, target_tensor=raw_data)
train_loader = Data.DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)

model = SparseAutoEncoder(INPUT, HIDDEN)
optimizer = optim.Adam(model.parameters(), lr=LR)
loss_func = nn.MSELoss()

for epoch in range(EPOCHS):
    for b_index, (x, _) in enumerate(train_loader):
        x = x.view(-1, INPUT)
        x = Variable(x)  # pre-0.4 PyTorch style

        encoded, decoded = model(x)
        loss = loss_func(decoded, x)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print("Epoch: [%3d], Loss: %.4f" % (epoch + 1, loss.data))

raw_data has the shape (2700, 89): each row has 89 dimensions, with very different scales of values (as mentioned in Q1).

1 Answer


Get the mean and standard deviation of your data on each dimension (and keep these values), then apply this scaling to your data.

When you have new data, reuse these values to scale the new data as well.
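For example, here is a minimal sketch of this per-feature standardization (z-scoring), assuming raw_data is the (2700, 89) NumPy array from the question before it is converted to a tensor, and new_data is a hypothetical array of later samples:

import numpy as np

# compute per-feature statistics on the training data and keep them
mean = raw_data.mean(axis=0)   # shape (89,)
std = raw_data.std(axis=0)     # shape (89,)
std[std == 0] = 1.0            # avoid division by zero for constant columns

# scale the training data; feed this to the autoencoder instead of raw_data
train_scaled = (raw_data - mean) / std

# later, reuse the *training* mean/std on any new data
new_scaled = (new_data - mean) / std

scikit-learn's StandardScaler does the same thing with a fit/transform split, if you prefer not to keep track of the statistics yourself.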

With such variation in scales in your data, you will get a very bad fit (basically, the bigger the scale, the better the fit; the smaller the scale, the worse).
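As a toy illustration of why the large-scale column dominates the MSE, compare per-feature squared errors on a row like the ones in the question (the reconstruction values here are made up):

import numpy as np

target = np.array([800.0, 300.0, 1.0, 100000.0, -0.1])
recon  = np.array([808.0, 303.0, 1.1,  99000.0,  0.1])   # hypothetical reconstruction
sq_err = (recon - target) ** 2
print(sq_err)         # roughly [64, 9, 0.01, 1e6, 0.04]
print(sq_err.mean())  # ~2e5, almost entirely from the 100000-scale feature

A 1% error on the 100000-scale feature contributes about 1e6 to the squared error, while a 200% error on the -0.1-scale feature contributes almost nothing, so the optimizer effectively ignores the small-scale features.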

Matthieu Brucher