Normalization of data in continuous neural network training in R

Question

I would like to train my neural network continuously as new input keeps coming in. However, as I get new data, the normalized values will change over time. Let's say that at time one I get:

df <- "Factor1 Factor2 Factor3 Response
        10      10000   0.4     99
        15      10200   0       88
        11      9200    1       99
        13      10300   0.3     120"
df <- read.table(text=df, header=TRUE)

normalize <- function(x) {
    return ((x - min(x)) / (max(x) - min(x)))
}

dfNorm <- as.data.frame(lapply(df, normalize))

### Keep old normalized values
dfNormOld <- dfNorm 

library(neuralnet)
nn <- neuralnet(Response~Factor1+Factor2+Factor3, data=dfNorm, hidden=c(3,4), 
    linear.output=FALSE, threshold=0.10,  lifesign="full", stepmax=20000)
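Because Response is normalized as well, and `linear.output=FALSE` applies the activation function (logistic by default) to the output neuron, the network's predictions lie in (0, 1) rather than in the original units. Turning them back into responses needs the min and max that were used for scaling. A minimal sketch of that inverse; `denormalize()` is a hypothetical helper, not part of neuralnet:

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical helper: inverse of the min-max scaling above
denormalize <- function(z, lo, hi) z * (hi - lo) + lo

response <- c(99, 88, 99, 120)   # Response at time one
lo <- min(response)              # 88
hi <- max(response)              # 120

z <- normalize(response)         # the scale the network is trained on
denormalize(z, lo, hi)           # back to 99 88 99 120
```

With a fitted network, `z` would instead be the `net.result` element returned by `neuralnet::compute()`. Note that `lo` and `hi` have to be stored at training time: once the data are renormalized, the old min and max are no longer recoverable from the normalized values alone.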

Then, at time two:

df2 <- "Factor1 Factor2 Factor3 Response
        12      10100   0.2     101
        14      10900   -0.7    108
        11      9800    0.8     120
        11      10300   0.3     113"

df2 <- read.table(text=df2, header=TRUE)

### Bind all-time data
df <- rbind(df2, df)

### Normalize all-time data in one shot
dfNorm <- as.data.frame(lapply(df, normalize))

### Continue training the network with most recent data
library(neuralnet)
Wei <- unlist(nn$weights)  # startweights expects a numeric vector, not a list
nn <- neuralnet(Response~Factor1+Factor2+Factor3, data=dfNorm[1:nrow(df2),], hidden=c(3,4), 
    linear.output=FALSE, threshold=0.10,  lifesign="full", stepmax=20000, startweights = Wei)
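The shift is easy to see in this example. Combining the two tables leaves the ranges of Factor1 and Response unchanged, but time two adds 10900 to Factor2 (old range 9200 to 10300) and -0.7 to Factor3 (old range 0 to 1). So the same raw Factor2 reading of 10000 is fed to the network as a different number before and after renormalization. A short sketch on the Factor2 columns alone:

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

f2_time1 <- c(10000, 10200, 9200, 10300)   # Factor2 at time one
f2_time2 <- c(10100, 10900, 9800, 10300)   # Factor2 at time two
f2_all   <- c(f2_time2, f2_time1)          # same order as rbind(df2, df)

# Normalized value of the raw reading 10000 (first element of f2_time1)
before <- normalize(f2_time1)[1]                    # (10000 - 9200) / 1100
after  <- normalize(f2_all)[length(f2_time2) + 1]   # (10000 - 9200) / 1700
round(c(before = before, after = after), 3)         # before 0.727, after 0.471
```

In other words, the time-one weights were fitted to Factor2 inputs scaled by 1/1100, while after renormalization Factor2 is scaled by 1/1700, so warm-starting with `startweights` hands the old weights inputs on a different scale.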

This is how I would train it over time. However, I was wondering whether there is an elegant way to reduce the bias introduced by this continuous training, since the normalized values will inevitably change over time. Here I am assuming that the non-normalized values may be biased.

If the non-normalized values are biased, the normalized values will be biased as well. You're not going to remove bias by changing the scale of the values. — De Novo, Mar 07 '18 at 08:30
One solution could be to use generic min and max values for each variable and always normalize with those. They could be values close to what you would expect the maximum and minimum measurements to be (?). Of course, that would depend on the nature of your variables. — Ricardo Fernandes Campos, Mar 08 '18 at 00:16
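A minimal sketch of that suggestion: fix the bounds once per variable and reuse them for every batch, so a given raw value always maps to the same normalized value no matter what else has arrived. The bounds below are hypothetical placeholders, not derived from the data, and `normalize_fixed()` is an illustrative name.

```r
# Hypothetical fixed bounds for each variable (placeholders, not taken
# from the data); in practice they would come from domain knowledge
bounds <- list(
    Factor1  = c(0, 20),
    Factor2  = c(8000, 12000),
    Factor3  = c(-1, 1),
    Response = c(50, 150)
)

# Min-max scale every column of a batch with the same fixed bounds
normalize_fixed <- function(batch, bounds) {
    for (v in names(bounds)) {
        lo <- bounds[[v]][1]
        hi <- bounds[[v]][2]
        batch[[v]] <- (batch[[v]] - lo) / (hi - lo)
    }
    batch
}

time1 <- read.table(header = TRUE, text = "Factor1 Factor2 Factor3 Response
10 10000  0.4  99
15 10200  0    88
11  9200  1    99
13 10300  0.3 120")

time2 <- read.table(header = TRUE, text = "Factor1 Factor2 Factor3 Response
12 10100  0.2 101
14 10900 -0.7 108
11  9800  0.8 120
11 10300  0.3 113")

n1    <- normalize_fixed(time1, bounds)
n_all <- normalize_fixed(rbind(time2, time1), bounds)

# The time-one rows keep exactly the same scaled values after time two arrives
old_rows <- (nrow(time2) + 1):nrow(n_all)
all.equal(unname(as.matrix(n1)), unname(as.matrix(n_all[old_rows, ])))  # TRUE
```

Values outside the chosen bounds are mapped outside [0, 1], so the bounds should be wide enough to cover anything the variables can plausibly take.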

score 1 · Answer 1 · answered Mar 13 '18 at 09:55

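You can use this code. It maps the previously normalized rows back to their original units with the stored min and max, then renormalizes all rows together: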

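library(neuralnet)

# Map the old rows of one column back to raw units using the min and max
# recorded when they were normalized. rbind(df2, df) puts the new rows on
# top, so the nOld old rows are the last ones.
denormalize <- function(x, min1, max1, nOld) {
    if (nOld > 0) {
        old <- (length(x) - nOld + 1):length(x)
        x[old] <- x[old] * (max1 - min1) + min1
    }
    x
}

normalize <- function(x) {
    return ((x - min(x)) / (max(x) - min(x)))
}

past_min <- rep(0, ncol(df))
past_max <- rep(0, ncol(df))
rowCount <- 0

while (TRUE) {
    # Restore all rows to raw units, record the raw min/max, then renormalize
    dfRaw <- as.data.frame(mapply(denormalize, df, past_min, past_max, rowCount))
    past_min <- sapply(dfRaw, min)
    past_max <- sapply(dfRaw, max)
    df <- as.data.frame(lapply(dfRaw, normalize))
    rowCount <- nrow(df)

    nn <- neuralnet(Response~Factor1+Factor2+Factor3, data=df, hidden=c(3,4),
                    linear.output=FALSE, threshold=0.10, lifesign="full", stepmax=20000)

    # df2 must hold the next batch as text (same format as above) on each pass
    df2 <- read.table(text=df2, header=TRUE)
    df <- rbind(df2, df)
}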
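The idea behind this answer can be checked without training anything: if the old rows are mapped back to raw units with the min and max recorded before they were normalized, renormalizing the combined data gives exactly what normalizing all raw data from scratch would. A self-contained sketch on the Factor2 column; the helper names are illustrative:

```r
normalize   <- function(x) (x - min(x)) / (max(x) - min(x))
denormalize <- function(z, lo, hi) z * (hi - lo) + lo

old_raw <- c(10000, 10200, 9200, 10300)   # Factor2, time one
new_raw <- c(10100, 10900, 9800, 10300)   # Factor2, time two

# Time one: keep the normalized values plus the raw min/max
old_norm <- normalize(old_raw)
past_min <- min(old_raw)
past_max <- max(old_raw)

# Time two: new rows on top (as in rbind(df2, df)), old rows restored to raw units
combined    <- c(new_raw, denormalize(old_norm, past_min, past_max))
incremental <- normalize(combined)

# Same result as normalizing all raw data in one shot
from_scratch <- normalize(c(new_raw, old_raw))
all.equal(incremental, from_scratch)   # TRUE
```

This also shows why the min and max have to be taken from the raw values: recording them after normalization would always give 0 and 1, and the restore step would leave the old rows unchanged.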

