
I'm working on a neural network that approximates a function f(X) = y, where X is a vector [x0, ..., xn] and y is a real scalar in (-inf, +inf). The approximation needs to reach an accuracy (sum of errors) around 1e-8; in effect, I need my neural network to overfit.

X is composed of random points in the interval [-500, 500]. Before feeding these points into the input layer, I normalize them to [0, 1].
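
For context, the training data is generated roughly like this (the sizes are just examples; f stands for the target function described above):

import numpy as np

n_samples, dimension = 1000, 10  # example sizes
train_dataset = np.random.uniform(-500, 500, size=(n_samples, dimension))
train_labels = np.array([f(x) for x in train_dataset]).reshape(-1, 1)  # f = target function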

I use Keras as follows:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from sklearn import preprocessing

dimension = 10  # example

self.model = Sequential()
self.model.add(Dense(128, input_shape=(dimension,), init='uniform', activation='relu'))
self.model.add(Dropout(.2))
self.model.add(Activation("linear"))
self.model.add(Dense(64, init='uniform', activation='relu'))
self.model.add(Activation("linear"))
self.model.add(Dense(64, init='uniform', activation='relu'))
self.model.add(Dense(1))

X_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
y_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))

X_scaled = (X_scaler.fit_transform(train_dataset))
y_scaled = (y_scaler.fit_transform(train_labels))

self.model.compile(loss='mse', optimizer='adam')
self.model.fit(X_scaled, y_scaled, epochs=10000, batch_size=10, verbose=1)

I tried different networks: first [n] -> [2] -> [1] with ReLU activations, then [n] -> [128] -> [64] -> [1]. I also tried the SGD optimizer, slowly increasing the learning rate from 1e-9 to 0.1, and I tried training without normalizing the data, but in that case the loss is very high.
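
For reference, the SGD experiment was compiled along these lines (the learning rate shown is just one point of the sweep):

from keras.optimizers import SGD

# one point of the learning-rate sweep between 1e-9 and 0.1
self.model.compile(loss='mse', optimizer=SGD(lr=1e-3))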

My best loss (MSE) is 0.037 with the current setup, but I'm still far from my goal (1e-8).

First, I would like to know if I did something wrong. Am I on the right track? If not, how can I reach my goal?

Thank you very much.


Try #2

I tried this new configuration:

model = Sequential()
model.add(Dense(128, input_shape=(10,), init='uniform', activation='relu'))
model.add(Dropout(.2))
model.add(Dense(64, init='uniform', activation='relu'))
model.add(Dense(64, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

On a sample of 50 elements, with batch_size 10 and 100,000 epochs, I get a loss around 1e-4.


Try #3

model = Sequential()
model.add(Dense(128, input_shape=(10,), activation='tanh'))
model.add(Dense(64, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))

batch_size=1000, epochs=100000
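
For completeness, the compile/fit call consistent with these settings (same mse loss and adam optimizer as before) would look roughly like:

model.compile(loss='mse', optimizer='adam')
model.fit(X_scaled, y_scaled, epochs=100000, batch_size=1000, verbose=1)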

Result: loss around 1e-7.

  • Irrelevant to your issue, but your `Activation("linear")` layers make no sense (and they are not actually used as you have placed them, anyway). What *is* relevant is that you probably want to add an `activation='sigmoid'` to your last layer. Omitting the explicit initialization (i.e. leaving the default setting) might also help. – desertnaut Feb 07 '18 at 13:38
  • Thanks @desertnaut, I removed the Activation('linear') layers and added activation='sigmoid'. I got a loss around 1e-4, which is much better. – PA Masse Feb 07 '18 at 13:50
  • Now try removing the initializers and increasing your batch size – desertnaut Feb 07 '18 at 13:54
  • "Relu" never gets negative results, and most important: it crops results. This may be a huge barrier. The `'tanh'` activations sound better for you (they go from -1 to +1) -- You should not increase the learning rate, that only destrois your training. Using the `'adam'` optimizer is a good option. --- Using a dropout at the inputs sounds wrong, if you want `f(X) = y`, why would you discard X values? – Daniel Möller Feb 07 '18 at 14:16
  • Thanks @DanielMöller, with your comment I'm close to my goal, I get a loss around 1e-7. – PA Masse Feb 07 '18 at 14:26
  • @DanielMöller I can't believe that I missed the dropout myself, but why `tanh`? Both X & y are normalized in [0, 1], so there are no negative values involved – desertnaut Feb 07 '18 at 15:21
  • Ah, ok, I think I misread the normalized part. I would normalize them between -1 and 1, though. – Daniel Möller Feb 07 '18 at 16:23
  • I tried some other combinations; tanh and relu as activation function give me similar results (loss around 1e-7). Adding an additional hidden layer doesn't help. And when I normalize the input data between -1 and 1, I get a loss stuck at ~0.6. Thanks to you I'm close to my goal, but I need to get the loss down to 1e-8. I also tried increasing batch_size without success. Do I need to tune the optimizer? Or change another variable? – PA Masse Feb 07 '18 at 16:37
  • Could you add a histogram of your `y` distribution? – Marcin Możejko Feb 07 '18 at 19:47
  • I'm trying to approximate [objective functions](https://en.wikipedia.org/wiki/Test_functions_for_optimization) for optimization, such as the Schaffer function or the Rosenbrock function (a minimal definition is sketched below). – PA Masse Feb 08 '18 at 20:14
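
For reference, a minimal sketch of one such target, the n-dimensional Rosenbrock function, in one common form (the Schaffer functions are defined on the same Wikipedia page):

import numpy as np

def rosenbrock(x):
    # sum over consecutive coordinates: 100*(x_{i+1} - x_i^2)^2 + (1 - x_i)^2
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)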
