3

I'm predicting 7 targets, which are ratios of one value, so for each sample the sum of all predicted values should be 1. Apart from using softmax at the output (which seems obviously incorrect), I just can't figure out any other way to restrict the sum of all predicted outputs to be = 1.
Thanks for any suggestions.

from tensorflow.keras.layers import Input, Dense, Dropout, PReLU
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

input_x = Input(shape=(input_size,))
output = Dense(512, activation=PReLU())(input_x)
output = Dropout(0.5)(output)
output = Dense(512, activation=PReLU())(output)
output = Dropout(0.5)(output)
output = Dense(16, activation=PReLU())(output)
output = Dropout(0.3)(output)
outputs = Dense(output_size, activation='softmax')(output)
#outputs = [Dense(1, activation=PReLU())(output) for i in range(output_size)] #multioutput nn

nn = Model(inputs=input_x, outputs=outputs)
es = EarlyStopping(monitor='val_loss',min_delta=0,patience=10,verbose=1, mode='auto')
opt=Adam(lr=0.001, decay=1-0.995)
nn.compile(loss='mean_absolute_error', optimizer=opt)
history = nn.fit(X, Y, validation_data = (X_t, Y_t), epochs=100, verbose=1, callbacks=[es])

Example of targets:

[image: table of example target rows, 7 ratio columns per row]

So, these are all ratios of one feature, and the sum of each row is 1.
For example, if the feature 'Total' = 100 points, A = 25 points, B = 25 points, and all the others 10 points each, then my 7 target ratios will be 0.25/0.25/0.1/0.1/0.1/0.1/0.1.

I need to train on and predict such ratios, so that in the future, knowing 'Total', we can restore the points from the predicted ratios.
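To make this concrete, here is a minimal sketch (the category names A, B, ... and the arrays are just placeholders) of how the ratios are built from the points and how the points can be restored afterwards:

import numpy as np

# hypothetical sample: 'Total' = 100 points, A = 25, B = 25, the rest 10 each
points = np.array([25., 25., 10., 10., 10., 10., 10.])
total = points.sum()                  # 100.0

ratios = points / total              # the 7 targets the network should predict
print(ratios)                        # [0.25 0.25 0.1 0.1 0.1 0.1 0.1]
print(ratios.sum())                  # 1.0

# later, knowing 'Total' and the predicted ratios, the points can be restored
predicted_ratios = ratios            # stand-in for nn.predict(...) output
restored = predicted_ratios * total  # [25. 25. 10. 10. 10. 10. 10.]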

Alex_Y
  • Can you actually write how this ratio is constructed and how it should be predicted? It's not obvious to me why softmax is wrong in this case. – Dr. Snoopy Jan 31 '20 at 12:16
  • Softmax can produce those ratios, again I don't see why doing that would be wrong. – Dr. Snoopy Jan 31 '20 at 12:35
  • Yes, it's producing them, but it seems that it's not the best option. If I make the last layer multioutput with relu, I get a better score on MAE, RMSE. Even though then the sum of the predicted ratios isn't always = 1 and I can't restore the actual points from those ratios. – Alex_Y Jan 31 '20 at 12:47
  • That it produces different performance does not mean it's wrong; I think in the end you do not have a programming problem here. You have to make sure that it's predicting correctly before looking at losses and metrics. – Dr. Snoopy Jan 31 '20 at 12:50

1 Answer

2

I think I understand your motivation, and also why "softmax won't cut it".

This is because softmax doesn't scale linearly, so:

>>> from scipy.special import softmax
>>> softmax([1, 2, 3, 4])
array([0.0320586 , 0.08714432, 0.23688282, 0.64391426])
>>> softmax([1, 2, 3, 4]) * 10
array([0.32058603, 0.87144319, 2.36882818, 6.4391426 ])

Which looks nothing like the original array.
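For contrast, plain division by the sum does scale linearly; here is a quick check (added for illustration, not part of the original answer):

>>> import numpy as np
>>> x = np.array([1., 2., 3., 4.])
>>> x / x.sum()
array([0.1, 0.2, 0.3, 0.4])
>>> (x * 10) / (x * 10).sum()   # scaling the input leaves the ratios unchanged
array([0.1, 0.2, 0.3, 0.4])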

Don't dismiss softmax too easily, though - it can handle special situations like negative values, zeros, or a zero sum of the pre-activation signal... But if you want the final regression to be normalized to one, and you expect the results to be non-negative, you can simply divide by the sum:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense, Dropout, PReLU, Lambda
from tensorflow.keras.models import Model

input_x = Input(shape=(input_size,))
output = Dense(512, activation=PReLU())(input_x)
output = Dropout(0.5)(output)
output = Dense(512, activation=PReLU())(output)
output = Dropout(0.5)(output)
output = Dense(16, activation=PReLU())(output)
output = Dropout(0.3)(output)
outputs = Dense(output_size, activation='relu')(output)
# normalize each sample (row) on its own, not across the whole batch;
# the small epsilon avoids division by zero when all relu outputs are 0
outputs = Lambda(lambda x: x / (K.sum(x, axis=-1, keepdims=True) + K.epsilon()))(outputs)

nn = Model(inputs=input_x, outputs=outputs)

The Dense layer of course needs a different activation than 'softmax' (relu or even linear is OK).
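As a quick sanity check (not part of the original answer; it assumes the model above and the X_t hold-out set from the question), every predicted row should now sum to 1:

preds = nn.predict(X_t)
print(preds.min())              # >= 0, thanks to the 'relu' activation
print(preds.sum(axis=1)[:5])    # each row should sum to (almost exactly) 1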

Tomasz Gandor
  • Of course, for this architecture to make sense, the training and validation sets (`Y`, `Y_t`) should also have this property - sum of every row should be = 1. – Tomasz Gandor Feb 05 '20 at 22:49
  • Thanks! I will try and compare both cases. – Alex_Y Feb 07 '20 at 12:31
  • final regression solution failed - not converging. – Alex_Y Feb 18 '20 at 08:49
  • 2
    one disadvantage is that with this Lambda layer, you cannot export the resulting model for other formats, for example porting it to Deep Learning 4 Java. It's a real shame there is no standard "divide by sum" activation in Keras that can be exported as a standard layer. Similar for a few other operations like logarithm and thresholding by percentiles of the data - which all require non-portable Lambdas. – ely Dec 13 '20 at 21:07
  • Use APIs.. which will avoid the porting constraint – sam Jul 26 '21 at 09:18
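Following up on the portability concern in the comments above: one possible workaround (a sketch only, not tested against any particular export path) is a small subclassed layer in place of the anonymous Lambda, since a named layer class is usually easier to re-implement or map on the other side:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class SumToOne(Layer):
    # hypothetical custom layer (not a built-in Keras layer):
    # divides each row by its sum so that every sample's outputs add up to 1
    def call(self, inputs):
        return inputs / (K.sum(inputs, axis=-1, keepdims=True) + K.epsilon())

# used in place of the Lambda layer:
# outputs = SumToOne()(Dense(output_size, activation='relu')(output))

Whether a given converter can actually handle such a custom layer still depends on the target framework.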