
Let's say I have an input dataset with n = 100 observations and m = 5 features, where the last feature is the dependent variable and the remaining four are independent variables. It is a regression problem, which I intend to solve with a neural network. After hyperparameter optimization, the best model for this specific problem turned out to have 1 hidden layer with 2 neurons and, of course, an output layer with just 1 neuron (outputting y_hat).

Everything goes well: the model is regularized to avoid overfitting and the prediction error is satisfactory. The 10-fold cross-validation score is also far better than that of an ordinary multiple linear regression. So yes, I want to keep this model as the best model.
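For context, the architecture is along these lines (a minimal Keras sketch; the activation, optimizer, and loss below are just placeholders for illustration, not necessarily the exact ones from my tuning):

from keras.models import Sequential
from keras.layers import Dense

# 4 inputs -> 1 hidden layer with 2 neurons -> 1 output neuron (y_hat)
model = Sequential()
model.add(Dense(2, input_dim=4, activation='linear'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')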

The question is: how can we compute the coefficient values for the original variables, given that the data's dimension is reduced as it passes through the hidden layer?

The solution I came up with was to work backwards, solving the equations starting from the output layer. That is, if C1 is the output of the last layer and B1, B2 are the outputs of the hidden layer, then

C1 = c_0 + c_1*B1 + c_2*B2;

B1 = b1_0 + b1_1*x1 + b1_2*x2 + b1_3*x3 + b1_4*x4;

B2 = b2_0 + b2_1*x1 + b2_2*x2 + b2_3*x3 + b2_4*x4;

where c_0, b1_0, b2_0 are the intercepts for the output neuron and the hidden neurons respectively, and

c_1, c_2 are slopes for the output layer equation;

b1_1, b1_2, b1_3, b1_4 are slopes for the first hidden neuron and

b2_1, b2_2, b2_3, b2_4 are slopes for the second hidden neuron.

Now, to find the real coefficients of the variables, can we substitute the hidden-layer equations into the output equation as follows?

C1 = c_0 + c_1*(b1_0 + b1_1*x1 + b1_2*x2 + b1_3*x3 + b1_4*x4) + c_2*(b2_0 + b2_1*x1 + b2_2*x2 + b2_3*x3 + b2_4*x4),

which, when expanded and grouped by variable, gives:

C1 = (c_0 + c_1*b1_0 + c_2*b2_0) + (c_1*b1_1 + c_2*b2_1)*x1 + (c_1*b1_2 + c_2*b2_2)*x2 + (c_1*b1_3 + c_2*b2_3)*x3 + (c_1*b1_4 + c_2*b2_4)*x4

Where:

c_0 + c_1*b1_0 + c_2*b2_0 = intercept of the final equation

c_1*b1_1 + c_2*b2_1 = Coefficient for variable x1

c_1*b1_2 + c_2*b2_2 = Coefficient for variable x2

c_1*b1_3 + c_2*b2_3 = Coefficient for variable x3

c_1*b1_4 + c_2*b2_4 = Coefficient for variable x4
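In matrix form the same grouping looks like this (a numpy sketch; the weight values are made up purely for illustration and assume linear/identity activations everywhere):

import numpy as np

# hypothetical weights: column 1 holds the b1_j slopes, column 2 the b2_j slopes
W1 = np.array([[0.5, -0.2],    # b1_1, b2_1
               [0.1,  0.4],    # b1_2, b2_2
               [-0.3, 0.7],    # b1_3, b2_3
               [0.8,  0.05]])  # b1_4, b2_4
b1 = np.array([0.2, -0.1])     # b1_0, b2_0
W2 = np.array([[1.5], [0.6]])  # c_1, c_2
b2 = np.array([0.3])           # c_0

coefficients = W1 @ W2         # row j = c_1*b1_j + c_2*b2_j
intercept = b1 @ W2 + b2       # c_0 + c_1*b1_0 + c_2*b2_0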

Please tell me if I am right and whether this makes sense.

Neelabh Pant
  • Interesting question. Not sure this approach would work. For example your coefficient for x1 (`c_1*b1_1 + c_2*b1_1`) is missing the influence of `b2_1`. Wouldn't using the chain rule be more appropriate, as you're interested in the influence of `x1` on `y`, but need to account for the paths going through the hidden layer? This question is probably better suited for Cross Validated (https://stats.stackexchange.com/) – Simon Mar 09 '18 at 19:24
  • Hi, that was a typo. I have edited the equation and now it shows the effects of b2 neuron. – Neelabh Pant Mar 10 '18 at 02:29

1 Answer


Not sure I follow... I believe you just want the model's weights?

# this is just because some models count the input layer and others don't
layerCount = len(model.layers)
lastLayer = layerCount - 1
hiddenLayer = layerCount - 2

# getting the weights:
hiddenWeights = model.layers[hiddenLayer].get_weights()
lastWeights = model.layers[lastLayer].get_weights()

Both variables above are lists of the form [slopes, intercepts], where slopes and intercepts are numpy arrays.
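For the architecture in the question (4 inputs, 2 hidden neurons, 1 output), unpacking them would look roughly like this:

# hypothetical unpacking, following Keras' [kernel, bias] convention
hiddenSlopes, hiddenIntercepts = hiddenWeights  # shapes (4, 2) and (2,)
lastSlopes, lastIntercepts = lastWeights        # shapes (2, 1) and (1,)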

Taking the values with predictions:

You can also, instead of reading the weights per layer, create these inputs and get the overall intercept and slopes from their outputs:

import numpy as np

testInput = np.array([[0,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])

outputs = model.predict(testInput).reshape((5,))

Now you've got:

intercepts = outputs[0]
slope1 = outputs[1] - outputs[0]
slope2 = outputs[2] - outputs[0]
slope3 = outputs[3] - outputs[0]
slope4 = outputs[4] - outputs[0]

Notice that this will only make sense for models where you're sure that at no point there is a multiplication between the variables themselves (i.e. the model is effectively linear in its inputs).
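A quick way to check that assumption (a sketch, assuming the 4-input model above): an affine model satisfies f(a + b) = f(a) + f(b) - f(0), so superposition should hold up to floating-point error.

# linearity sanity check: holds only if the model is affine in its inputs
a = np.array([[1.0, 2.0, 3.0, 4.0]])
b = np.array([[5.0, 0.0, -1.0, 2.0]])
zero = np.zeros((1, 4))

lhs = model.predict(a + b)
rhs = model.predict(a) + model.predict(b) - model.predict(zero)
print(np.allclose(lhs, rhs))  # True only for a purely linear/affine model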

Daniel Möller
  • Thanks for the answer, but I am sure you did exactly what I did (in terms of extracting the weights). What I am asking is how to calculate the final coefficients of the different features using the chain rule. I wanted to know if the chain rule I applied above makes sense and if I should use those equations to calculate my final coefficients. – Neelabh Pant Mar 10 '18 at 02:33