I have a matrix of reference values and would like to learn how Scikit-learn can be used to generate a regression model for it. I have done several types of univariate regressions in the past but it's not clear to me how to use two variables in sklearn.

I have two features (A and B) and a table of output values for certain input A/B values; see the table and 3D surface below. I'd like to see how I can translate this into a two-variable equation that relates the A/B inputs to the single output value, as shown in the table. The relationship looks nonlinear; it could also be quadratic, logarithmic, etc.

How do I use sklearn to perform a nonlinear regression on this tabular data?

A/B 1000    1100    1200    1300    1400    1500    1600    1700    1800    1900    2000
0   8.78    8.21    7.64    7.07    6.50    5.92    5.35    4.78    4.21    3.63    3.06
5   8.06    7.56    7.07    6.58    6.08    5.59    5.10    4.60    4.11    3.62    3.12
10  7.33    6.91    6.50    6.09    5.67    5.26    4.84    4.43    4.01    3.60    3.19
15  6.60    6.27    5.93    5.59    5.26    4.92    4.59    4.25    3.92    3.58    3.25
20  5.87    5.62    5.36    5.10    4.85    4.59    4.33    4.08    3.82    3.57    3.31
25  5.14    4.97    4.79    4.61    4.44    4.26    4.08    3.90    3.73    3.55    3.37
30  4.42    4.32    4.22    4.12    4.02    3.93    3.83    3.73    3.63    3.53    3.43
35  3.80    3.78    3.75    3.72    3.70    3.67    3.64    3.62    3.59    3.56    3.54
40  2.86    2.93    2.99    3.05    3.12    3.18    3.24    3.31    3.37    3.43    3.50
45  2.08    2.24    2.39    2.54    2.70    2.85    3.00    3.16    3.31    3.46    3.62
50  1.64    1.84    2.05    2.26    2.46    2.67    2.88    3.08    3.29    3.50    3.70
55  1.55    1.77    1.98    2.19    2.41    2.62    2.83    3.05    3.26    3.47    3.69
60  2.09    2.22    2.35    2.48    2.61    2.74    2.87    3.00    3.13    3.26    3.39
65  3.12    3.08    3.05    3.02    2.98    2.95    2.92    2.88    2.85    2.82    2.78
70  3.50    3.39    3.28    3.17    3.06    2.95    2.84    2.73    2.62    2.51    2.40
75  3.42    3.32    3.21    3.10    3.00    2.89    2.78    2.68    2.57    2.46    2.36
80  3.68    3.55    3.43    3.31    3.18    3.06    2.94    2.81    2.69    2.57    2.44
85  3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69
90  3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69
95  3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69
100 3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69

[3D surface plot of the table values]

ortunoa
  • Your question makes no sense to me. You have two independent variables A and B, and one dependent variable that I'll call Z. You would have Z = f(A, B) if it were a 3D plot. I can see how you might view it as a series of 1D plots where you hold either A or B constant and fit in terms of the other one. Is that what you're asking for? – duffymo Sep 29 '21 at 21:06
  • Yeah it plots as a plane on a 3d plot, can that plot be described with one equation? I've added the image to the top post. Thank you. – ortunoa Sep 29 '21 at 22:09
  • What have _you_ tried to solve this? What do you mean by "wouldn't know how to tackle this"? Have you tried learning how to do it? – ForceBru Sep 29 '21 at 22:19
  • Sorry, that's not a plane. It's a surface with significant non-linear terms. – duffymo Sep 29 '21 at 22:24
  • I have solved each of the rows and columns independently and gotten fairly good non linear regressions through them. I wanted to know if there is a way to describe the entire surface with one equation – ortunoa Sep 30 '21 at 00:16
  • Showing your code, for even partial problems/solutions/errors/etc., will go a long way to entice others to consider your question(s). – rickhg12hs Sep 30 '21 at 01:48
  • You might be able to do a linear regression using a higher-order polynomial with interaction terms included. That should be simple enough. Why don't you try it and see? f(x, y) = sum(c_ij * x^i * y^j for i, j = 0 to 5) should be easy enough to do. – duffymo Oct 08 '21 at 19:37

1 Answer

There is probably a succinct nonlinear relationship between your A, B, and table values, but without some knowledge of the system, and without any sophisticated nonlinear modeling, here's a ridiculous model with a decent score.

the_table = """1000    1100    1200    1300    1400    1500    1600    1700    1800    1900    2000
0   8.78    8.21    7.64    7.07    6.50    5.92    5.35    4.78    4.21    3.63    3.06
5   8.06    7.56    7.07    6.58    6.08    5.59    5.10    4.60    4.11    3.62    3.12
10  7.33    6.91    6.50    6.09    5.67    5.26    4.84    4.43    4.01    3.60    3.19
15  6.60    6.27    5.93    5.59    5.26    4.92    4.59    4.25    3.92    3.58    3.25
20  5.87    5.62    5.36    5.10    4.85    4.59    4.33    4.08    3.82    3.57    3.31
25  5.14    4.97    4.79    4.61    4.44    4.26    4.08    3.90    3.73    3.55    3.37
30  4.42    4.32    4.22    4.12    4.02    3.93    3.83    3.73    3.63    3.53    3.43
35  3.80    3.78    3.75    3.72    3.70    3.67    3.64    3.62    3.59    3.56    3.54
40  2.86    2.93    2.99    3.05    3.12    3.18    3.24    3.31    3.37    3.43    3.50
45  2.08    2.24    2.39    2.54    2.70    2.85    3.00    3.16    3.31    3.46    3.62
50  1.64    1.84    2.05    2.26    2.46    2.67    2.88    3.08    3.29    3.50    3.70
55  1.55    1.77    1.98    2.19    2.41    2.62    2.83    3.05    3.26    3.47    3.69
60  2.09    2.22    2.35    2.48    2.61    2.74    2.87    3.00    3.13    3.26    3.39
65  3.12    3.08    3.05    3.02    2.98    2.95    2.92    2.88    2.85    2.82    2.78
70  3.50    3.39    3.28    3.17    3.06    2.95    2.84    2.73    2.62    2.51    2.40
75  3.42    3.32    3.21    3.10    3.00    2.89    2.78    2.68    2.57    2.46    2.36
80  3.68    3.55    3.43    3.31    3.18    3.06    2.94    2.81    2.69    2.57    2.44
85  3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69
90  3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69
95  3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69
100 3.43    3.35    3.28    3.21    3.13    3.06    2.99    2.91    2.84    2.77    2.69"""

import numpy as np
import pandas as pd
import io

df = pd.read_csv(io.StringIO(the_table), sep=r"\s+")  # whitespace-separated; first column becomes the index
df.columns = df.columns.astype(np.uint64)  # column labels back to integers

df_unstacked = df.unstack()  # Series whose MultiIndex entries are (column, row) = (B, A) pairs

X = df_unstacked.index.tolist()  # list of (B, A) tuples
y = df_unstacked.to_list()

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(6)
poly_X = poly.fit_transform(X)

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(poly_X,y)
print(model.score(poly_X,y))
# 0.9762180339233807
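Note that the 0.976 above is a score on the training data itself. To get a feel for how a degree-6 fit generalizes, you can run `cross_val_score` on the same setup. A minimal sketch: the synthetic grid and linear `y` here just stand in for the real `X`/`y` built from the table, and the `StandardScaler` step is my addition to tame the conditioning of raw degree-6 features.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic (B, A) grid standing in for the table's columns/rows
X = [(b, a) for b in range(1000, 2001, 100) for a in range(0, 101, 5)]
y = [0.001 * b - 0.02 * a for b, a in X]  # stand-in for the table values

# Scale first so the degree-6 terms stay numerically tame
pipe = make_pipeline(StandardScaler(), PolynomialFeatures(6), LinearRegression())

# Shuffle so each fold interpolates within the grid rather than extrapolating
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv)
print(scores.mean())
```

A mean cross-validated R² well below the training score would be a sign the polynomial is overfitting the grid.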

To predict values given an A and a B using this model, you need to transform the input as was done when creating the model. So for example,

model.predict(poly.transform([(1534,56)]))
# array([2.75275659])

Even more outrageously ridiculous ...

more_X = [(a,b,np.log1p(a),np.log1p(b),np.cos(np.pi*b/100)) for a,b in X]
poly = PolynomialFeatures(5)
poly_X = poly.fit_transform(more_X)
model.fit(poly_X,y)
print(model.score(poly_X,y))
# 0.9982994398684035

... and to predict:

more_X = [(a,b,np.log1p(a),np.log1p(b),np.cos(np.pi*b/100)) for a,b in [(1534,56)]]
model.predict(poly.transform(more_X))
# array([2.74577017])

N.B.: There are probably more Pythonic ways to program these ridiculous models.
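One way to organize the hand-rolled transforms (also suggested in the comments below) is an sklearn `Pipeline`, so that `fit` and `predict` both take raw (B, A) pairs and the feature engineering happens automatically. A sketch of the second model as a pipeline; the synthetic `X`/`y` grid here is only a stand-in for the real data built from the table:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures
from sklearn.linear_model import LinearRegression

def extra_features(X):
    """Replicate the hand-rolled transform: (a, b) -> (a, b, log1p(a), log1p(b), cos(pi*b/100))."""
    X = np.asarray(X, dtype=float)
    a, b = X[:, 0], X[:, 1]
    return np.column_stack([a, b, np.log1p(a), np.log1p(b), np.cos(np.pi * b / 100)])

pipe = Pipeline([
    ("extra", FunctionTransformer(extra_features)),
    ("poly", PolynomialFeatures(5)),
    ("linreg", LinearRegression()),
])

# Stand-in data on the same grid as the table (replace with the real X, y)
X = [(a, b) for a in range(1000, 2001, 100) for b in range(0, 101, 5)]
y = [0.001 * a - 0.02 * b for a, b in X]

pipe.fit(X, y)
pred = pipe.predict([(1534, 56)])  # raw inputs; no manual transform needed
```

With the pipeline, there is no risk of forgetting to apply `poly.transform` (or the extra feature construction) at prediction time, which is exactly the `reshape` confusion from the comments.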

rickhg12hs
  • How would I be able to run a prediction against this model? I've tried model.predict((1534,56)); it asks me to run .reshape, but then tells me that tuple doesn't have an attribute called 'reshape'. – ortunoa Sep 30 '21 at 14:46
  • @ortunoa The input needs to be transformed like they were for the model. For example, for the first model: `model.predict(poly.transform([(1534,56)]))` -> `array([2.75275659])`. – rickhg12hs Sep 30 '21 at 15:14
  • @ortunoa ... and for the second model: `more_X = [(a,b,np.log1p(a),np.log1p(b),np.cos(np.pi*b/100)) for a,b in [(1534,56)]];model.predict(poly.transform(more_X))` -> `array([2.74577017])`. – rickhg12hs Sep 30 '21 at 15:25
  • @ortunoa If you were going to do crazy (or not crazy) input transforms, you might be better off organizing them with [sklearn Pipelines](https://scikit-learn.org/stable/modules/compose.html). – rickhg12hs Sep 30 '21 at 15:31