
I fitted my data with a PLS model using scikit-learn/Python. I noticed that my results with Python 3.7 / scikit-learn 0.20.1 are about half of the results with Python 2.7 / scikit-learn 0.17. Compared with other implementations, the Python 2.7 / scikit-learn 0.17 results seem to be the expected ones. Can anyone help me understand what I am doing wrong?

The code I used was exactly the same in both environments; it is posted below:

import pandas as pd
import numpy as np
import sklearn
from sklearn.cross_decomposition import PLSRegression

# Load the semicolon-separated data and split into target and predictors
df = pd.read_csv('PSLR.csv', delimiter=';')
y = df['R']
X = df[['A','B','C','D','E','F','G','H']]

# Fit a 3-component PLS regression
pls2 = PLSRegression(n_components=3)
pls2.fit(X, y)
print(pls2.coef_)

# Reconstruct the intercept from the fitted means and coefficients
y_intercept = pls2.y_mean_ - np.dot(pls2.x_mean_, pls2.coef_)
print(y_intercept)

The data is:

      R  A  B  C  D  E  F  G  H
0   149  1  0  0  0  0  0  1  0
1    98  0  1  0  0  0  0  1  0
2    72  0  0  1  0  0  0  1  0
3    74  0  0  0  1  0  0  1  0
4   124  1  0  0  0  0  0  0  1
5    71  0  1  0  0  0  0  0  1
6    53  0  0  1  0  0  0  0  1
7    64  0  0  0  1  0  0  0  1
8   186  1  0  0  0  1  1  1  0
9   127  0  1  0  0  1  1  1  0
10  121  0  0  1  0  1  1  1  0
11  104  0  0  0  1  1  1  1  0
12   98  1  0  0  0  0  1  1  1
13   64  0  1  0  0  0  1  1  1
14   38  0  0  1  0  0  1  1  1
15   17  0  0  0  1  0  1  1  1

and the result with Python 3.7 / scikit-learn 0.20:

[[ 21.31738122]
 [ -0.55514014]
 [ -8.9932702 ]
 [-11.76897088]
 [ 20.21781964]
 [ -5.65972552]
 [ -5.76695658]
 [-18.17454004]]
[102.43789531]

But with Python 2.7 / scikit-learn 0.17:

[[ 47.66711352]
 [ -1.24133108]
 [-20.10956351]
 [-26.31621892]
 [ 45.20841908]
 [-10.96001135]
 [-12.89530694]
 [-35.19484545]]
[112.69680383]
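
One way to check which coefficient convention a given installation uses is to compare the manual reconstruction from the code above with the model's own predictions. The sketch below assumes the fitted attributes `x_mean_` and `y_mean_` as they exist in the 0.17/0.20 releases discussed here (they were deprecated in later scikit-learn versions):

import numpy as np

# Hedged sanity check: does X @ coef_ plus the manually computed intercept
# reproduce pls2.predict(X)? If it does not, the installed release applies
# an extra internal scaling (e.g. division by per-column standard
# deviations) before using coef_.
manual = np.dot(X, pls2.coef_).ravel() + (pls2.y_mean_ - np.dot(pls2.x_mean_, pls2.coef_))
print(np.allclose(manual, pls2.predict(X).ravel()))
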
  • I do not know scikit-learn, but I think you could look at the [scikit-learn API](https://scikit-learn.org/stable/modules/classes.html). Maybe you can find something that changed between the versions you are using. – TessavWalstijn Dec 03 '18 at 08:37
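
A quick way to confirm which release each environment is actually running is to print the standard `__version__` attribute (nothing beyond the stock scikit-learn package is assumed here):

import sklearn

# Show the installed scikit-learn release
print(sklearn.__version__)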

1 Answer


I found a solution:

The handling of the `scale` option of PLS changed between these versions: setting `scale=False` yields the coefficients (pre-factors) I wanted.
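
A minimal sketch of the fix, assuming the same `X` and `y` as in the question (the only change is passing the existing `scale` parameter of `PLSRegression` explicitly):

from sklearn.cross_decomposition import PLSRegression

# Disable the internal standard-deviation scaling so the coefficients are
# reported on the original scale of the predictors.
pls2 = PLSRegression(n_components=3, scale=False)
pls2.fit(X, y)
print(pls2.coef_)

With scaling disabled, the intercept formula from the question, `pls2.y_mean_ - np.dot(pls2.x_mean_, pls2.coef_)`, should agree directly with the output of `pls2.predict(X)` in the releases discussed here.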
