
I am using scikit-learn to perform Ridge Regression with weights on individual samples. This can be done via: estimator.fit(X, y, sample_weight=some_array). Intuitively, I expect that a larger weight means greater relevance for the corresponding sample.

However, I tested the method above on the following 2-D example:

    from sklearn import linear_model
    import numpy
    import matplotlib.pyplot as plt

    #Data
    x = numpy.array([[0], [1], [2]])
    y = numpy.array([[0], [2], [2]])
    sample_weight = numpy.array([1, 1, 1])

    # Ridge regression
    clf = linear_model.Ridge(alpha=0.1)
    clf.fit(x, y, sample_weight=sample_weight)

    # Plot the fitted line over the samples
    # (predict expects a 2-D array; plt.hold is deprecated and unnecessary,
    # since newer matplotlib versions hold the axes by default)
    xp = numpy.linspace(-1, 3)
    yp = clf.predict(xp.reshape(-1, 1))
    plt.plot(xp, yp)
    plt.plot(x, y, 'or')
    plt.show()

I ran this code, then ran it again after doubling the weight of the first sample:

    sample_weight = numpy.array([2, 1, 1])

The resulting line moves away from the sample that has the larger weight. This is counter-intuitive, since I expect the sample with the larger weight to have greater influence on the fit.

Am I using the library wrongly, or is there an error in it?

Marco
  • Have you tried doing the opposite? Maybe the weights are inverted. I've found similar things in the logistic regression class. Try setting it to numpy.array([0.5, 1, 1]). – Alex S Jul 15 '13 at 09:05
  • Thanks, this is what I am planning to do. However, I would like to understand why the weights are inverted. – Marco Jul 15 '13 at 11:26
  • Well, same here. The documentation for a lot of methods in sklearn is discouragingly simple. – Alex S Jul 16 '13 at 13:13

1 Answer


The weights are not inverted. Most likely there was a mistake in your code, or a bug in sklearn that has since been fixed. The code

    from sklearn import linear_model
    import numpy
    import matplotlib.pyplot as plt

    # Data
    x = numpy.array([[0], [1], [2]])
    y = numpy.array([[0], [2], [2]])
    sample_weight1 = numpy.array([1, 1, 1])
    sample_weight2 = numpy.array([2, 1, 1])

    # Ridge regressions with the two weightings
    clf1 = linear_model.Ridge(alpha=0.1).fit(x, y, sample_weight=sample_weight1)
    clf2 = linear_model.Ridge(alpha=0.1).fit(x, y, sample_weight=sample_weight2)

    # Plot both fitted lines over the samples
    plt.scatter(x, y)
    xp = numpy.linspace(-1, 3)
    plt.plot(xp, clf1.predict(xp.reshape(-1, 1)))
    plt.plot(xp, clf2.predict(xp.reshape(-1, 1)))
    plt.legend(['equal weights', 'first obs weights more'])
    plt.title('Increasing weight of the first obs moves the line closer to it');

plots this graph, in which the second line (fitted with the increased first weight) lies closer to the first observation:

[Plot: two fitted lines over the three samples; the line fitted with the doubled first weight passes closer to the first observation]
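
As an extra sanity check, here is a minimal sketch reusing x, y, clf1 and clf2 from the code above (the x_dup, y_dup, clf_dup and r1/r2 names are mine). It relies on the fact that sample_weight scales each sample's squared residual in the loss while the alpha penalty is unaffected, so a weight of 2 should behave exactly like duplicating that sample, and the residual at the up-weighted sample should shrink:

    # Weighting the first sample by 2 should match duplicating it,
    # since sample weights scale the squared residuals in the objective.
    x_dup = numpy.array([[0], [0], [1], [2]])
    y_dup = numpy.array([[0], [0], [2], [2]])
    clf_dup = linear_model.Ridge(alpha=0.1).fit(x_dup, y_dup)
    print(clf_dup.coef_, clf2.coef_)  # expect (nearly) identical slopes

    # The residual at the up-weighted sample (x=0, y=0) should shrink.
    r1 = abs(clf1.predict(numpy.array([[0]]))[0, 0])
    r2 = abs(clf2.predict(numpy.array([[0]]))[0, 0])
    print(r1, r2)  # expect r2 < r1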

David Dale