How to add sample_weight into a scikit-learn estimator

Question

I have recently developed a scikit-learn estimator (a classifier) and I now want to add sample_weight to it. The reason is so that I can apply boosting (e.g. AdaBoost) to the estimator, since AdaBoost requires the estimator to support sample_weight.

I had a look at a few different scikit-learn estimators such as linear regression, logistic regression and SVM, but each of them seems to handle sample_weight in a different way, and it's not very clear to me:

Linear regression: https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/linear_model/_base.py#L375

Logistic regression: https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/linear_model/_logistic.py#L1459

SVM: https://github.com/scikit-learn/scikit-learn/blob/95d4f0841d57e8b5f6b2a570312e9d832e69debc/sklearn/svm/_base.py#L796

So I am confused now and would like to know how to add sample_weight to my estimator. Is there a standard way of doing this in scikit-learn, or does it just depend on the estimator? Any templates or examples would be really appreciated. Many thanks in advance.
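For illustration only, one common pattern is for `fit` to accept an optional `sample_weight` array of shape `(n_samples,)` and to use it wherever the algorithm sums or averages over samples. The sketch below assumes a toy nearest-centroid classifier; `WeightedCentroidClassifier` and its `centroids_` attribute are hypothetical names invented for this example, not part of scikit-learn and not the estimator from the question. How the weights enter a real estimator depends on its internals.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class WeightedCentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predicts the class of the nearest weighted centroid."""

    def fit(self, X, y, sample_weight=None):
        X, y = check_X_y(X, y)
        # No weights given: every sample counts once.
        if sample_weight is None:
            sample_weight = np.ones(X.shape[0])
        sample_weight = np.asarray(sample_weight, dtype=float)
        if sample_weight.shape != (X.shape[0],):
            raise ValueError("sample_weight must have shape (n_samples,)")

        self.classes_ = np.unique(y)
        # The weights enter the statistics computed from the data (here, a
        # weighted mean per class); X itself is never multiplied by them.
        # Assumes each class has a positive total weight.
        self.centroids_ = np.vstack([
            np.average(X[y == c], axis=0, weights=sample_weight[y == c])
            for c in self.classes_
        ])
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


# Example usage with made-up data.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
y_demo = np.array([0, 0, 1])
clf = WeightedCentroidClassifier().fit(X_demo, y_demo, sample_weight=[3, 1, 1])
print(clf.centroids_)
```

Because `sample_weight` is a named parameter of `fit`, meta-estimators that inspect the `fit` signature can detect that weights are supported.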

Isn't it just a list of values, e.g. [0]*nrow(x)? I can post it as an answer ... — Areza, May 12 '20 at 08:00
Thanks @user702846. Am I right to say this list of sample_weight values is to be multiplied with the feature matrix X and target vector y in the `.fit(X, y, sample_weight)` method? — Leockl, May 12 '20 at 09:32
As its name suggests, you put a weight on each sample, therefore your list's length must be the same as nrow(x) (the number of samples). [2]*nrow(x) produces a list of size nrow(x) where every value is 2 :) — Areza, May 12 '20 at 09:35
Wouldn’t [2]*nrow(x) just be multiplying the values in each row by 2, rather than creating a duplicate 2nd row, which is what sample_weight is supposed to be doing? — Leockl, May 12 '20 at 09:46
First of all, [2]*nrow(x) is just arbitrary. You make a list of your own based on what you are going to use; here I just used it as an example to satisfy the parameter! Regarding your Python question: no! It is in brackets [ ], so it won't be treated as a number but as a list. So you are repeating a list rather than multiplying a number! — Areza, May 12 '20 at 09:56
Ok let me check this out and get back to you. This is probably a numpy arrays feature something like vectorisation/broadcasting. — Leockl, May 12 '20 at 10:13
I tried this `np.array([2,2,1,1])*X` and it doesn't work. It just multiplies each row in X by the corresponding value in the first array, i.e. 2 multiplied by the values in the 1st row, 2 multiplied by the values in the 2nd row, 1 multiplied by the values in the 3rd row and 1 multiplied by the values in the 4th row (X here is a feature matrix with 4 rows for this example). — Leockl, May 12 '20 at 10:43
Sorry, I tried `np.array([2,2,1,1]).reshape(-1,1)*X` rather than `np.array([2,2,1,1])*X`. `np.array([2,2,1,1])*X` doesn't work because of mismatch in array shapes/sizes between the 2 arrays — Leockl, May 12 '20 at 10:50
`np.array([2,2,1,1])*X` is absolutely NOT the right way to do this. As you mention in your question, the approaches to using sample_weight vary a lot; it depends on the internal implementation details, and often there is more than one way to do it. So I recommend sharing those internal details of your estimator. — Shihab Shahriar Khan, May 12 '20 at 10:51
@Leockl, why don't you just use [2,2,1,1] then? Again, [2]*nrow was arbitrary. I am sorry for using a Pythonic expression. — Areza, May 12 '20 at 11:27
Hi @Shihab Shahriar Khan, I think you commented on one of my questions before (https://stackoverflow.com/questions/61556043/how-to-write-a-scikit-learn-estimator-in-pytorch) but I didn't get any further replies from you. Anyhow, it's the same estimator: github.com/leockl/helstrom-quantum-centroid-classifier — Leockl, May 12 '20 at 11:34
@user702846, I tried [2,2,1,1]*X and it doesn't work with an error of mismatch in array shapes/sizes — Leockl, May 12 '20 at 11:35
Sorry for non-reply, I remember trying to understand that, but it was/is out of my depth — Shihab Shahriar Khan, May 12 '20 at 11:52
@user702846, I don't get it. If I just have a list [2,2,1,1] which is the `sample_weight` for each row in the feature matrix X, how would I then use this list to turn the feature matrix X into one with duplicate rows? — Leockl, May 12 '20 at 12:24
@Leockl thanks for the downvote. I don't think your question mentions that you need to duplicate your rows ... does it? — Areza, May 12 '20 at 13:18
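The three operations discussed in the comments above can be told apart with a short NumPy sketch. The data and the variable names are made up for illustration, and the weighted mean merely stands in for whatever statistic an estimator computes internally: `[2] * n` builds a list of weights, multiplying X by the weights rescales feature values, and integer weights behave like row duplication only when they enter the computed statistic.

```python
import numpy as np

# Made-up feature matrix with 4 samples and 2 features.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
w = np.array([2, 2, 1, 1])  # one weight per sample (row of X)

# List repetition: builds a list with one entry per sample, nothing more.
weights_list = [2] * X.shape[0]

# Row-wise scaling: changes the feature values themselves.
# This is NOT what sample_weight means.
X_scaled = w.reshape(-1, 1) * X

# Row duplication: each row i is repeated w[i] times.
X_dup = np.repeat(X, w, axis=0)

# A statistic computed with weights matches the same statistic
# computed on the duplicated data.
weighted_mean = np.average(X, axis=0, weights=w)
duplicated_mean = X_dup.mean(axis=0)

print(weights_list)
print(X_dup.shape)
print(np.allclose(weighted_mean, duplicated_mean))
```

Duplication only works for non-negative integer weights, whereas passing the weights into the computation also handles fractional values such as those produced during boosting.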


0 Answers