sklearn LabelEncoder to combine multiple values into a single label

Question

I want to run classification on a column that has a few possible values, but I want to consolidate them into fewer labels.

For example, a job may have multiple end states: success, fail, error, killed. I want to classify the jobs into two groups of end states: one containing error and killed, and another containing only success and fail.

I cannot find a way to do that with sklearn's LabelEncoder, other than manually changing the target column myself (assigning 1 to success or fail and 0 to everything else).

EDIT: example. This is what I need to happen:

>>> label_binarize(['success','fail','error','killed', 'success'], classes=(['success', 'fail']))
array([[1],
       [1],
       [0],
       [0],
       [1]])

Unfortunately, label_binarize (or LabelBinarizer, for that matter) produces a separate column for each class. THIS IS NOT WHAT I WANT:

>>> label_binarize(['success','fail','error','killed', 'success'], classes=['success', 'fail'])
array([[1, 0],
       [0, 1],
       [0, 0],
       [0, 0],
       [1, 0]])

Any ideas on how to do that?

score 2 · Accepted Answer · answered Apr 05 '22 at 01:42


Maybe you should check out label_binarize. You could set 'success' as the only class, so that every other value defaults to 0. This gives the same result as changing the data prior to encoding, but might fit better into your pipeline.

from sklearn.preprocessing import label_binarize
label_binarize(['success','fail','error','killed', 'success'], classes=['success'])

Output

array([[1],
       [0],
       [0],
       [0],
       [1]])

Chris


That is a good option, but I did not define the problem well enough. I need `classes` to be a list of labels and still get a 1D array for the target. I edited the question with an example. – Ehud Kaldor Apr 05 '22 at 22:47
So just take the max of each array – Chris Apr 05 '22 at 23:38
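
The suggestion in the comment above can be sketched roughly as follows. This is a minimal illustration rather than a definitive recipe: the variable names (`end_states`, `positive_group`, `one_hot`, `grouped`, `via_isin`) are made up for the example, and the `np.isin` line is an alternative that does not appear in the thread.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Sample data from the question; the names here are illustrative only.
end_states = ['success', 'fail', 'error', 'killed', 'success']
positive_group = ['success', 'fail']  # labels to merge into class 1

# One indicator column per label in positive_group (5 rows, 2 columns here).
one_hot = label_binarize(end_states, classes=positive_group)

# The row-wise max is 1 whenever the row matched any label in the group.
grouped = one_hot.max(axis=1)

# NumPy-only alternative (an assumption, not from the thread):
# test membership directly and cast the boolean mask to int.
via_isin = np.isin(end_states, positive_group).astype(int)

print(grouped)
print(via_isin)
```

For this input both lines should print `[1 1 0 0 1]`. The `np.isin` version does not depend on how label_binarize shapes its output, which may matter if the input happens to contain only two distinct labels.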
