Pandas for binary classification

Question

I have been using Pandas for data processing before training a binary classifier. One thing I could not find is a function that tells me, for a given value of a certain feature, say Age (for example, people who are 60 years old), what percentage of those people are classified as 1 or as 0 (in the binary label column), and that does this for every distinct age in the Age column.

A simple example to illustrate my idea. I have the following DataFrame:

import pandas as pd

data = pd.DataFrame({'Age': [23, 24, 23, 25, 24, 24, 20], 'label': [0, 1, 1, 0, 1, 1, 0]})

and I want a function that gives me, for each age, the fraction of people labeled as 1 (the fraction labeled as 0 is then one minus that value). Like so:

   Age   Percentage
0   20     0.0
1   23     0.5
2   24     1.0
3   25     0.0

Is there a function that already implements this? I could not find one, and it seems like a pretty common need for data analysis in binary classification problems.

Thank you!

This is a pure pandas question, and has nothing to do with `machine-learning` or `scikit-learn` - kindly do not spam irrelevant tags (removed). — desertnaut, Aug 18 '20 at 12:40

score 1 · Accepted Answer · answered Aug 18 '20 at 12:40

Just do a groupby mean:

>>> data.groupby('Age').mean()
     label
Age       
20     0.0
23     0.5
24     1.0
25     0.0

Reset the index to get it exactly as in your expected output:

>>> data.groupby('Age').mean().reset_index()
   Age  label
0   20    0.0
1   23    0.5
2   24    1.0
3   25    0.0
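
The reason this works is that the mean of a 0/1 column within a group equals the fraction of rows in that group labeled 1. As a minimal sketch (the variable names `share_of_ones` and `class_shares` are only illustrative, and `Percentage` is taken from the column name in the question's expected output), you could select the label column explicitly, rename it, and, if you want the fractions for both classes side by side, use `pd.crosstab` with row normalization:

```python
import pandas as pd

data = pd.DataFrame({'Age': [23, 24, 23, 25, 24, 24, 20],
                     'label': [0, 1, 1, 0, 1, 1, 0]})

# Fraction of rows labeled 1 for each age: the mean of a 0/1 column.
# Selecting 'label' explicitly keeps other columns out of the result.
share_of_ones = (
    data.groupby('Age')['label']
        .mean()
        .rename('Percentage')   # column name used in the question
        .reset_index()
)

# Fractions for both classes per age; each row sums to 1.
class_shares = pd.crosstab(data['Age'], data['label'], normalize='index')

print(share_of_ones)
print(class_shares)
```

Selecting the column before calling `mean()` matters mostly when the DataFrame has other columns, since otherwise pandas would try to average those as well.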
