I have a data frame with an 'education' attribute. Values are discrete, 1-16. For purposes of cross-tabulation, I want to bin this 'education' variable but with custom bins (1:8, 9:11, 12, 13:15, 16).

I've been experimenting with pd.cut(), but I get an invalid syntax error:

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])

1 Answer

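The syntax error comes from writing 1:8 inside a list: slice notation is only valid inside an indexing expression. pd.cut expects bins to be a sequence of numeric edges, with one fewer label than edges. Try placing the edges between the integer values, so that each value falls inside exactly one bin: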

bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels=['Middle School or less', 'Some High School', 
        'High School Grad', 'Some College', 'College Grad']

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)

Test (create this sample frame before running the code above):

import numpy as np
import pandas as pd

adult_df_educrace = pd.DataFrame({'education': np.arange(1, 17)})

Output:

    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
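
If half-integer edges feel awkward, integer edges should work as well, since pd.cut closes each interval on the right by default (right=True). The sketch below is a variant of the code above, not a replacement for it: the frame name df and the column education_bins_half are made up for illustration, and it assumes every value is an integer from 1 to 16.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one row per education level, 1 through 16.
df = pd.DataFrame({"education": np.arange(1, 17)})

labels = ["Middle School or less", "Some High School",
          "High School Grad", "Some College", "College Grad"]

# Integer edges. With the default right=True each bin is (left, right],
# so these edges produce the groups 1-8, 9-11, 12, 13-15 and 16.
int_edges = [0, 8, 11, 12, 15, 16]
df["education_bins"] = pd.cut(df["education"], bins=int_edges, labels=labels)

# Half-integer edges, as used earlier in this answer, for comparison.
half_edges = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
df["education_bins_half"] = pd.cut(df["education"], bins=half_edges, labels=labels)

# Both edge choices should assign every row to the same label.
same = (df["education_bins"].astype(str)
        == df["education_bins_half"].astype(str)).all()

# Number of rows per label, as a plain dict keyed by label.
counts = df["education_bins"].value_counts().to_dict()
print(same)
print(counts)
```

Either set of edges works for integer data. The half-integer version has the advantage that it does not depend on which side of the interval is closed.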
Quang Hoang