I have to create bins based on age. Some values are missing (NaN); these need to be changed to "N/A" and assigned to a new category called "Not_Available". I filled in the missing values and then tried to convert the strings to float:
Students.loc[:,'AGE']=Students.loc[:,'AGE'].fillna("N/A")
Students.loc[:,'AGE'] = Students.loc[:,'AGE'].str.replace('\%', '', regex=True).astype(float)
When I do this, I get the error message "could not convert string to float: 'N/A'". Then I tried to use pd.cut and assign bins and labels, but nothing worked.
If I just run it as is, the error message is "not supported between instances of 'int' and 'str'".
Code:
Students.loc[:,'AGE']=Students.loc[:,'AGE'].fillna("unknown")
Students.loc[:,'AGE'] = Students.loc[:,'AGE'].str.replace('\%', '', regex=True).astype(float)
Students.loc[:,'AGE'] = Students.loc[:,'AGE'].cat.add_categories("Not_Available")
Students.loc[:,'AGE']=pd.cut(x=Students.loc[:,'AGE'],bins=[0,18,30,50,65,75,100],labels=["Unknown","18 and under", "19-30", "31-50", "51-65", "66-75","75+"])
The output should look similar to:
Not_Available: 10
18 and under: 16
19-30: 80
31-50: 15
51-65: 5
66-75: 2
75+: 1
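For reference, here is a minimal sketch of one ordering that seems to avoid both errors: convert to float while the missing values are still NaN, bin with pd.cut, and only then add and fill the "Not_Available" category. The sample data and the AGE_GROUP column name are assumptions made for illustration, not part of the real dataset. Note that 7 bin edges define 6 intervals, so pd.cut expects 6 labels; "Not_Available" is added afterwards as an extra category rather than passed as a label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real Students frame.
Students = pd.DataFrame(
    {"AGE": ["17", "25%", np.nan, "45", "60", "70", "80", np.nan]}
)

# 1. Strip the '%' and convert to float while missing values are still NaN.
#    errors="coerce" turns anything unparseable into NaN instead of raising.
age = pd.to_numeric(
    Students["AGE"].str.replace("%", "", regex=False),
    errors="coerce",
)

# 2. Bin the numeric ages: 7 edges give 6 intervals, so 6 labels.
#    Ages outside (0, 100] are left as NaN by pd.cut.
bins = [0, 18, 30, 50, 65, 75, 100]
labels = ["18 and under", "19-30", "31-50", "51-65", "66-75", "75+"]
age_group = pd.cut(age, bins=bins, labels=labels)

# 3. Add the extra category first, then fill the NaN rows with it.
age_group = age_group.cat.add_categories("Not_Available").fillna("Not_Available")

# 4. Move "Not_Available" to the front so the counts print in the desired order.
age_group = age_group.cat.reorder_categories(["Not_Available"] + labels)
Students["AGE_GROUP"] = age_group

# sort=False keeps the category order instead of sorting by frequency.
counts = Students["AGE_GROUP"].value_counts(sort=False)
print(counts)
```

Writing the result to a separate column keeps the original AGE values intact, which makes it easier to check the bins against the raw data.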