I downloaded a LEGO dataset in .csv format from Kaggle. It has an "Ages" column like this:
df['Ages'].unique()
array(['6-12', '12+', '7-12', '10+', '5-12', '8-12', '4-7', '4-99', '4+',
'9-12', '16+', '14+', '9-14', '7-14', '8-14', '6+', '2-5', '1½-3',
'1½-5', '9+', '5-8', '10-21', '8+', '6-14', '5+', '10-16', '10-14',
'11-16', '12-16', '9-16', '7+'], dtype=object)
These categories are the suggested age ranges for playing with the LEGO sets. I intend to do some statistical analysis with these age bins; for example, I want to compute the mean of the suggested ages. However, each value is a string:
type(lego_dataset.loc[0]['Ages'])
str
I don't know how to work with this data.
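One possible starting point is to turn each label into a pair of numbers. The sketch below assumes every label has one of the shapes `N-M`, `N+` or a bare `N`, and that half years are written with the Unicode character `½` (as in `1½-3`). The helper names `to_years` and `parse_age_label` are hypothetical, not part of pandas or the dataset.

```python
from typing import Optional, Tuple

def to_years(text: str) -> float:
    """Convert an age token such as '12' or '1½' to a float."""
    # Assumption: half years appear as the Unicode character '½'
    return float(text.replace('½', '.5'))

def parse_age_label(label: str) -> Tuple[float, Optional[float]]:
    """Return (lower, upper) for labels like '6-12' or '12+'; upper is None when open-ended."""
    label = label.strip()
    if label.endswith('+'):
        # '12+' has a lower bound but no upper bound
        return to_years(label[:-1]), None
    lower, _, upper = label.partition('-')
    # A bare number such as '8' is treated as a range from 8 to 8
    return to_years(lower), to_years(upper or lower)

print(parse_age_label('1½-3'))   # (1.5, 3.0)
print(parse_age_label('12+'))    # (12.0, None)
```

Applied with `df['Ages'].apply(parse_age_label)`, this gives numeric bounds for every row. How a range becomes a single number for the mean (lower bound, midpoint, etc.) is an analysis choice, and open-ended labels like `12+` have no midpoint at all.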
I've already checked How to categorize a range of values in Pandas DataFrame, but imagine there are 100 unique bins. Preparing a list of 100 labels by hand isn't reasonable; there should be a better way.
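A way to avoid writing one label per bin is to let a regular expression split the whole column at once with `Series.str.extract`, so the number of unique labels doesn't matter. This is a minimal sketch, assuming every label matches `N`, `N-M` or `N+` (with `½` for half years); the small DataFrame is a hypothetical stand-in for the Kaggle data, and the column names `min_age`, `max_age` and `open_ended` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data, using a few of the real 'Ages' labels
df = pd.DataFrame({'Ages': ['6-12', '12+', '1½-3', '4-99', '10+', '5-8']})

# Named groups: lower bound, optional '-upper' part, optional '+' marker
pattern = r'^(?P<min_age>[\d½]+)(?:-(?P<max_age>[\d½]+))?(?P<open_ended>\+)?$'
parts = df['Ages'].str.extract(pattern)

for col in ['min_age', 'max_age']:
    # '1½' -> '1.5'; a missing upper bound (e.g. for '12+') stays NaN
    df[col] = pd.to_numeric(parts[col].str.replace('½', '.5', regex=False))

# True where the label ended with '+'
df['open_ended'] = parts['open_ended'].notna()

print(df)
print('mean minimum age:', df['min_age'].mean())
```

Keeping separate lower and upper columns, rather than forcing each range into one number up front, leaves room to choose later whether the "mean suggested age" should use the lower bound, a midpoint, or something else. Labels that don't fit the pattern would come back as NaN in `min_age`, which makes them easy to spot.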