Subset with fixed proportion of categorical variables

Asked Sep 29 '22 at 18:56

I have a problem that I cannot seem to solve. I have a dataset with two categorical variables: Gender (Male vs. Female) and Smoking status (Smokers vs. Non-smokers). The dataset contains 60% males and 50% smokers.

```python
import pandas as pd

df = pd.DataFrame()
df['Gender'] = ['M','M','M','M','M','M','F','F','F','F']
df['Smoking_status'] = ['S','S','S','S','S','NS','NS','NS','NS','NS']
```
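The stated proportions can be checked with `value_counts(normalize=True)` for each marginal share, and `pd.crosstab` shows the joint counts. This is a small sketch added for illustration, not part of the original question. The crosstab exposes a constraint that matters for any subset: in this toy data all 5 smokers are male and all 4 females are non-smokers.

```python
import pandas as pd

# Same toy data as above, built in one call.
df = pd.DataFrame({
    'Gender': ['M'] * 6 + ['F'] * 4,
    'Smoking_status': ['S'] * 5 + ['NS'] * 5,
})

# Marginal share of each category (normalize=True returns fractions, not counts).
gender_share = df['Gender'].value_counts(normalize=True)
smoker_share = df['Smoking_status'].value_counts(normalize=True)
print(gender_share['M'], smoker_share['S'])  # 0.6 0.5

# Joint counts of Gender x Smoking_status.
joint = pd.crosstab(df['Gender'], df['Smoking_status'])
print(joint)
```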

Is there a way to create a subset such that the new dataset has 50% males and 30% smokers? (The percentage of male smokers does not matter, since that is information I do not have for the final dataset.) I am implementing this in Python, but I would be happy with just the idea of a solution. Thank you all!
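One possible approach, sketched here under my own assumptions rather than taken from the question: since only the two marginal shares are fixed, treat the four Gender × Smoking_status cells as unknown counts. For a subset of size n with m = 0.5·n males and s = 0.3·n smokers, choosing the number of male smokers a determines the other three cells (m - a male non-smokers, s - a female smokers, n - m - s + a female non-smokers), so each candidate n only needs a range check against the rows available in each cell. The helpers `plan_cell_counts` and `sample_subset` are hypothetical names. Shares are converted to exact fractions so that "is 0.3·n a whole number" is tested without floating-point error.

```python
from fractions import Fraction

import pandas as pd


def plan_cell_counts(df, male_share, smoker_share):
    """Hypothetical helper: largest exact plan {(gender, status): count}, or None."""
    avail = df.groupby(['Gender', 'Smoking_status']).size().to_dict()
    A = int(avail.get(('M', 'S'), 0))   # male smokers available
    B = int(avail.get(('M', 'NS'), 0))  # male non-smokers available
    C = int(avail.get(('F', 'S'), 0))   # female smokers available
    D = int(avail.get(('F', 'NS'), 0))  # female non-smokers available
    # str() first, so a float such as 0.3 becomes exactly 3/10.
    pm, ps = Fraction(str(male_share)), Fraction(str(smoker_share))
    for n in range(len(df), 0, -1):  # try the largest subset first
        males, smokers = n * pm, n * ps
        if males.denominator != 1 or smokers.denominator != 1:
            continue  # the shares cannot be hit exactly with n whole rows
        m, s = int(males), int(smokers)
        # Given a = male smokers: b = m - a, c = s - a, d = n - m - s + a,
        # and every cell must satisfy 0 <= count <= available.
        lo = max(0, m - B, s - C, m + s - n)
        hi = min(A, m, s, D - n + m + s)
        if lo <= hi:
            a = lo  # any a in [lo, hi] works; take the smallest
            return {('M', 'S'): a, ('M', 'NS'): m - a,
                    ('F', 'S'): s - a, ('F', 'NS'): n - m - s + a}
    return None


def sample_subset(df, plan, seed=0):
    """Hypothetical helper: draw plan[cell] random rows from each cell."""
    parts = []
    for (gender, status), k in plan.items():
        if k > 0:
            cell = df[(df['Gender'] == gender) & (df['Smoking_status'] == status)]
            parts.append(cell.sample(n=k, random_state=seed))
    return pd.concat(parts)


# Toy data from the question.
df = pd.DataFrame({
    'Gender': ['M'] * 6 + ['F'] * 4,
    'Smoking_status': ['S'] * 5 + ['NS'] * 5,
})
plan = plan_cell_counts(df, 0.5, 0.3)
print(plan)  # None: no exact 50% / 30% subset exists in this data
```

On the toy data no such subset exists: 30% smokers requires n to be a multiple of 10, so the only candidate is the full 10-row dataset, which has 60% males. On a larger dataset the function returns the largest subset that meets both shares exactly, and `sample_subset` then picks the rows within each cell at random. Fixing a at its smallest feasible value is an arbitrary choice; since the question leaves the share of male smokers open, any value in the feasible range would do.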

Giulia

Please provide enough code so others can better understand or reproduce the problem. – Community Sep 29 '22 at 20:10
