I have a problem that I can't seem to solve. I have a dataset with two categorical variables: Gender (Male vs Female) and Smoking status (Smokers vs Non-smokers). The dataset contains 60% males and 50% smokers.

import pandas as pd

# Toy dataset: 10 rows, 6 male / 4 female, 5 smokers / 5 non-smokers
df = pd.DataFrame()
df['Gender'] = ['M','M','M','M','M','M','F','F','F','F']
df['Smoking_status'] = ['S','S','S','S','S','NS','NS','NS','NS','NS']
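
As a quick sanity check, the proportions I quoted can be confirmed with `value_counts(normalize=True)`. This is only a sketch that rebuilds the same toy data so it runs on its own; the names `gender_share` and `smoking_share` are just ones I picked for illustration.

```python
import pandas as pd

# Same toy data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'Gender': ['M', 'M', 'M', 'M', 'M', 'M', 'F', 'F', 'F', 'F'],
    'Smoking_status': ['S', 'S', 'S', 'S', 'S', 'NS', 'NS', 'NS', 'NS', 'NS'],
})

# Fraction of rows in each category (each Series sums to 1.0)
gender_share = df['Gender'].value_counts(normalize=True)
smoking_share = df['Smoking_status'].value_counts(normalize=True)

print(gender_share)
print(smoking_share)
```

The same two calls could be run on any candidate subset to see how close it gets to the target proportions.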

Is there a way to create a subset such that the new dataset has 50% males and 30% smokers? (The exact percentages of males and smokers do not matter here, since I do not yet have that information for the final dataset.) I am implementing this in Python, but I would be happy with just an idea of a solution. Thank you all!

Giulia