I'm a Python newbie and I'm getting a bit lost in how to transform my data.
Here's an example dataset:
import numpy as np
import pandas as pd
np.random.seed(123)  # seed NumPy's generator; random.seed() would not affect np.random
df = pd.DataFrame({
    'pp': range(1, 11),
    'age': np.random.randint(1, 9, 10) * 10,
    'gender': np.random.randint(1, 3, 10),
    'yes/no': np.random.randint(0, 2, 10),
})
>>> df
   pp  age  gender  yes/no
0   1   20       1       1
1   2   50       1       0
2   3   10       2       1
3   4   50       1       1
4   5   40       2       0
5   6   60       2       0
6   7   30       2       1
7   8   70       1       0
8   9   30       2       0
9  10   70       1       0
I want to create three new columns in my dataframe that represent the ratios between my different variables, namely:
- ratio between gender 1 and 2 per yes/no category,
- ratio between all existing age groups per yes/no category,
- ratio between age and gender combination per yes/no category
For the first example I got something working like this:
df.groupby(["gender", "yes/no"]).size()/df.groupby(["yes/no"]).size()
But I'd actually like to get the output values as a new column, with one value per pp.
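To make the shape I'm after concrete, here is a rough sketch that seems to get there with groupby(...).transform('size'), which returns one value per original row instead of one per group. I typed the data in by hand from the table above, and the column names gender_ratio, age_ratio and age_gender_ratio are just placeholders I made up. I'm not sure this is the neatest way to do it.

```python
import pandas as pd

# Data typed in by hand from the example table (not regenerated randomly)
df = pd.DataFrame({
    'pp': range(1, 11),
    'age': [20, 50, 10, 50, 40, 60, 30, 70, 30, 70],
    'gender': [1, 1, 2, 1, 2, 2, 2, 1, 2, 1],
    'yes/no': [1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
})

# Number of rows in each yes/no category, repeated on every row of that category
per_answer = df.groupby('yes/no')['pp'].transform('size')

# Share of each gender within its yes/no category (placeholder column name)
df['gender_ratio'] = (
    df.groupby(['gender', 'yes/no'])['pp'].transform('size') / per_answer
)

# Share of each age group within its yes/no category (placeholder column name)
df['age_ratio'] = (
    df.groupby(['age', 'yes/no'])['pp'].transform('size') / per_answer
)

# Share of each age/gender combination within its yes/no category
df['age_gender_ratio'] = (
    df.groupby(['age', 'gender', 'yes/no'])['pp'].transform('size') / per_answer
)

print(df)
```

Each new column has exactly one value per pp, so it lines up with the original rows without any merging.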
Anyone know a neat way to do this?