Create new column with ratio values based on multiple other columns in python pandas

Question

I'm a Python newbie and I'm getting a bit lost on how to transform my data.

Here's an example dataset:

import numpy as np
import pandas as pd
import random
random.seed(123)  # note: this seeds Python's random module, not np.random, so the values below are not reproducible
df = pd.DataFrame({'pp': list(range(1, 11)), 'age': list(np.random.randint(1,9,10)*10), 'gender': list(np.random.randint(1,3,10)), 'yes/no': list(np.random.randint(0,2,10))})

>>> df
   pp  age  gender  yes/no
0   1   20       1       1
1   2   50       1       0
2   3   10       2       1
3   4   50       1       1
4   5   40       2       0
5   6   60       2       0
6   7   30       2       1
7   8   70       1       0
8   9   30       2       0
9  10   70       1       0

I want to create three new columns within my dataframe which represent the ratios between my different variables, namely:

ratio between gender 1 and 2 per yes/no category,
ratio between all existing age groups per yes/no category,
ratio between age and gender combination per yes/no category

For the first example I got something working like this:

df.groupby(["gender", "yes/no"]).size()/df.groupby(["yes/no"]).size()

But I'd actually want to get the output values as a new column, one value per pp. Anyone know a neat way to do this?

Hamzah · Answer 1 · 2022-03-31T15:35:27.677

1

Try this, which turns the ratios into a table with one row per gender and yes/no combination:

(df.groupby(["gender", "yes/no"]).size()/df.groupby(["yes/no"]).size()).rename('ratio').reset_index()
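To get one value per pp, one option is to merge that ratio table back onto the original dataframe using the grouping columns as keys. The following is only a minimal sketch under that assumption; the small hand-written frame stands in for the question's random data and its values are illustrative.

```python
import pandas as pd

# Illustrative stand-in for the question's dataframe (values are made up).
df = pd.DataFrame({
    "pp": [1, 2, 3, 4],
    "gender": [1, 1, 2, 2],
    "yes/no": [1, 0, 1, 1],
})

# Ratio table as in the expression above: share of each gender
# within each yes/no category.
ratio = (
    df.groupby(["gender", "yes/no"]).size()
    / df.groupby(["yes/no"]).size()
).rename("ratio").reset_index()

# Left-merge on the grouping keys so every pp row picks up its group's ratio.
df = df.merge(ratio, on=["gender", "yes/no"], how="left")
print(df)
```

A left merge keeps every original row and its order, so the result still has one row per pp.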

edited Mar 31 '22 at 15:35

answered Mar 31 '22 at 15:10


Thanks Phoenix, would you also know how to add the ratio per pp as a new column in the original df? – Inkling Apr 04 '22 at 08:38
@Inkling It works the same way as what I did: just change gender to pp, and rename('ratio') to rename('pp ratio') – Hamzah Apr 11 '22 at 10:28
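
An alternative that writes the ratios straight into the original dataframe, with no intermediate table, is groupby(...).transform("size"), which returns the group size aligned to every row. The sketch below is not from the answer; it covers all three ratios from the question, and the column names gender_ratio, age_ratio and age_gender_ratio, as well as the sample values, are made up for illustration.

```python
import pandas as pd

# Illustrative stand-in for the question's dataframe (values are made up).
df = pd.DataFrame({
    "pp": [1, 2, 3, 4, 5, 6],
    "age": [20, 20, 30, 20, 30, 30],
    "gender": [1, 2, 1, 1, 2, 2],
    "yes/no": [1, 1, 1, 0, 0, 0],
})

# Number of rows in each yes/no category, repeated on every row.
total = df.groupby("yes/no")["pp"].transform("size")

# Hypothetical output column names mapped to the keys they are based on.
ratio_keys = {
    "gender_ratio": ["gender"],
    "age_ratio": ["age"],
    "age_gender_ratio": ["age", "gender"],
}

for name, keys in ratio_keys.items():
    # Rows sharing these keys within the same yes/no category.
    group_size = df.groupby(keys + ["yes/no"])["pp"].transform("size")
    df[name] = group_size / total

print(df)
```

Because transform returns a result with the same index as df, each ratio can be assigned directly as a column without a merge.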
