Pandas how to divide columns of rows within groupby based on condition

Question

I have the dataframe

C1    c10 val val_type
1      3   5   target
1      3   8   end
1      3   9   other
2      8   1   end
2      8   2   target
2      8   9   other

The values of C1, C10 creates groups of 3. Within these groups I want to create a new column that is target/end. So the output will be:

C1    c10 val val_type   new 
1      3   5   target    0.652
1      3   8   end       0.652
1      3   9   other     0.652
2      8   12  end       0.166
2      8   2   target    0.166
2      8   9   other     0.166

What is the best way to do so?

edit: Ignore other

Do you want `target/end` for each group in `C1` or in `C1` and `C10`? If it's the latter then you have six groups? — CutePoison, Jun 30 '21 at 09:26

score 2 · Answer 1 · answered Jun 30 '21 at 09:38

2

You can pivot it:

s = df.pivot("C1", "val_type", "val")
df["new"] = df["C1"].map(s["target"]/s["end"])
print (df)

   C1  c10  val val_type       new
0   1    3    5   target  0.625000
1   1    3    8      end  0.625000
2   1    5    9    other  0.625000
3   2    8   12      end  0.166667
4   2    8    2   target  0.166667
5   2    8    9    other  0.166667

answered Jun 30 '21 at 09:38

Henry Yik

22,275
4
18
40

I get '{ValueError}Index contains duplicate entries, cannot reshape' for the pivot command, any way to check where exactly are the duplicates? (My df is actually 10K+ rows any there should be duplicated so not sure how to find id) – Cranjis Jun 30 '21 at 09:52

score 0 · Accepted Answer · answered Jun 30 '21 at 09:38

0

We can pivot the dataframe to reshape then using eval calculate target / end, then merge the given df with the evaluated column on C1, c10

c = ['C1', 'c10']
df.merge(df.pivot(c, 'val_type', 'val').eval('target/end').rename('new'), on=c)

   C1  c10  val val_type       new
0   1    3    5   target  0.625000
1   1    3    8      end  0.625000
2   1    3    9    other  0.625000
3   2    8   12      end  0.166667
4   2    8    2   target  0.166667
5   2    8    9    other  0.166667

answered Jun 30 '21 at 09:38

Shubham Sharma

68,127
6
24
53

@Shubam Sharma I get '{ValueError}Index contains duplicate entries, cannot reshape' for the pivot command, any way to check where exactly are the duplicates? (My df is actually 10K+ rows any there should be duplicated so not sure how to find id) – Cranjis Jun 30 '21 at 09:52
@okuoub Check `df[['C1' , 'c10', 'val_type']].duplicated().any()` – Shubham Sharma Jun 30 '21 at 09:55
@okuoub In case you don't care about duplicates or want to drop duplicates then we can use `pivot_table` with `aggfunc='first'`: `df.merge(df.pivot_table('val', c, 'val_type', aggfunc='first').eval('target/end').rename('new'), on=c)` – Shubham Sharma Jun 30 '21 at 10:01

Pandas how to divide columns of rows within groupby based on condition

2 Answers2