-2

I have the dataframe

C1    c10 val val_type
1      3   5   target
1      3   8   end
1      3   9   other
2      8   1   end
2      8   2   target
2      8   9   other

The values of C1, C10 creates groups of 3. Within these groups I want to create a new column that is target/end. So the output will be:

C1    c10 val val_type   new 
1      3   5   target    0.652
1      3   8   end       0.652
1      3   9   other     0.652
2      8   12  end       0.166
2      8   2   target    0.166
2      8   9   other     0.166

What is the best way to do so?

edit: Ignore other

Cranjis
  • 1,590
  • 8
  • 31
  • 64

2 Answers2

2

You can pivot it:

s = df.pivot("C1", "val_type", "val")
df["new"] = df["C1"].map(s["target"]/s["end"])
print (df)

   C1  c10  val val_type       new
0   1    3    5   target  0.625000
1   1    3    8      end  0.625000
2   1    5    9    other  0.625000
3   2    8   12      end  0.166667
4   2    8    2   target  0.166667
5   2    8    9    other  0.166667
Henry Yik
  • 22,275
  • 4
  • 18
  • 40
  • I get '{ValueError}Index contains duplicate entries, cannot reshape' for the pivot command, any way to check where exactly are the duplicates? (My df is actually 10K+ rows any there should be duplicated so not sure how to find id) – Cranjis Jun 30 '21 at 09:52
0

We can pivot the dataframe to reshape then using eval calculate target / end, then merge the given df with the evaluated column on C1, c10

c = ['C1', 'c10']
df.merge(df.pivot(c, 'val_type', 'val').eval('target/end').rename('new'), on=c)

   C1  c10  val val_type       new
0   1    3    5   target  0.625000
1   1    3    8      end  0.625000
2   1    3    9    other  0.625000
3   2    8   12      end  0.166667
4   2    8    2   target  0.166667
5   2    8    9    other  0.166667
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • @Shubam Sharma I get '{ValueError}Index contains duplicate entries, cannot reshape' for the pivot command, any way to check where exactly are the duplicates? (My df is actually 10K+ rows any there should be duplicated so not sure how to find id) – Cranjis Jun 30 '21 at 09:52
  • @okuoub Check `df[['C1' , 'c10', 'val_type']].duplicated().any()` – Shubham Sharma Jun 30 '21 at 09:55
  • @okuoub In case you don't care about duplicates or want to drop duplicates then we can use `pivot_table` with `aggfunc='first'`: `df.merge(df.pivot_table('val', c, 'val_type', aggfunc='first').eval('target/end').rename('new'), on=c)` – Shubham Sharma Jun 30 '21 at 10:01