-1

Good morning all. I want to create a binning column in my main dataframe using data from a second one. Dataframe#1 has "Runner ID" and "Cumulative Distance" columns. Dataframe#2 has "Runner ID", "Section Start" and "Section Name" columns I'm trying to create a third column on Dataframe #1 named "Section Name Binning" based on matching "Runner ID" in both dataframes, and then binning "Cumulative Distance" from Dataframe#1 using the data from columns "Section Start" and "Section Name" from Dataframe#2. "Cumulative Distance" from Dataframe#1 and "Section Start" from Dataframe#2 will always be in increasing order and they restart when "Runner ID" changes. Attached a picture and Dataframes samples. As always appreciating your support.

Dataframes for binning Dataframes for binning

df1=pd.DataFrame({'Runner_ID':['John','John','John','John','John','John','John','John','John','John','John','Jen','Jen','Jen','Jen','Jen','Jen','Jen','Jen','Jen','Jen','Jen'],'Cumulative_Distance':[1,1.5,1.8,3,3.2,3.7,4,4.3,5,6.6,8,2,2.3,2.8,3.2,3.5,3.9,4.8,5,5.3,5.8,6]})

df2=pd.DataFrame({'Runner_ID':['John','John','John','Jen','Jen','Jen','Jen'],'Section_Start':[0,3,5,0,2.5,3.5,5], 'Section_Name':['Flats', 'Uphill', 'Downhill', 'Flats', 'Uphill','Curve', 'Downhill']})
Corralien
  • 109,409
  • 8
  • 28
  • 52
GusRo
  • 7
  • 4

1 Answers1

-1

This is pd.merge_asof:

(pd.merge_asof(df1.sort_values('Cumulative_Distance'),df2.sort_values('Section_Start'), 
               left_on='Cumulative_Distance', right_on='Section_Start',
               by='Runner_ID', allow_exact_matches=False)
   .sort_values(['Runner_ID','Cumulative_Distance'])
)

Output:

   Runner_ID  Cumulative_Distance  Section_Start Section_Name
3        Jen                  2.0            0.0        Flats
4        Jen                  2.3            0.0        Flats
5        Jen                  2.8            2.5       Uphill
8        Jen                  3.2            2.5       Uphill
9        Jen                  3.5            2.5       Uphill
11       Jen                  3.9            3.5        Curve
14       Jen                  4.8            3.5        Curve
15       Jen                  5.0            3.5        Curve
17       Jen                  5.3            5.0     Downhill
18       Jen                  5.8            5.0     Downhill
19       Jen                  6.0            5.0     Downhill
0       John                  1.0            0.0        Flats
1       John                  1.5            0.0        Flats
2       John                  1.8            0.0        Flats
6       John                  3.0            0.0        Flats
7       John                  3.2            3.0       Uphill
10      John                  3.7            3.0       Uphill
12      John                  4.0            3.0       Uphill
13      John                  4.3            3.0       Uphill
16      John                  5.0            3.0       Uphill
20      John                  6.6            5.0     Downhill
21      John                  8.0            5.0     Downhill
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thank you very much. wasn't familiar with this function. Is it efficient with very large datasets? – GusRo Dec 16 '21 at 16:14
  • @GusRo Pandas is *not* efficient with *very large* datasets. That being said, this function is as efficient as it can be, AFAICT. – Quang Hoang Dec 16 '21 at 16:16