I am trying to find a way to do a cumulative total that accounts for ties in Pandas.
Lets take hypothetical data from a track meet, where I have people, races, heats, and time.
Each person's placement is according to the following:
For a given race/heat combination:
- The person person with the lowest time placed first
- The person with the second lowest time placed second
and so on...
This would be fairly simple code, but for one thing..
If two people have the same time, they both get the same place and then the next time greater than their time will have that value + 1 as the placement.
In the table below, for 100 yard dash, heat 1, RUNNER1 finished first, RUNNER2/RUNNER3 finished second, and RUNNER3 finished third (next time after RUNNER2/RUNNER3)
So basically, the logic is as follows:
If race <> race.shift() or heat <> heat.shift() then place =1
If race = race.shift() and heat = heat.shift() and time>time.shift then place =place.shift()+1
If race = race.shift() and heat = heat.shift() and time>time.shift then place =place.shift()
The part that confuses me is how to handle the ties. Otherwise I could do something like
df['Place']=np.where(
(df['race']==df['race'].shift())
&
(df['heat']==df['heat'].shift()),
df['Place'].shift()+1,
1)
Thank you!
Sample data follows:
Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92
what I want at the end is
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,4
[edit] Now, one step further..
Lets assume that once I leave a set of unique values, the next time that set comes up, the values will reset to 1..
So, for example, - Note that it goes to "heat 1" and then "heat 2" and back to "heat 1" - I don't want the rankings to continue from the prior "heat 1", rather I want them to reset.
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2