0

I have a Pandas dataframe and am at a bit of a loss with how to do what I am hoping to. This is a snippet of the dataframe, and I am uploading a screenshot as well. Effectively, I would like to create a new column defined as pitches where count is '3--2'.

To do this, I would like to, loop through all rows. For a given row (which I'll refer to as the original row), if prev_count == '3--2' I then want to

  1. step down dataframe rows to where prev_count != '3--2'
  2. confirm that row has the same batter-pitcher identifier as the original row
  3. once in a row that satisfies the conditions prev_count != '3--2' AND batter-pitcher(original row) == batter-pitcher (new row), I would like to extract pitch_number of the new row
  4. then would calculate a value for the new column in the original row using the formula:

pitch_number(original row) + 1 - pitch_number (new row)

To use the existing dataframe as an example... indices 62, 4, 186, 87, and 252 would have a value of 1 for the new column. Index 171 would have a value of 3; 177 a value of 2; and 192 a value of 1. Likewise, 191 would have a value of 5; 229, 10, and 57 would also have values of 1 for this new column variable.

            player_name   batter-pitcher  pitch_number count prev_count
62   Graveman, Kendall  501303---608665             6  3--2       3--1
4          Smyly, Drew  608665---592767             6  3--2       2--2
186  Graveman, Kendall  592696---608665             8  3--2       2--2
87         Maton, Phil  621020---664208             6  3--2       3--1
252      Martin, Chris  514888---455119             6  3--2       2--2
171      Urquidy, José  624585---664353             8  3--2       3--2
177      Urquidy, José  624585---664353             7  3--2       3--2
192      Urquidy, José  624585---664353             6  3--2       3--1
191       García, Yimi  594807---554340            12  3--2       3--2
198       García, Yimi  594807---554340            11  3--2       3--2
209       García, Yimi  594807---554340            10  3--2       3--2
219       García, Yimi  594807---554340             9  3--2       3--2
229       García, Yimi  594807---554340             8  3--2       2--2
10     Valdez, Framber  592696---664285             6  3--2       2--2
57     Valdez, Framber  518692---664285             6  3--2       2--2

I am a bit at a loss as how to 1) loop through rows on a dataframe, and then 2) within each block of the loop, step down rows and 3) reference other columns in the dataframe within another row, so would really appreciate some guidance here. Thanks so much!

cottontail
  • 10,268
  • 18
  • 50
  • 51
David
  • 45
  • 3

1 Answers1

0

For your given dataset, I think this works. But it assumes your pitch counts are always incremented by one and you're not missing any data, otherwise this wouldn't work. I'd suggest looking into cumcount(), cummax(), cummin() grouping on pitcher-batter.

Column 'new1' is the final answer, column 'new' is just an intermediate step.

# get dataframe into right order
df.sort_values(by=['batter-pitcher', 'pitch_number'], ascending=[True, False], inplace=True)


df['new'] = df.groupby(['batter-pitcher', 'prev_count'])['count'].cumcount(ascending=False) + 1

df['new1'] = np.where((df['new']==1) & (df['prev_count']!='3--2'), 1, df['new']+1)

           player_name   batter-pitcher  pitch_number count prev_count  new  new1
62   Graveman, Kendall  501303---608665             6  3--2       3--1    1     1
4          Smyly, Drew  608665---592767             6  3--2       2--2    1     1
186  Graveman, Kendall  592696---608665             8  3--2       2--2    1     1
87         Maton, Phil  621020---664208             6  3--2       3--1    1     1
252      Martin, Chris  514888---455119             6  3--2       2--2    1     1
171      Urquidy, José  624585---664353             8  3--2       3--2    2     3
177      Urquidy, José  624585---664353             7  3--2       3--2    1     2
192      Urquidy, José  624585---664353             6  3--2       3--1    1     1
191       García, Yimi  594807---554340            12  3--2       3--2    4     5
198       García, Yimi  594807---554340            11  3--2       3--2    3     4
209       García, Yimi  594807---554340            10  3--2       3--2    2     3
219       García, Yimi  594807---554340             9  3--2       3--2    1     2
229       García, Yimi  594807---554340             8  3--2       2--2    1     1
10     Valdez, Framber  592696---664285             6  3--2       2--2    1     1
57     Valdez, Framber  518692---664285             6  3--2       2--2    1     1
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14