
Consider toy dataframes df1 and df2, where df2 is a subset of df1 (excludes the first row).

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'colA':[3.0,9,45,7],'colB':['A','B','C','D']})
df2 = df1[1:]

Now let's find the argmax of colA for each frame:

np.argmax(df1.colA) ## result is "2", which is what I expected
np.argmax(df2.colA) ## result is still "2", which is not what I expected.  I expected "1" 

If my matrix of interest is df2, how do I get around this indexing issue? Is this quirk related to pandas, NumPy, or just Python memory?

bigO6377

2 Answers


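I think it's due to the index. Slicing keeps df1's row labels, so df2 is indexed 1, 2, 3, and the 2 you get back is the label of the maximum (45), not its position within df2. You could use reset_index when you assign df2: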

df1 = pd.DataFrame({'colA':[3.0,9,45,7],'colB':['A','B','C','D']})
df2 = df1[1:].reset_index(drop=True)

In [464]: np.argmax(df1.colA)
Out[464]: 2

In [465]: np.argmax(df2.colA)
Out[465]: 1
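If you would rather not touch the index at all, you can ask for the label and the position separately. A minimal sketch, assuming pandas 0.24 or later (for Series.to_numpy) and using df1.iloc[1:] as an explicit spelling of df1[1:]:

```python
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1.iloc[1:]  # same rows as df1[1:]; the labels 1, 2, 3 are kept

print(list(df2.index))  # [1, 2, 3]

# idxmax returns the index *label* of the maximum
label = df2.colA.idxmax()

# argmax on the underlying NumPy array returns the *position*
pos = df2.colA.to_numpy().argmax()

# Index.get_loc converts a label back into a position
assert df2.index.get_loc(label) == pos
print(label, pos)  # 2 1
```

Going through the NumPy array sidesteps the index entirely, so the result does not depend on how df2 was built.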

I think it's better to use the Series method argmax instead of np.argmax:

In [467]: df2.colA.argmax()
Out[467]: 1
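Which method to use depends on whether you want a label or a position. In pandas 1.0 and later, Series.argmax returns the position and Series.idxmax returns the label; pandas versions before 1.0 treated argmax as an alias for idxmax, which would explain the 2 in the question. A short sketch, assuming a current pandas, pairing each with the matching indexer so the same row comes back either way without resetting the index:

```python
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1.iloc[1:]  # no reset_index: labels stay 1, 2, 3

# Positional: Series.argmax (pandas >= 1.0) goes with .iloc
pos = df2['colA'].argmax()
row_by_pos = df2.iloc[pos]

# Label-based: Series.idxmax goes with .loc
label = df2['colA'].idxmax()
row_by_label = df2.loc[label]

print(pos, label)                                # 1 2
print(row_by_pos['colB'], row_by_label['colB'])  # C C
```

Mixing them (for example df2.loc[pos]) is what silently picks the wrong row once the index no longer starts at 0.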
Anton Protopopov

You need to reset the index of df2:

df2.reset_index(inplace=True, drop=True)
np.argmax(df2.colA)
>> 1
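The drop=True argument matters here: without it, reset_index keeps the old labels by inserting them as a new column named index. A small sketch comparing the two (it uses the non-inplace form and .iloc[1:] for the slice, a stylistic choice rather than something this answer requires):

```python
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1.iloc[1:]

kept = df2.reset_index()              # old labels become a column named 'index'
dropped = df2.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns))     # ['index', 'colA', 'colB']
print(list(dropped.columns))  # ['colA', 'colB']
print(list(dropped.index))    # [0, 1, 2]
print(dropped['colA'].idxmax())  # 1: label and position now agree
```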
DeepSpace