
I'm very new to programming (<2 weeks), and am having to learn Python3 as part of a course, so please bear that in mind in any answers! I'm working on a new Mac, in case that makes a difference.

I've got a table of data, which was taken from a csv file and compiled into 3 columns, with several thousand rows. I've filtered out quite a lot of the rows based on certain conditions, leaving me with about 200 rows (but whose index is from about 8000-8300, because of the initial ordering). Now I'm trying to find the time (one of the columns) at which the highest value in another of the columns occurred. When I run the code below, it gives me an error "index out of bounds". I've read another question here on the same error message, but I didn't really understand how the answer could be applied here.

maxrow=df['A'].idxmax()
maxA=df['A'].irow(maxrow)
maxtime = df['time'].irow(maxrow)
maxB = df['B'].irow(maxrow)

I understand that the first line finds the row in which A is at a maximum and assigns that row number to the variable "maxrow". The second line creates a variable maxA and assigns to it the value found in column A at that row. This is where the problem seems to arise. I should mention that if I insert a row number of less than 200 instead of "maxrow" in the 2nd, 3rd and 4th lines, there is no problem at all (except that it's not the right row chosen).

So I think somehow the program is identifying the max row based on its index number, but then when it comes to use it, it is using the actual new ordering of the rows, of which there aren't enough.

Can anyone help? Thanks

Barmar
Tom
  • Needs more [mcve]. – melpomene Feb 04 '17 at 10:52
  • `.irow(maxrow)` is deprecated, you should use `.iloc[maxrow]` – Barmar Feb 04 '17 at 11:04
  • Does `df['A'].loc(maxrow)` work? `iloc()` is integer-based, `loc()` is label-based. – Barmar Feb 04 '17 at 11:08
  • Thanks! That almost works. It doesn't throw up any error messages, but now when I print out the values for "maxrow" and "maxtime" it gives me "<pandas.core.indexing._iLocIndexer object at 0x1179d5978>" instead of the value. – Tom Feb 04 '17 at 15:05
  • By the way, that happens regardless of whether I use loc() or iloc(). I would be interested in knowing what "deprecated" means, in a programming context. – Tom Feb 04 '17 at 15:12

1 Answer


This should solve it:

maxrow = df['A'].idxmax()
maxA = df['A'].loc[maxrow]
maxtime = df['time'].loc[maxrow]
maxB = df['B'].loc[maxrow]

A more idiomatic use of loc, selecting the row and the column in one indexing step:

maxrow = df['A'].idxmax()
maxA = df.loc[maxrow, 'A']
maxtime = df.loc[maxrow, 'time']
maxB = df.loc[maxrow, 'B']

More compact still, with a single call to loc:

maxrow = df['A'].idxmax()
maxA, maxtime, maxB = df.loc[maxrow, ['A', 'time', 'B']]
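
To see this work on a filtered dataframe, here is a minimal sketch. The data is made up (the OP's CSV isn't shown), and the `filtered` name and the filter condition are only placeholders for whatever conditions were actually applied:

```python
import pandas as pd

# Made-up data standing in for the OP's table (the real CSV is not shown).
df = pd.DataFrame({
    "time": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
    "A": [1, 4, 2, 8, 3, 9],
    "B": [10, 20, 30, 40, 50, 60],
})

# Filtering keeps the original row labels: 'filtered' has labels 2, 3, 4.
filtered = df[(df["time"] >= 1.0) & (df["time"] <= 2.0)]

maxrow = filtered["A"].idxmax()          # label of the row where A is largest
maxA = filtered.loc[maxrow, "A"]         # look the row up by that label
maxtime = filtered.loc[maxrow, "time"]
maxB = filtered.loc[maxrow, "B"]

print(maxrow, maxA, maxtime, maxB)  # 3 8 1.5 40
```

Note that the label returned here is 3 even though only three rows (positions 0 to 2) are left, which is the same mismatch the question describes.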

A few notes regarding the comments above:

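  • loc[] should be used with square brackets, rather than round ones. With round brackets you call the indexer instead of indexing with it, and get the indexer object itself back rather than a value. This explains the output you got: <pandas.core.indexing._iLocIndexer object at 0x1179d5978>.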
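  • You should use loc for this purpose, rather than iloc, since idxmax returns a label from your dataframe's index, not a position. The two only coincide when the index is the default 0, 1, 2, ... sequence. In your case, filtering left about 200 rows that still carry their original labels (roughly 8000-8300), so iloc[maxrow] asks for a position that doesn't exist, hence "index out of bounds". I suggest reading the docs (loc, iloc) to understand the difference.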
  • Regarding deprecation, see this question: "deprecation is a status applied to software features to indicate that they should be avoided".
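
The loc/iloc point can be checked with a small sketch. The labels 8000-8002 below are invented to mimic an index left over after filtering, and `reset_index(drop=True)` is shown only to illustrate when labels and positions coincide, not as something the OP needs to do:

```python
import pandas as pd

# Hypothetical series whose labels no longer start at 0, as after filtering.
s = pd.Series([2, 8, 3], index=[8000, 8001, 8002])

label = s.idxmax()        # 8001: a label, not a position
by_label = s.loc[label]   # 8

# iloc treats 8001 as a position, and there are only 3 positions.
try:
    s.iloc[label]
    iloc_error = None
except IndexError as exc:
    iloc_error = str(exc)

# With a fresh 0..n-1 index, labels and positions happen to coincide.
t = s.reset_index(drop=True)
same = t.loc[t.idxmax()] == t.iloc[t.idxmax()]

print(by_label, iloc_error, same)
```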
Shovalt
  • Please can you elaborate your second bullet point explanation. If idxmax returns an index, and iloc expects an index, why does the op have to use loc, which expects a label? Also, why does iloc not work? – CodeCabbie Nov 23 '22 at 15:11
  • A dataframe can have either an incremental, numeric index, i.e. [0, 1, 2, 3...], or a custom index, e.g. ['cat', 'dog', 'cow', 'mouse']. For the first, loc/iloc can be used interchangeably, but for the second you would use either `loc['cow']` or `iloc[2]`. Respectively, `argmax` would return 2 (assuming that is where the largest value is), and `idxmax` would return 'cow'. Since the OP used `idxmax`, the more correct usage would be `loc`. – Shovalt Nov 24 '22 at 14:56
  • So idxmax returns a label, not a numeric index. Therefore its result should be used with loc (not iloc) since loc accepts a label. Is that correct? – CodeCabbie Nov 25 '22 at 20:06
  • @CodeCabbie, yes, you are correct. – Shovalt Nov 26 '22 at 21:08
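
The cat/dog/cow example from these comments can be sketched as follows. The values are invented, and the sketch assumes a reasonably recent pandas (1.0 or later), where `Series.argmax` returns an integer position:

```python
import pandas as pd

# Invented values; the largest one sits at position 2, under the label 'cow'.
s = pd.Series([3, 1, 7, 2], index=["cat", "dog", "cow", "mouse"])

label = s.idxmax()   # 'cow': a label, to be used with loc
pos = s.argmax()     # 2: a position, to be used with iloc

# Both routes reach the same value.
via_loc = s.loc[label]
via_iloc = s.iloc[pos]

print(label, pos, via_loc, via_iloc)
```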