
I have noticed some inconsistent behaviour in how axes are defined (as discussed in this forum: What does axis in pandas mean?), and I would just like to make sure I understand correctly that:

  1. For dropna, axis=0 means: the algorithm goes through each row and checks for NAs; if any/all are found, it deletes that row.

  2. For any, axis=0 means: the algorithm goes through each column and checks for the condition True; if any/all are found, it deletes that column.

So the definitions of row and column are switched (although the pandas doc explains the 'any' axis as {index (0), columns (1)}).

Is this correct?

Niccola Tartaglia

1 Answer


Axis zero is the index. Axis one is the columns. That’s it.

The interpretation of why the different axis choices behave the way they do is confusing. It is my belief that it is consistent though.

For dropna it refers to the axis from which keys will be dropped.
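A minimal sketch of the dropna case, assuming a small made-up frame (the labels x, y, z and a, b are only for illustration). With axis=0, labels are removed from the index; with axis=1, labels are removed from the columns:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a single NaN at row "y", column "a".
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# axis=0: keys are dropped from the index, so row "y" goes.
rows_dropped = df.dropna(axis=0)

# axis=1: keys are dropped from the columns, so column "a" goes.
cols_dropped = df.dropna(axis=1)

print(rows_dropped.index.tolist())    # ['x', 'z']
print(cols_dropped.columns.tolist())  # ['b']
```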

For any, sum, mean, and many more, it refers to the axis over which we will evaluate the reduction function.
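As a rough illustration with made-up data, reducing over axis=0 collapses the index and leaves one result per column, while reducing over axis=1 collapses the columns and leaves one result per row:

```python
import pandas as pd

# Hypothetical example data: a single True at row "x", column "a".
df = pd.DataFrame(
    {"a": [True, False, False], "b": [False, False, False]},
    index=["x", "y", "z"],
)

# axis=0: evaluate over the index; the result is labelled by the columns.
any_over_index = df.any(axis=0)

# axis=1: evaluate over the columns; the result is labelled by the index.
any_over_columns = df.any(axis=1)

print(any_over_index.to_dict())    # {'a': True, 'b': False}
print(any_over_columns.to_dict())  # {'x': True, 'y': False, 'z': False}

# sum follows the same rule: one total per column for axis=0.
print(df.sum(axis=0).tolist())     # [1, 0]
```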

For apply it refers to the axis that is used in each of the series objects that get passed to the function being applied.
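One way to see this, sketched with made-up data, is to have the applied function report the labels of the series it receives. With axis=0 each series is a column, labelled by the index; with axis=1 each series is a row, labelled by the columns:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

def labels_seen(s):
    # Join the labels of the Series passed in by apply.
    return ",".join(s.index)

# axis=0: each Series passed in carries the index labels (it is a column).
seen_axis0 = df.apply(labels_seen, axis=0)

# axis=1: each Series passed in carries the column labels (it is a row).
seen_axis1 = df.apply(labels_seen, axis=1)

print(seen_axis0.to_dict())  # {'a': 'x,y', 'b': 'x,y'}
print(seen_axis1.to_dict())  # {'x': 'a,b', 'y': 'a,b'}
```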

For add, mul, etc. it refers to the axis that is used as a reference when adding a series to a dataframe.
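A small sketch of the add case, again with made-up data: the axis argument picks which set of frame labels the series' own labels are matched against.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

by_index = pd.Series([10, 20], index=["x", "y"])     # labelled like the index
by_columns = pd.Series([100, 200], index=["a", "b"])  # labelled like the columns

# axis=0: match the series against the index, so each row gets its own offset.
added_axis0 = df.add(by_index, axis=0)

# axis=1: match the series against the columns, so each column gets its own offset.
added_axis1 = df.add(by_columns, axis=1)

print(added_axis0.loc["y"].tolist())  # [22, 24]
print(added_axis1.loc["x"].tolist())  # [101, 203]
```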

You can make arguments why you may have made different choices. But I think the developers made good choices. If something specific confuses you, ask a question.

piRSquared
  • Thank you for clarifying. I agree. I might simply make myself a drawing/cheat sheet to help me remember!! – Niccola Tartaglia Dec 17 '17 at 00:51
  • I would argue that you could make it consistent this way: you could establish a rule across all functions that if axis=0, you go through the elements one row at a time and do whatever the function does to that row (e.g. compute the mean, filter NaNs, etc.), then move on to the next row; the same for axis=1 with each column. – Niccola Tartaglia Dec 17 '17 at 00:56