68

Pandas raises a FutureWarning when I apply a function to multiple columns of a groupby object. It suggests indexing with a list instead of a tuple. How do I do that?

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
>>> df.groupby([0,1])[1,2].apply(sum)
<stdin>:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
     1  2
0 1      
1 2  2  3
4 5  5  6
7 8  8  9
cmosig
  • 18
    Use `[[1, 2]]`. Double brackets are how you do DataFrame selection (i.e. selection with a list). I'm surprised `[1,2]` has worked all this time. – ALollz Apr 02 '20 at 19:44
  • Thanks, you are right! Maybe this should have thrown a KeyError. – cmosig Apr 02 '20 at 19:47
  • 4
    `[[1,2]].sum()`. No need to `apply` Python's built-in `sum` function – rafaelc Apr 02 '20 at 20:02
  • Correct, but sum was just an example to visualize my problem. – cmosig Apr 02 '20 at 20:04
  • 1
    The decision was made in https://github.com/pandas-dev/pandas/issues/23566. To keep compatibility between 0.25 and 1.0, they didn't remove the feature but added a warning in 1.0. It will likely be removed in the next major deprecation cycle. – ALollz Apr 02 '20 at 20:09

2 Answers

73

This warning was introduced in pandas 1.0.0, following a discussion on GitHub. The fix suggested there is to select the columns with a list, i.e. with double brackets:

df.groupby([0, 1])[[1, 2]].apply(sum)
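To see why the warning talks about keys being "implicitly converted to a tuple", it can help to look at what Python itself passes to the indexing operator. The sketch below uses a hypothetical `ShowKey` class (not part of pandas) whose `__getitem__` simply returns the key it receives:

```python
class ShowKey:
    """Hypothetical helper: returns whatever key the [] operator passes in."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

# Comma-separated keys inside single brackets are packed into ONE tuple.
tuple_key = probe[1, 2]

# The inner brackets build a list explicitly, so the key is a list.
list_key = probe[[1, 2]]

print(type(tuple_key).__name__, tuple_key)  # tuple (1, 2)
print(type(list_key).__name__, list_key)    # list [1, 2]
```

So `[1, 2]` after the groupby hands pandas the single key `(1, 2)`, while `[[1, 2]]` hands it a list of two column labels, which is the form the warning asks for.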

It's also possible to move the slicing operation to the end, but that is not as efficient:

df.groupby([0, 1]).apply(sum).loc[:, 1:]

Thanks @ALollz and @cmosig for helpful comments.

Arne
  • 1
    You should have integrated the comment of @ALollz into your answer, not just referred to it. That is why the other answer gets upvoted, even though you had the answer at hand much earlier. You simply need to explain why the double brackets are needed, and comments do not belong in an answer. – questionto42 Dec 01 '20 at 00:45
30

Use double brackets after the groupby call. Single brackets around one column label select a single column and return a pandas Series; double brackets pass a list of labels and return a pandas DataFrame.

df.groupby([0,1])[[1,2]].apply(sum)
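As a rough illustration of the Series versus DataFrame distinction, here is a minimal sketch on a made-up frame. The column names `key`, `x` and `y` are assumptions for the example, and it calls `.sum()` directly, as suggested in the comments, rather than `apply(sum)`:

```python
import pandas as pd

# Hypothetical data; the column names are for illustration only.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "x": [1, 2, 3],
    "y": [10, 20, 30],
})

grouped = df.groupby("key")

# Single brackets, one label: one column per group, so the result is a Series.
one_col = grouped["x"].sum()

# Double brackets: the inner list selects columns, so the result is a DataFrame.
two_cols = grouped[["x", "y"]].sum()

# A one-element list still yields a DataFrame, not a Series.
one_col_frame = grouped[["x"]].sum()

print(type(one_col).__name__)        # Series
print(type(two_cols).__name__)       # DataFrame
print(type(one_col_frame).__name__)  # DataFrame
```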
PigSpider