Why does this:

import pandas as pd

df_data = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=pd.MultiIndex.from_tuples([(1, 1), (1, 2)]))
print(df_data.loc[[(13, 37)]])

return a nonexistent row filled with NaN

        0   1   2
13 37 NaN NaN NaN

instead of throwing a KeyError exception as it would if I tried to access it with df_data.loc[(13,37)]?

Stefan Falk

1 Answer

This is due to the "setting with enlargement" feature, as explained in the pandas documentation. Citing from the docs:

The .loc/[] operations can perform enlargement when setting a non-existent key for that axis.

So, if you want a KeyError, use df_data.loc[(13,37)] instead of df_data.loc[[(13,37)]].


Example:

In [24]: df_data
Out[24]: 
     0  1  2
1 1  1  2  3
  2  4  5  6

In [25]: df_data.loc[[(13,37)]]
Out[25]: 
        0   1   2
13 37 NaN NaN NaN

In [26]: df_data.loc[(13,37)]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
KeyError: 'the label [13] is not in the [index]'
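
When several rows are selected at once, the lookup can be made explicit so that missing labels never pass silently. The following is only a minimal sketch and not part of pandas: `loc_strict` is a hypothetical helper name, and the frame is the one from the question. Keep in mind that the result of a list indexer with missing labels depends on the pandas version; releases from 1.0 onward raise a KeyError here instead of returning NaN rows, and `reindex` is the explicit way to ask for NaN-filled rows.

```python
import pandas as pd

# Same frame as in the question.
df_data = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2)]),
)


def loc_strict(df, keys):
    """Hypothetical helper: select rows by a list of index tuples and
    raise KeyError if any of them is absent from the index."""
    missing = [key for key in keys if key not in df.index]
    if missing:
        raise KeyError(f"labels not in index: {missing}")
    return df.loc[list(keys)]


# All requested labels exist, so both rows come back.
found = loc_strict(df_data, [(1, 1), (1, 2)])

# A missing label is reported instead of being padded with NaN.
try:
    loc_strict(df_data, [(1, 1), (13, 37)])
except KeyError as err:
    print("KeyError:", err)

# If NaN-filled rows are actually wanted, reindex asks for them explicitly.
padded = df_data.reindex(pd.MultiIndex.from_tuples([(13, 37)]))
```

Checking membership up front keeps the behaviour the same regardless of how the installed pandas version treats missing labels in a list indexer.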

Here's a similar discussion: python slicing does not give key error even when the column is missing

Mohamed Ali JAMAOUI
  • Okay, I see. The question came up as I tried to select a list of rows based on a list of indices `loc[ [(1,1), (1,2), ..] ]` - that's how I noticed this behavior. – Stefan Falk Oct 03 '17 at 09:10