Safe label-based selection in DataFrame

Question

How can I safely select rows in pandas by a list of labels?
I want to get an error when the list contains any non-existing label.

The `loc` method doesn't raise a KeyError as long as at least one of the requested labels is in the index, and that is not sufficient for me.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=list('abcde'), data={'A': np.arange(5) + 10})

df
    A
a  10
b  11
c  12
d  13
e  14

# here I would like to get an Error as 'xx' and 'yy' are not in the index
df.loc[['b', 'xx', 'yy']] 

       A
b   11.0
xx   NaN
yy   NaN

Does pandas provide a method that raises a KeyError instead of returning NaNs for non-existing labels?

@maxU what you suggest is column selection. `df[['b']]` will also raise a `KeyError` — Temak, Oct 23 '16 at 16:02
I don't know whether it's sufficient for you, but you can check it beforehand: `len(pd.Index(['b', 'xx', 'yy']).difference(df.index))` should be `0` — MaxU - stand with Ukraine, Oct 23 '16 at 16:08
Yes, but it's a pity to have to do the check ourselves. It's a bit cumbersome. Maybe somebody can offer something else. — Temak, Oct 23 '16 at 16:11
"Passing list-likes to .loc or [] with any missing labels is no longer supported". See https://stackoverflow.com/questions/61291741/passing-list-likes-to-loc-or-with-any-missing-labels-is-no-longer-supported — Edward, Jan 12 '21 at 14:14

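As the last comment points out, pandas has since changed this behaviour. A minimal check, assuming pandas 1.0 or later is installed: a list with missing labels now makes `loc` raise `KeyError`, and `reindex` is the explicit way to ask for the old NaN-filling result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=list('abcde'), data={'A': np.arange(5) + 10})

try:
    df.loc[['b', 'xx', 'yy']]
except KeyError as exc:
    # pandas >= 1.0: any missing label in a list-like passed to .loc raises KeyError
    print('KeyError:', exc)

# When NaN rows for unknown labels are actually wanted, reindex says so explicitly
print(df.reindex(['b', 'xx', 'yy']))
```
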
score 2 · Accepted Answer · answered Oct 23 '16 at 16:41

It's a bit of a hack, but one can do it like this:

def my_loc(df, idx):
    # count the index entries found in idx; if any label is missing the two counts differ
    assert len(df.index[df.index.isin(idx)]) == len(idx), 'KeyError:the labels [{}] are not in the [index]'.format(idx)
    return df.loc[idx]

In [243]: my_loc(df, idx)
...
skipped
...
AssertionError: KeyError:the labels [['b', 'xx', 'yy']] are not in the [index]

In [245]: my_loc(df, ['a','c','e'])
Out[245]:
    A
a  10
c  12
e  14
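
A variant of the same idea, building on MaxU's `Index.difference` check from the comments. This is a sketch; `strict_loc` is a hypothetical helper, not a pandas API. It raises `KeyError` instead of using `assert` (asserts are skipped under `python -O`), names the missing labels, and doesn't miscount when the index or the label list contains duplicates.

```python
import numpy as np
import pandas as pd


def strict_loc(df, labels):
    """Hypothetical helper: row selection by a list of labels that fails on unknown labels."""
    labels = list(labels)
    # labels requested but absent from the index (sorted, de-duplicated)
    missing = pd.Index(labels).difference(df.index)
    if len(missing) > 0:
        raise KeyError(f'labels {list(missing)} are not in the index')
    return df.loc[labels]


df = pd.DataFrame(index=list('abcde'), data={'A': np.arange(5) + 10})
print(strict_loc(df, ['a', 'c', 'e']))
```

Calling `strict_loc(df, ['b', 'xx', 'yy'])` raises a `KeyError` that mentions `'xx'` and `'yy'`.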

jezrael · Answer 2 · answered Oct 23 '16 at 16:46

I think you get NaN because `loc` (and `ix`) return the same output as `reindex` here. See reindex versus ix.

A solution for 'safe' selection, from open issue 10695:

list_of_values = ['b', 'xx', 'yy']
# boolean mask on the index; .ix was removed in pandas 1.0, and .loc accepts the same mask
print(df.loc[df.index.isin(list_of_values)])
    A
b  11
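
Note that the boolean-mask approach (shown here with the equivalent `df.index.isin`) never raises: missing labels are dropped silently, and rows come back in the DataFrame's own order rather than in the order of the list. A small sketch illustrating both points:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=list('abcde'), data={'A': np.arange(5) + 10})

wanted = ['e', 'b', 'xx']
mask = df.index.isin(wanted)  # one boolean per row of df
subset = df[mask]

print(subset)               # only rows 'b' and 'e'; 'xx' is silently ignored
print(list(subset.index))   # df order, not the order of `wanted`
```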

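One way to get an error if values are not in the index is to use drop:

# drop() raises if any label is missing (the exception type depends on the pandas version)
df.drop(list_of_values)
# only reached when every label exists
print(df.loc[list_of_values])

ValueError: labels ['xx' 'yy'] not contained in axis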
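
`drop` works as a validator, but it builds a full copy of the frame only to discard it. A possibly lighter sketch of the same check uses `Index.get_indexer`, which returns `-1` for every label it cannot find. The helper name `checked_rows` is hypothetical, and the sketch assumes the index has no duplicate labels (`get_indexer` refuses non-unique indexes).

```python
import numpy as np
import pandas as pd


def checked_rows(df, labels):
    """Hypothetical helper: positional lookup that fails on unknown labels."""
    positions = df.index.get_indexer(labels)  # -1 marks a label not found
    if (positions == -1).any():
        missing = [lab for lab, pos in zip(labels, positions) if pos == -1]
        raise KeyError(f'{missing} not in index')
    return df.iloc[positions]


df = pd.DataFrame(index=list('abcde'), data={'A': np.arange(5) + 10})
print(checked_rows(df, ['e', 'b']))  # rows come back in the requested order
```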

score 1 · Answer 3 · answered Oct 23 '16 at 18:05

`.loc[]` won't let you set values if any label in the list is not part of the initial df's index, so you could do something like:

df = pd.DataFrame(index=['a','b','c','d', 'e'], data={'A':range(5)})
index_list = ['a', 'b']
df.loc[index_list] = df.loc[index_list]
df.loc[index_list]
Out[288]: 
   A
a  0
b  1

Now let's test with a label that doesn't exist:

index_list = ['aa', 'b']
df.loc[index_list] = df.loc[index_list]

You will get this error:

   Traceback (most recent call last):
      File "C:\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-289-d6aac5dac03d>", line 2, in <module>
        df.loc[index_list] = df.loc[index_list]
      File "C:\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 139, in __setitem__
        indexer = self._get_setitem_indexer(key)
      File "C:\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 126, in _get_setitem_indexer
        return self._convert_to_indexer(key, is_setter=True)
      File "C:\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1229, in _convert_to_indexer
        raise KeyError('%s not in index' % objarr[mask])
    KeyError: "['aa'] not in index"

I tested this approach with `timeit`, and it is a bit slower than checking that `len(df.index[df.index.isin(idx)]) == len(idx)`: *~1 sec vs ~0.5 sec* for 100 repeats, with a 50000-row `df` and an `idx` of 50000 elements — Temak, Oct 23 '16 at 21:03
Yes, this hack isn't performance-oriented; it was more an answer on how to get an error (here a `KeyError`) out of `loc[]` — Steven G, Oct 23 '16 at 21:06
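
The timing comparison in the comments can be repeated with a small `timeit` sketch. The frame size, the repeat count, and the helper names are assumptions; absolute timings depend on the machine and pandas version, and in recent pandas the right-hand `df.loc[idx]` already raises on a missing label by itself.

```python
import timeit

import numpy as np
import pandas as pd

n = 50_000
df = pd.DataFrame(index=[f'r{i}' for i in range(n)], data={'A': np.arange(n)})
idx = list(df.index[::-1])  # every label exists, in reverse order


def check_by_count(df, idx):
    # accepted-answer style: count the index entries that appear in idx
    return len(df.index[df.index.isin(idx)]) == len(idx)


def check_by_setitem(df, idx):
    # Answer-3 style: raises KeyError on a missing label;
    # otherwise writes the same (index-aligned) values back into df
    df.loc[idx] = df.loc[idx]
    return True


for func in (check_by_count, check_by_setitem):
    seconds = timeit.timeit(lambda: func(df, idx), number=5)
    print(f'{func.__name__}: {seconds:.3f} s for 5 runs')
```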
