Selecting from multi-index (with duplicate values)

Question

Using data with a hierarchal index, is there a way to easily select a range of values? All the method's I've seen, including xs and .loc, seem to be limited to a single value, see Benefits of panda's multiindex?. Using this example data,

from pandas import *
from numpy import *
import itertools as it

M = 100 # Number of rows to generate

# Create some test data with multiindex
df = DataFrame(randn(M, 10))
df.index = [randint(4, size=M), randint(8, size=M)]
df.index.rename(['a', 'b'])

I'd like to be able to select everything where the first index is 1 or two and the second index is 3 or 4. The closest I've come is using .loc with a list of tuples

# Now extract a subset
part = df.loc[[(1, 3), (1,4), (2,3), (2,4)]]

But this gives some odd behavior,

# The old indices are still shown for some reason
print(part.index.levels)

# Good indexing
print("correct:\n", part.loc[(1, 1)])
# No keyerror, although the key wasn't included
print("wrong:\n", part.loc[[(0, 3)]])   
# Indexing of first index, and then a column, very odd
print("odd:\n", part.loc[(1, 9)])
# But there is an error accessing the original this way
print("Expected error:\n", df.loc[(1, 9)])

Output:

In [436]: [[0, 1, 2, 3], [0, 1, 2, 3, 4, 5, 6, 7]]
correct:
             0         1         2         3         4         5         6  \
1 3 -0.183667  0.578867 -0.944514  0.026295  0.778354  0.603845  0.636486   
  3 -0.337596  0.018084 -0.654721 -1.121475 -0.561706  0.695095 -0.512936   
  3 -0.670779 -0.425093  1.262278 -1.806815  0.855900 -0.230683 -0.225658   
  3 -0.274808 -0.529901  1.265333  0.559646 -1.418687  0.492577  0.141648   

            7         8         9  
1 3  1.109179 -1.569236 -0.617408  
  3 -0.659310  1.249105  0.032657  
  3  0.315601  1.100192 -0.389736  
  3 -0.267462 -0.025189  0.069047  
odd:
 3   -0.617408
3    0.032657
3   -0.389736
3    0.069047
4    0.217577
4   -0.232357
Name: 9, dtype: float64
wrong:
       0   1   2   3   4   5   6   7   8   9
0 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
(truncated)

So is there a better way than a list of tuples to access multiple parts of a hierarchal index? If not, is there a way to clean up the result after indexing with tuples so that sensible errors are given, rather than NaN?

score 0 · Answer 1 · answered Nov 16 '16 at 18:19

0

you can use pd.IndexSlice to have more human-readable slicings

In [52]: idx = pd.IndexSlice

In [53]: dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
Out[53]: 
lvl0           a    b
lvl1         foo  foo
A0 B0 C1 D0    8   10
         D1   12   14
      C3 D0   24   26
         D1   28   30
   B1 C1 D0   40   42
         D1   44   46
      C3 D0   56   58
...          ...  ...
A3 B0 C1 D1  204  206
      C3 D0  216  218
         D1  220  222
   B1 C1 D0  232  234
         D1  236  238
      C3 D0  248  250
         D1  252  254

[32 rows x 2 columns]

see here http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers

answered Nov 16 '16 at 18:19

ℕʘʘḆḽḘ

18,566
34
128
235

How would this work with the example given in the question? – user2699 Nov 16 '16 at 18:22
It seems like `df.loc[IndexSlice[[0, 1], [3, 4]], :]` should work, but this gives an error `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'`. – user2699 Nov 16 '16 at 18:38

Selecting from multi-index (with duplicate values)

1 Answers1