8

I have read doc of Advanced indexing with hierarchical index where using .loc for MultiIndex is explained. Also this thread: Using .loc with a MultiIndex in pandas?

Still I don't see how select rows where (first index == some value) or (second index == some value)

Example:

import pandas as pd

index = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'b', 'b', 'b'],
                                  ['a', 'b', 'c', 'a', 'b', 'c']],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1,2,3,4,5,6], 'y': [6,5,4,3,2,1]}, index=index)

Is this DataFrame:

       x  y
i0 i1      
a  a   1  6
   b   2  5
   c   3  4
b  a   4  3
   b   5  2
   c   6  1

How can I get rows where i0 == 'b' or i1 == 'b'?

       x  y
i0 i1      
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
Mr weasel
  • 113
  • 1
  • 8

4 Answers4

10

I think the easier answer is to use the DataFrame.query function which allows you to query the multi-index by name as follows:

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_arrays([list("aaabbb"),
                                  list("abcabc")],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 4, 3, 2, 1]}, index=index)


df.query('i0 == "b" | i1 == "b"')

returns:

       x  y
i0 i1      
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1
Mr weasel
  • 113
  • 1
  • 8
mmann1123
  • 5,031
  • 7
  • 41
  • 49
6

Use get_level_values()

>>> mask = (df.index.get_level_values(0)=='b') | (df.index.get_level_values(1)=='b') 
>>> df[mask]  # same as df.loc[mask]


        x   y
i0  i1      
a   b   2   5
b   a   4   3
    b   5   2
    c   6   1
rafaelc
  • 57,686
  • 15
  • 58
  • 82
5

This might be possible with some logical condition on the index columns i0 and i1 unsing .loc. However to me using .iloc seems easier:

You can get the iloc index via pd.MultiIndex.get_locs.

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_arrays([list("aaabbb"),
                                  list("abcabc")],
                                  names=['i0', 'i1'])
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 4, 3, 2, 1]}, index=index)

idx0 = index.get_locs(['b', slice(None)])  # i0 == 'b' => [3, 4, 5]
idx1 = index.get_locs([slice(None), 'b'])  # i1 == 'b' => [1, 4]
idx = np.union1d(idx0, idx1)

print(df.iloc[idx])

will yield

       x  y
i0 i1      
a  b   2  5
b  a   4  3
   b   5  2
   c   6  1

Note: slice(None) means the same as [:] in index-slicing.

ascripter
  • 5,665
  • 12
  • 45
  • 68
0

It is quite some time since this question was raised. After reading the answers available, however, I do see the benefit of adding my response which is going to answer original query exactly and how do it efficiently with minimum coding.

To select multiple indices as in your question, you can do :

df.loc[('b','b')]

Please note most critical point here is to use parenthesis () for indices. This will give an output :

x 5 y 2 Name: (b, b), dtype: int64

You can further add column name ('x' in my case) as if needed by doing as below:

df.loc[('b','b'),'x']

This will give output: 5

Entire process is in the attached image.

How to access multiple indices in a multiindex dataframe

  • Your answer is correct if the question was how to get (i0 == 'b' AND i1 == 'b'), but the question was about (i0 == 'b' OR i1 == 'b'). In your example the correct code should have returned the second line (i0=a and i1=b) and the last three lines (where i0 is b) – Galia Ladiray Weiss Mar 09 '23 at 07:47