74

Is it possible to use the DataFrame.loc method to select from a MultiIndex? I have the following DataFrame and would like to access the values in the Dwell column at the indices ('at', 1), ('at', 3), ('at', 5), and so on (non-sequential).

I'd love to be able to do something like data.loc[['at',[1,3,5]], 'Dwell'], similar to the data.loc[[1,3,5], 'Dwell'] syntax for a regular index (which returns a 3-member series of Dwell values).

My purpose is to select an arbitrary subset of the data, perform some analysis only on that subset, and then overwrite those rows with the results of the analysis. I plan on using the same syntax to set the new values, so chaining selectors wouldn't really work in this case.

Here is a slice of the DataFrame I'm working with:

          Char  Dwell  Flight  ND_Offset  Offset
QGram
at    0      a    100     120   0.000000       0
      1      t    180       0   0.108363       5
      2      a    100     120   0.000000       0
      3      t    180       0   0.108363       5
      4      a     20     180   0.000000       0
      5      t     80     120   0.108363       5
      6      a     20     180   0.000000       0
      7      t     80     120   0.108363       5
      8      a     20     180   0.000000       0
      9      t     80     120   0.108363       5
      10     a    120     180   0.000000       0
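
For anyone who wants to try the answers locally, the slice above can be rebuilt roughly as follows. This is only a sketch: the unnamed inner index level and the column dtypes are assumptions read off the printed output, and the variable name df matches what the answers use rather than the data name in the question.

```python
import pandas as pd

# Outer level "QGram" holds the single key "at"; the inner level is unnamed
# and assumed to be the integers 0..10.
index = pd.MultiIndex.from_product([["at"], range(11)], names=["QGram", None])

df = pd.DataFrame(
    {
        "Char": list("atatatatata"),
        "Dwell": [100, 180, 100, 180, 20, 80, 20, 80, 20, 80, 120],
        "Flight": [120, 0, 120, 0, 180, 120, 180, 120, 180, 120, 180],
        "ND_Offset": [0.0, 0.108363] * 5 + [0.0],
        "Offset": [0, 5] * 5 + [0],
    },
    index=index,
)

print(df)
```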
Neuron
kronosapiens

4 Answers

71

If you are on pandas 0.14 or later, you can simply pass a tuple to .loc as below:

df.loc[('at', [1,3,4]), 'Dwell']
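
A minimal sketch of how the tuple form might cover the read, analyze, write-back workflow from the question. The frame here is a cut-down stand-in for the asker's data (only the Dwell column), and doubling the values is just a placeholder for the real analysis.

```python
import pandas as pd

# Cut-down stand-in for the question's frame (assumed structure).
index = pd.MultiIndex.from_product([["at"], range(11)], names=["QGram", None])
df = pd.DataFrame(
    {"Dwell": [100, 180, 100, 180, 20, 80, 20, 80, 20, 80, 120]}, index=index
)

# The tuple addresses the index levels; the list picks items within a level.
key = ("at", [1, 3, 5])

subset = df.loc[key, "Dwell"]       # Series with values 180, 180, 80
new_values = subset.to_numpy() * 2  # placeholder "analysis"

# The same selector works on the left-hand side, so no chained indexing is needed.
df.loc[key, "Dwell"] = new_values
```

Because the selection and the assignment go through a single .loc call, the write lands on the original frame rather than on a temporary copy.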
chrisb
  • 7
    Funny because if instead of a tuple, you pass a list, it does not work properly – leoschet Jun 16 '19 at 07:50
  • 6
    @leoschet Pandas interprets tuple entries as levels and list entries as items in a level. https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-indexing-with-hierarchical-index FYI – D.J.Duff Feb 26 '20 at 11:03
17

Try the cross-section indexing:

In [68]: df.xs('at', level='QGram', drop_level=False).loc[[1,4]]
Out[68]: 
        Char  Dwell  Flight  ND_Offset  Offset
QGram                                         
at    1    t    180       0   0.108363       5
      4    a     20     180   0.000000       0
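
A sketch of the cross-section approach on a reduced copy of the question's data (structure assumed). By default xs drops the selected level, so a plain list of inner labels works afterwards. With drop_level=False the result still carries the full MultiIndex, and on recent pandas versions a bare list is matched against the outer level, so the tuple form is the safer way to pick inner labels there. Note that xs is meant for reading; it is not a way to assign back into the original frame.

```python
import pandas as pd

# Reduced stand-in for the question's frame (assumed structure).
index = pd.MultiIndex.from_product([["at"], range(11)], names=["QGram", None])
df = pd.DataFrame(
    {"Dwell": [100, 180, 100, 180, 20, 80, 20, 80, 20, 80, 120]}, index=index
)

# Default behaviour: the "QGram" level is dropped, leaving a flat index 0..10.
at_rows = df.xs("at", level="QGram")
picked = at_rows.loc[[1, 4]]

# Keeping the level leaves a MultiIndex, so address it with a tuple.
kept = df.xs("at", level="QGram", drop_level=False)
kept_rows = kept.loc[("at", [1, 4]), :]
```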
R. Max
  • This would be the way that the pandas docs recommend, as slicing with deep indexes can be done with xs: http://pandas-docs.github.io/pandas-docs-travis/user_guide/advanced.html#advanced-xs – physincubus Jun 03 '19 at 11:15
  • 6
    Is `xs` still recommended? – baxx Sep 16 '20 at 23:03
  • 1
    @baxx. Yes, `xs` is still recommended. See https://pandas.pydata.org/pandas-docs/dev/user_guide/advanced.html#cross-section – amball Nov 17 '21 at 19:00
4

.loc is your best friend with a MultiIndex, but you need to understand how it treats one. To select a single row, pass a tuple with one value for every index level, followed by a column selector:

     df.loc[('indexValue1', 'indexValue2', 'indexValue3'), :]

As you may imagine, this is a pain when you don't know (or don't care) what the values of some levels are. For those levels you can use slice(None), which is the tuple equivalent of ':':

      df.loc[(slice(None), 'value1', 'value2', slice(None)), :]
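
As a rough illustration of selecting by only some levels, here is a sketch on a made-up three-level index. The level names, labels, and the value column are all hypothetical. pd.IndexSlice lets you write ':' inside the tuple instead of spelling out slice(None).

```python
import pandas as pd

# Hypothetical three-level index, built in sorted order.
index = pd.MultiIndex.from_product(
    [["x", "y"], ["a", "b"], [1, 2]], names=["outer", "middle", "inner"]
)
df = pd.DataFrame({"value": range(8)}, index=index)

# Every level specified: a single scalar comes back.
full = df.loc[("x", "b", 2), "value"]

# Only the middle level specified; the other levels are left open.
idx = pd.IndexSlice
middle_b = df.loc[idx[:, "b", :], "value"]

# Equivalent spelling without IndexSlice.
same = df.loc[(slice(None), "b", slice(None)), "value"]
```

Slicing like this expects the MultiIndex to be sorted; calling df.sort_index() first is the usual remedy if it is not.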

Hope this helps!

syntactic
2

In general, MultiIndex keys take the form of tuples. For example:

In [6]: df.loc[('at', 1),'Dwell']
Out[6]: 180

In your case, you can pass a list of tuples, one per row. For example, the following works as you would expect:

In [7]: df.loc[ [('at', 1),('at', 3),('at', 5)], 'Dwell']
Out[7]:
QGram
at     1    180
       3    180
       5     80
Name: Dwell, dtype: int64
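
When the inner labels come from some other computation, the list of tuples can be generated rather than typed out. A small sketch, again on an assumed cut-down copy of the question's frame:

```python
import pandas as pd

# Cut-down stand-in for the question's frame (assumed structure).
index = pd.MultiIndex.from_product([["at"], range(11)], names=["QGram", None])
df = pd.DataFrame(
    {"Dwell": [100, 180, 100, 180, 20, 80, 20, 80, 20, 80, 120]}, index=index
)

# Build one full key per wanted row.
wanted = [("at", i) for i in (1, 3, 5)]

subset = df.loc[wanted, "Dwell"]  # Series indexed by the three tuples
```

Unlike the ('at', [1, 3, 5]) form, each tuple here is a complete key, so the list can mix different outer labels freely.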
Marioanzas