I am getting inconsistent indexing behaviour depending on whether one of the indices in my MultiIndex dataframe is an integer or a string. Here is an example:

MultiIndex with one string index:

import pandas as pd

a = [['a','a','a','b','b','b','c','c'],[2,1,1,1,2,2,2,1]]
b = [['a','b','c','a','b','c','a','b'],[2,2,2,2,4,4,4,4]]
index=pd.MultiIndex.from_tuples(list(zip(*b)),names=['num1','num2'])
df1=pd.DataFrame({'letters': a[0],'numbers': a[1]},index=index)
df1.sort_index(inplace=True) # avoid lex sort warnings

df1
          letters  numbers
num1 num2                 
a    2          a        2
     2          b        1
     4          c        2
b    2          a        1
     4          b        2
     4          c        1
c    2          a        1
     4          b        2

df1.loc['a',2]['letters'][0]
'a'

df1.loc['a',2]['letters'][1]
'b'

MultiIndex with all integer indices:

a = [['a','a','a','b','b','b','c','c'],[2,1,1,1,2,2,2,1]]
b = [[1,2,3,1,2,3,1,2],[2,2,2,2,4,4,4,4]]
index=pd.MultiIndex.from_tuples(list(zip(*b)),names=['num1','num2'])
df1=pd.DataFrame({'letters': a[0],'numbers': a[1]},index=index)
df1.sort_index(inplace=True) # avoid lex sort warnings

df1
          letters  numbers
num1 num2                 
1    2          a        2
     2          b        1
     4          c        2
2    2          a        1
     4          b        2
     4          c        1
3    2          a        1
     4          b        2

df1.loc[1,2]['letters'][0]
'a'

df1.loc[1,2]['letters'][1]
num2
2    a
2    b
Name: letters, dtype: object

The behaviour in the first case is what I expect. Could someone explain why in the second case indexing with 1 returns a series instead of the string 'b'?

Sansport
  • For me, the last `df1.loc['a',2]['letters'][1]` returns `KeyError: 'the label [a] is not in the [index]'` – jezrael Aug 09 '18 at 13:10
  • Could you please clarify what you are trying to achieve in the second case? Your MultiIndex is made up of integers at both levels, but you are indexing with the string 'a', which doesn't exist. What happens when you try `df1.loc[1, 2]['letters'][1]`? – Seth Nabarro Aug 09 '18 at 13:10
  • Sorry, that was a typo (I was copy/pasting). Have corrected it now. The code should now work as shown. – Sansport Aug 09 '18 at 13:14

1 Answer

If you use `iat` / `iloc` to select by position, everything works as expected.

It is also possible to use a tuple to select values from a MultiIndexed DataFrame.

a = df1.loc[('a',2), 'letters'].iat[0]
print (a)
a

b = df1.loc[('a',2), 'letters'].iat[1]
print (b)
b

a = df1.loc[(1,2), 'letters'].iat[0]
print (a)
a

b = df1.loc[(1,2), 'letters'].iat[1]
print (b)
b
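
As for why plain `[1]` behaves differently: a likely explanation is that `[]` on a Series with integer labels is treated as a label lookup, not a positional one. In the all-integer case, `1` is a valid label on the first level (`num1`), so every row under that label comes back as a Series. The sketch below builds a small Series by hand as a stand-in for the result of `df1.loc[(1,2), 'letters']`; the names `idx`, `s`, `by_label` and `second` are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the selected rows: two entries that
# share the same integer key (1, 2).
idx = pd.MultiIndex.from_tuples([(1, 2), (1, 2)], names=['num1', 'num2'])
s = pd.Series(['a', 'b'], index=idx, name='letters')

# Plain [] with an integer is matched against the labels of the first
# level (num1), so this selects all rows where num1 == 1.
by_label = s[1]
print(type(by_label).__name__)

# iat / iloc always count positions, whatever the index contains.
second = s.iat[1]
print(second)
```

With string labels on the first level, `1` cannot be a label, which is presumably why the first example appeared to work by position. Using `iat` or `iloc` avoids depending on that distinction.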
jezrael
  • Yes, I realise that it works with `iloc`, but I want to understand why it does not work without it. I should be able to select from a series without `iloc`, correct? – Sansport Aug 09 '18 at 13:16
  • @Sansport - Yes, it is possible, but it is buggy, so it is better to use `iat` or `iloc`. – jezrael Aug 09 '18 at 13:17