I am getting inconsistent indexing behaviour depending on whether the first level of my MultiIndex DataFrame holds integers or strings. Here is an example:
MultiIndex with one string level:
import pandas as pd
a = [['a','a','a','b','b','b','c','c'],[2,1,1,1,2,2,2,1]]
b = [['a','b','c','a','b','c','a','b'],[2,2,2,2,4,4,4,4]]
index=pd.MultiIndex.from_tuples(list(zip(*b)),names=['num1','num2'])
df1=pd.DataFrame({'letters': a[0],'numbers': a[1]},index=index)
df1.sort_index(inplace=True) # avoid lex sort warnings
df1
          letters  numbers
num1 num2
a    2          a        2
     2          b        1
     4          c        2
b    2          a        1
     4          b        2
     4          c        1
c    2          a        1
     4          b        2
df1.loc['a',2]['letters'][0]
'a'
df1.loc['a',2]['letters'][1]
'b'
MultiIndex with all integer levels:
a = [['a','a','a','b','b','b','c','c'],[2,1,1,1,2,2,2,1]]
b = [[1,2,3,1,2,3,1,2],[2,2,2,2,4,4,4,4]]
index=pd.MultiIndex.from_tuples(list(zip(*b)),names=['num1','num2'])
df1=pd.DataFrame({'letters': a[0],'numbers': a[1]},index=index)
df1.sort_index(inplace=True) # avoid lex sort warnings
df1
          letters  numbers
num1 num2
1    2          a        2
     2          b        1
     4          c        2
2    2          a        1
     4          b        2
     4          c        1
3    2          a        1
     4          b        2
df1.loc[1,2]['letters'][0]
'a'
df1.loc[1,2]['letters'][1]
num2
2    a
2    b
Name: letters, dtype: object
The behaviour in the first case is what I expect. Could someone explain why, in the second case, indexing with 1 returns a Series instead of the string 'b'?
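To make this easier to reproduce, here is a self-contained sketch of both cases. The helper name `build_frame` is made up for illustration, and using `.iloc` is only an assumption about how to ask for the second element strictly by position. It should pick the same element regardless of the index dtype, but it does not explain why plain `[1]` behaves differently in the two cases.

```python
import pandas as pd

def build_frame(level0):
    """Build the example frame; level0 supplies the first index level (hypothetical helper)."""
    letters = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
    numbers = [2, 1, 1, 1, 2, 2, 2, 1]
    level1 = [2, 2, 2, 2, 4, 4, 4, 4]
    index = pd.MultiIndex.from_arrays([level0, level1], names=['num1', 'num2'])
    df = pd.DataFrame({'letters': letters, 'numbers': numbers}, index=index)
    return df.sort_index()  # sorted, as in the snippets above

df_str = build_frame(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'])
df_int = build_frame([1, 2, 3, 1, 2, 3, 1, 2])

# .iloc is purely positional, so the integer 1 is never treated as an index label
second_str = df_str.loc[('a', 2), 'letters'].iloc[1]
second_int = df_int.loc[(1, 2), 'letters'].iloc[1]
print(second_str, second_int)
```

The only difference between the two frames is the first index level, so any difference in what `[1]` returns has to come from how the integer is interpreted against that level.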