This is just to share a very basic concept for beginners of multiindex dataframes.
The "empty" items are part of the multiindex view and only appear when you output a df, it is helping you to understand the hierarchy. If you output the isolated Multiindex class, no item will be empty. Thus, the index items are never really empty, and the "empty" fields only appear for df ouputs:
- If the "A_idx" index is assigned to more than one "B_idx" index value, the "A_idx" index is not repeated, because it is the parent.
- If "A_idx" index points to more than one value row while "B_idx" index is repeating, B_idx is still repeated, because it is the child.
If you take the df.head(10)
and find out that the "empty" index item is in line 1, you can also check this quickly in your df using df.iloc[1].reset_index()
. You will see that the index item is not empty.
In the following, "first" and "second" are index names seemingly with equal rights to be either parent as they are on the same output line, but in reality the hierarchy goes from left to right.
first second
bar one 0.289163 -0.464633 -0.060487
two 0.224442 0.177609 2.156436
baz one -0.262329 -0.248384 0.925580
foo one 0.051350 0.452014 0.206809
two 2.757255 -0.739196 0.183735
three -0.064909 -0.963130 1.364771
qux one -1.330857 1.881588 -0.262170
Thanks for the example go to Access last elements of inner multiindex level in pandas dataframe.
This actually means:
first second
bar one 0.289163 -0.464633 -0.060487
bar two 0.224442 0.177609 2.156436
baz one -0.262329 -0.248384 0.925580
foo one 0.051350 0.452014 0.206809
foo two 2.757255 -0.739196 0.183735
foo three -0.064909 -0.963130 1.364771
qux one -1.330857 1.881588 -0.262170
####
Example of how to create a hierarchy accordingly.
The order of the column list that is passed to the set_index() creates the hierarchy in the same order.
You can check this out in a small example I borrowed from pandas multiindex reindex by rows, with df2 covering a switch of the two indices. Only df shows the secret "empty items", see df vs. df2 output:
df = pd.DataFrame({'month': [1, 4, 7, 10],
'year': [2012, 2012, 2013, 2013],
'sale': [55, 40, 84, 31]})
df2 = df.copy()
df=df.set_index(['year','month'])
df2=df2.set_index(['month','year'])
df:
sale
year month
2012 1 55
4 40
2013 7 84
10 31
df2:
month year sale
0 1 2012 55
1 4 2012 40
2 7 2013 84
3 10 2013 31
df.index
Output:
MultiIndex([(2012, 1),
(2012, 4),
(2013, 7),
(2013, 10)],
names=['year', 'month'])
Or:
df2.index
Output:
MultiIndex([( 1, 2012),
( 4, 2012),
( 7, 2013),
(10, 2013)],
names=['month', 'year'])
Have a look at the levels in the df:
df.index.levels[0]
Int64Index([2012, 2013], dtype='int64', name='year')
df.index.levels[1]
Int64Index([1, 4, 7, 10], dtype='int64', name='month')
df2.index.levels[0]
Int64Index([1, 4, 7, 10], dtype='int64', name='month')
df2.index.levels[1]
Int64Index([2012, 2013], dtype='int64', name='year')
If you want to check or clarify the different levels of the hierarchy in the output view, choose one row and reset the index:
df.iloc[1].reset_index()
Output:
index 2012
4
0 sale 40
Or:
df2.iloc[1].reset_index()
Output:
index 4
2012
0 sale 40