Creating an empty MultiIndex

Question

I would like to create an empty DataFrame with a MultiIndex before assigning rows to it. I already found that an empty DataFrame does not accept a MultiIndex assigned on the fly, so I'm setting the MultiIndex names during creation. However, I don't want to assign levels yet, as this will be done later. This is the best code I have so far:

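from pandas import DataFrame, MultiIndex

def empty_multiindex(names):
    """
    Creates empty MultiIndex from a list of level names.
    """
    # A single all-None tuple is the workaround here; it leaves one NaN row behind.
    return MultiIndex.from_tuples(tuples=[(None,) * len(names)], names=names)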

Which gives me

In [2]:

empty_multiindex(['one','two', 'three'])

Out[2]:

MultiIndex(levels=[[], [], []],
           labels=[[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]],
           names=[u'one', u'two', u'three'])

and

In [3]:
DataFrame(index=empty_multiindex(['one','two', 'three']))

Out[3]:
one two three
NaN NaN NaN

Well, I have no use for these NaNs. I can easily drop them later, but this is obviously a hackish solution. Does anyone have a better one?

@AndyHayden I'm trying to write a general enough function to handle arbitrary numbers of names. My assignment is to create frequency tables with very arbitrary and whimsical totals and subtotals and subsubtotals that can be folded and unfolded in a dashboard. Creating dataframes before passing them to Django makes my life easier. — dmvianna, Feb 03 '15 at 06:29
Why do this as a MI rather than a columns? Generally pandas is pretty bad at updating on a row by row basis (as it has to copy the entirety of the data each time). Could you make it a MI later (after construction)? — Andy Hayden, Feb 03 '15 at 06:35
@AndyHayden it is more convenient and readable to create labels by assignment (`df2.loc[(name, key2, True), :] = df1.loc[(key1, key2), :].sum()`) than to torture a `Series` before assignment by appending to it. And maintaining parallel DataFrames for Indexes and data would be even worse. — dmvianna, Feb 03 '15 at 23:02
I think I would argue that a DataFrame may not be the right data structure to use in this case. — Andy Hayden, Feb 03 '15 at 23:13
Well, without knowing the precise specs it's hard to give the best solution, have you tried just using a dictionary? — Andy Hayden, Feb 03 '15 at 23:39
@AndyHayden A dict won't give me pandas DataFrame indexing and methods such as sum() that I can combine with indexing. I agree that there could be a better solution (such as creating an object from scratch that does what I want). But at this point I'm optimising for developer time rather than processing time. — dmvianna, Feb 05 '15 at 02:13

RoG · Accepted Answer · 2023-05-02T12:00:28.813

score 56

The solution is to pass empty lists for both the levels and the codes (called labels in older pandas versions). This works fine for me:

>>> import pandas as pd
>>> my_index = pd.MultiIndex(levels=[[],[],[]],
...                          codes=[[],[],[]],
...                          names=[u'one', u'two', u'three'])
>>> my_index
MultiIndex([], names=['one', 'two', 'three'])
>>> my_columns = [u'alpha', u'beta']
>>> df = pd.DataFrame(index=my_index, columns=my_columns)
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                    alpha beta
one   two    three
apple banana cherry   0.1  0.2

For pandas versions before 0.24, use the keyword `labels` instead of `codes`; `labels` was deprecated in 0.24 and removed in 1.0.
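Since the question asks for an arbitrary number of level names, the same idea can be wrapped in a small helper. This is only a sketch: it reuses the function name `empty_multiindex` from the question, assumes a pandas version that accepts the `codes` keyword, and builds one fresh empty list per level instead of writing them out by hand.

```python
import pandas as pd


def empty_multiindex(names):
    """Return a zero-length MultiIndex with one level per name."""
    # One empty list per level; a comprehension avoids sharing a single
    # list object between levels, as [[]] * n would.
    levels = [[] for _ in names]
    codes = [[] for _ in names]
    return pd.MultiIndex(levels=levels, codes=codes, names=names)


index = empty_multiindex(["one", "two", "three"])
df = pd.DataFrame(index=index, columns=["alpha", "beta"])

# The frame has no rows, so there is no NaN placeholder row to drop later.
print(index.nlevels, len(df))
```

Rows can then be added by assignment, exactly as in the transcript above.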

edited May 02 '23 at 12:00

answered Jul 09 '15 at 07:22

`[[],[],[]]` can be replaced with `[[]]*3` if desired. – JoseOrtiz3 Apr 09 '17 at 03:52
This throws a deprecation warning on Pandas '0.25.1'. – buechel Sep 18 '19 at 10:04
@buechel the keyword `labels` has been replaced with `codes` in 0.25.1 – xuva Nov 07 '19 at 20:53

score 38 · Answer 2 · answered Aug 21 '17 at 12:54

Another solution, which is perhaps a little simpler, is to create the index columns as ordinary columns and then call the `set_index` method:

>>> import pandas as pd
>>> df = pd.DataFrame(columns=['one', 'two', 'three', 'alpha', 'beta'])
>>> df = df.set_index(['one', 'two', 'three'])
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                    alpha beta
one   two    three            
apple banana cherry   0.1  0.2
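For an arbitrary list of level names, the `set_index` approach might be generalized as follows. The helper name `empty_frame_via_set_index` and its parameters are hypothetical, chosen for illustration only.

```python
import pandas as pd


def empty_frame_via_set_index(index_names, columns):
    """Build an empty DataFrame, then move the index columns into the index."""
    index_names = list(index_names)
    # Create every name as an ordinary column first ...
    df = pd.DataFrame(columns=index_names + list(columns))
    # ... then promote the index names to index levels.
    return df.set_index(index_names)


df = empty_frame_via_set_index(["one", "two", "three"], ["alpha", "beta"])
print(df.index.names)
```

Note that with a single index name `set_index` returns a regular `Index` rather than a `MultiIndex`, so this sketch only yields a MultiIndex when two or more names are given.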

score 11 · Answer 3 · answered Aug 27 '20 at 09:42

Using pd.MultiIndex.from_tuples may be more straightforward.

import pandas as pd
ind = pd.MultiIndex.from_tuples([], names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]
df

                      alpha beta
one     two     three       
apple   banana  cherry    4    3
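One detail worth knowing about this approach: with an empty list of tuples, pandas cannot infer the number of levels, so `names` has to be supplied. The sketch below wraps the call in a hypothetical helper, `empty_multiindex_from_names`, and shows what happens when `names` is omitted.

```python
import pandas as pd


def empty_multiindex_from_names(names):
    """Return a zero-length MultiIndex; the level count comes from `names`."""
    return pd.MultiIndex.from_tuples([], names=names)


index = empty_multiindex_from_names(["one", "two", "three"])
print(index.nlevels)

# Without names there is nothing to infer the number of levels from,
# so pandas raises a TypeError.
try:
    pd.MultiIndex.from_tuples([])
except TypeError as err:
    print("names are required:", err)
```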

ronkov

Way easier because you don't need to pass $n$ empty lists... – Ivan Gonzalez Feb 25 '22 at 04:09

score 4 · Answer 4 · answered Nov 17 '19 at 07:39

Using pd.MultiIndex.from_arrays allows for a slightly more concise solution when defining the index explicitly:

import pandas as pd
ind = pd.MultiIndex.from_arrays([[]] * 3, names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]

                     alpha  beta
one   two    three              
apple banana cherry      4     3
