2

I want to select multiple columns from a MultiIndex with multiple independent indexers. For example

df = pd.DataFrame(
    np.zeros((2,4)),
    columns=pd.MultiIndex.from_product([('a','b'),(1,2)])
)

from this DataFrame

    a       b   
    1   2   1   2
0   0   0   0   0
1   0   0   0   0

I want to select all columns under 'a' plus ('b', 1), like df[[('a', 1), ('a', 2), ('b', 1)]], but I don't want to have to explicitly specify all levels of the columns below 'a'.

What does not work:

  • df[['a', ('b', 1)]] : KeyError: "[('b', 1)] not in index"
  • df.loc[:, ['a', ('b', 1)]] : KeyError: "[('b', 1)] not in index"
  • df[[('a', slice(None)), ('b', 1)]] : TypeError: unhashable type: 'slice'
  • df.loc[:, [pd.IndexSlice['a', :], ('b', 1)]] : TypeError: unhashable type: 'slice'

Another similar thing I would like to be able to do is: ('a', 1) plus pd.IndexSlice[:, 2]

goweon
  • 1,111
  • 10
  • 19

5 Answers5

1

You can select the columns separately and concat them.

out = pd.concat([df.xs('a', axis=1, drop_level=False),
                 df.xs(('b', 1), axis=1, drop_level=False)],
                axis=1)
print(out)

     a         b
     1    2    1
0  0.0  0.0  0.0
1  0.0  0.0  0.0
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
1

Another possible solution:

df.loc[:,('a', slice(None))].join(df.loc[:,('b', 1)])

Alternatively:

df[[(x,y) for x, y in df.columns if (x == 'a') | ((x == 'b') & (y == 1))]]

Output:

     a         b
     1    2    1
0  0.0  0.0  0.0
1  0.0  0.0  0.0
PaulS
  • 21,159
  • 2
  • 9
  • 26
0

Here is an option using two boolean masks

m1 = df.columns.get_level_values(0).isin(['a'])
m2 = df.columns.get_level_values(1).isin([1])

df.loc[:,m1|m2]

Output:

     a         b
     1    2    1
0  0.0  0.0  0.0
1  0.0  0.0  0.0
rhug123
  • 7,893
  • 1
  • 9
  • 24
  • this is probably fastest, but not as flexible as it doesn't use the standard slicing syntax – goweon May 23 '23 at 19:36
0

A slice works fine for the example shared:

df.loc(axis=1)[('a',1):('b',1)]
     a         b
     1    2    1
0  0.0  0.0  0.0
1  0.0  0.0  0.0

Based on your feedback, you want something more flexible - the standard slicing syntax does not fit in here. One option is with select_columns :

# pip install pyjanitor
import janitor
df.select_columns({0:'a'}, ('b',2))
     a         b
     1    2    2
0  0.0  0.0  0.0
1  0.0  0.0  0.0

The syntax allows selection on a MultiIndex on a level with a dictionary (key is the level, value is the label), or via a tuple.

sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • This is not general enough to be useful, as slicing only works for the example because the columns just happen to be right next to each other. i.e. you can select `'a'` plus `('b', 1)` with this method, but not `'a'` plus `('b', 2)` – goweon May 30 '23 at 14:55
  • Answer is specific to what you asked. What you are asking for is no more slicing, but some generic/flexible option. – sammywemmy May 30 '23 at 20:33
0

I ended up writing a helper function to do this. It takes an array of slicers, and uses each one independently, then gathers all the selected columns.

def mimsc(col_specs):
    # usage: df.loc[msms(['A', ('B', 'X')])]
    def slicer(df):
        cols = []
        dfc = df.columns.to_frame()
        for cs in col_specs:
            cols.append(dfc.loc[cs])
        all_cols = pd.concat(cols, ignore_index=True)
        return pd.MultiIndex.from_frame(all_cols)
    return slicer

usage

df.loc[:, mimsc(['a', ('b', 1)])]
df.loc[:, mimsc([('b', 1), pd.IndexSlice[:, 2]])]

Here's a more generic version that also works with indexes

def mims(col_specs, axis=1):
    def slicer(df):
        cols = []
        if axis==1:
            dfc = df.columns.to_frame()
        elif axis==0:
            dfc = df.index.to_frame()
        for cs in col_specs:
            col = dfc.loc[cs, :]
            if isinstance(col, pd.Series):
                col = dfc.loc[[cs], :]
            cols.append(col)
        all_cols = pd.concat(cols, ignore_index=True)
        return pd.MultiIndex.from_frame(all_cols)
    return slicer

example

df.T.loc[mims(['a', ('b', 1)], axis=0), :]
df.T.loc[mims([('b', 1), pd.IndexSlice[:, 2]], axis=0), :]
goweon
  • 1,111
  • 10
  • 19