Pandas xs where level in list of options

Question

If I have a pd.DataFrame that looks like:

new_df = []
for i in range(10):
    df_example = pd.DataFrame(np.random.normal(size=[10,1]))
    cols = [round(np.random.uniform(low=0,high=10)),round(np.random.uniform(low=0,high=10)),
        round(np.random.uniform(low=0,high=10)),round(np.random.uniform(low=0,high=10))]
    keys = ['A','B','C','D']
    new_ix = pd.MultiIndex.from_tuples([cols],names=keys)
    df_example.columns = new_ix
    new_df.append(df_example)
new_df = pd.concat(new_df,axis=1)

Which could yield something like:

Now, if I want where C=4 and A=1 I can do:

df.xs(axis=1,level=['A','C'],key=[1,4])

How do I express if I want:

C in [4,2] and A in [5,2]
C in [4,2] or A in [5,2]

score 0 · Answer 1 · answered Aug 14 '22 at 18:13

To the best of my knowledge, you can't use anything but tuples for key parameter in xs, so such queries are not possible.

The next best thing is to define helper functions for that purpose, such as the following:

def xs_or(df: pd.DataFrame, params: dict[str, list[int]]) -> pd.DataFrame:
    """Helper function.

    Args:
        df: input dataframe.
        params: columns/values to query.

    Returns:
        Filtered dataframe.

    """
    df = pd.concat(
        [
            df.xs(axis=1, level=[level], key=(key,))
            for level, keys in params.items()
            for key in keys
        ],
        axis=1,
    )
    for level in params.keys():
        try:
            df = df.droplevel([level], axis=1)
        except KeyError:
            pass
    return df


def xs_and(df: pd.DataFrame, params: dict[str, list[int]]) -> pd.DataFrame:
    """Helper function.

    Args:
        df: input dataframe.
        params: columns/values to query.

    Returns:
        Filtered dataframe.

    """
    for level, keys in params.items():
        df = xs_or(df, {level: keys})
    return df

And so, with the following dataframe named df:

A         4         7         3         1         7         9         4         0         3         9
B         6         7         4         6         7         5         8         0         8         0
C        2         10        5         2         9         9         4         3         4         5 
D        0         1         7         3         8         3         6         7         9         10
0 -0.199458  1.155345  1.298027  0.575606  0.785291 -1.126484  0.019082  1.765094  0.034631 -0.243635
1  1.173873  0.523277 -0.709546  1.378983  0.266661  1.626118  1.647584 -0.228162 -1.708271  0.111583
2  0.321156  0.049470 -0.611111 -1.238887  1.092369  0.019503 -0.473618  1.804474 -0.850320 -0.217921
3  0.339307 -0.758909  0.072159  1.636119 -0.541920 -0.160791 -1.131100  1.081766 -0.530082 -0.546489
4 -1.523110 -0.662232 -0.434115  1.698073  0.568690  0.836359 -0.833581  0.230585  0.166119  1.085600
5  0.020645 -1.379587 -0.608083 -1.455928  1.855402  1.714663 -0.739409  1.270043  1.650138 -0.718430
6  1.280583 -1.317288  0.899278 -0.032213 -0.347234  2.543415  0.272228 -0.664116 -1.404851 -0.517939
7 -1.201619  0.724669 -0.705984  0.533725  0.820124  0.651339  0.363214  0.727381 -0.282170  0.651201
8  1.829209  0.049628  0.655277 -0.237327 -0.007662  1.849530  0.095479  0.295623 -0.856162 -0.350407
9 -0.690613  1.419008 -0.791556  0.180751 -0.648182  0.240589 -0.247574 -1.947492 -1.010009  1.549234

You can filter like this:

# C in [10, 2] or A in [1, 0]
print(xs_or(df, {"C": [10, 2], "A": [1, 0]}))
# Output
B         7         6                   2         3
D         1         0         3         3         7
0  1.155345 -0.199458  0.575606  0.575606  1.765094
1  0.523277  1.173873  1.378983  1.378983 -0.228162
2  0.049470  0.321156 -1.238887 -1.238887  1.804474
3 -0.758909  0.339307  1.636119  1.636119  1.081766
4 -0.662232 -1.523110  1.698073  1.698073  0.230585
5 -1.379587  0.020645 -1.455928 -1.455928  1.270043
6 -1.317288  1.280583 -0.032213 -0.032213 -0.664116
7  0.724669 -1.201619  0.533725  0.533725  0.727381
8  0.049628  1.829209 -0.237327 -0.237327  0.295623
9  1.419008 -0.690613  0.180751  0.180751 -1.947492

# C in [10, 2] and A in [1, 7]
print(xs_and(df, {"C": [10, 2], "A": [1, 7]}))
# Output
B         6         7
D         3         1
0  0.575606  1.155345
1  1.378983  0.523277
2 -1.238887  0.049470
3  1.636119 -0.758909
4  1.698073 -0.662232
5 -1.455928 -1.379587
6 -0.032213 -1.317288
7  0.533725  0.724669
8 -0.237327  0.049628
9  0.180751  1.419008

https://stackoverflow.com/questions/51032985/pandas-conditional-select-using-loc-with-multiindex This answer seemed promising but haven’t tested. — jason m, Aug 14 '22 at 19:04

Pandas xs where level in list of options

1 Answers1