Group rows in Pandas DataFrame based on complex condition

Question

I have a basic dataframe, structured like this:

             Col1
Ind1 Ind2
0    key1    12
     key2    35
1    key3    56
     key4    24
     key5    65

...and another one like this:

    ColA
0   key1
1   else
2   else
3   key3

What I need is the mean value of df1, grouped by whether Ind2 appears in df2 or not. This is what I tried without success; the error message says "Lengths must match to compare", but of course they don't.

import pandas as pd

df1 = pd.DataFrame({'ind1': [0, 0, 1, 1, 1], 'ind2': ['key1', 'key2', 'key3', 'key4', 'key5'], 'col1': [12, 35, 56, 24, 65]})
df1.set_index(['ind1', 'ind2'], inplace=True)
df2 = pd.DataFrame({'ColA': ['key1', 'else', 'else', 'key3']})

print (df1.groupby(df1.index.levels[1] in df2.get_values()).mean())

Thanks in advance for any hint!

tmrlvi · Answer 1 · 2017-04-14T15:18:03.430

You actually want to check whether an element of df1.index.levels[1] is in df2.ColA (since you need a value for each row). The syntax you wrote won't get you that. Instead, you should try

df1.groupby(df1.index.levels[1].isin(df2.ColA)).mean()

Note the isin function, which returns True/False for every element, and the fact that I refer directly to df2.ColA, since it is the column that contains the values (referring to df2 instead would search for the values in the column names of df2).
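One caveat worth spelling out: index.levels[1] holds the unique labels of that level, not one label per row, so it lines up with the rows here only because every key occurs exactly once and in sorted order. A minimal sketch of a variant that should hold up for repeated or unsorted keys, assuming the same df1 and df2 as in the question, uses get_level_values instead (the names in_df2 and result are just illustrative):

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'ind1': [0, 0, 1, 1, 1],
    'ind2': ['key1', 'key2', 'key3', 'key4', 'key5'],
    'col1': [12, 35, 56, 24, 65],
}).set_index(['ind1', 'ind2'])
df2 = pd.DataFrame({'ColA': ['key1', 'else', 'else', 'key3']})

# One label per row of df1 (unlike index.levels, which is deduplicated)
row_keys = df1.index.get_level_values('ind2')

# Boolean array with one entry per row: True where the key occurs in df2.ColA
in_df2 = row_keys.isin(df2['ColA'])

# Group on the boolean array and average each group
result = df1.groupby(in_df2).mean()
print(result)
```

With this data the True group contains key1 and key3 (mean 34.0), and the False group contains the remaining three keys.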
