I have a basic dataframe, structured like this:

             Col1
Ind1 Ind2
0    key1    12
     key2    35
1    key3    56
     key4    24
     key5    65

...and another one like this:

    ColA
0   key1
1   else
2   else
3   key3

What I need is the mean of Col1 in df1, grouped by whether Ind2 appears in df2 or not. This is what I tried without success; the message says "Lengths must match to compare", but of course they don't.

import pandas as pd

df1 = pd.DataFrame({'ind1': [0, 0, 1, 1, 1], 'ind2': ['key1', 'key2', 'key3', 'key4', 'key5'], 'col1': [12, 35, 56, 24, 65]})
df1.set_index(['ind1', 'ind2'], inplace=True)
df2 = pd.DataFrame({'ColA': ['key1', 'else', 'else', 'key3']})

print (df1.groupby(df1.index.levels[1] in df2.get_values()).mean())

Thanks in advance for any hint!

1 Answer

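You actually want to check whether each element of df1.index.levels[1] is in df2.ColA, since you need one True/False value per row. The syntax you wrote won't get you that: the in operator produces a single result for the whole left-hand object rather than one result per element. Instead, you should try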

df1.groupby(df1.index.levels[1].isin(df2.ColA)).mean()

Note the isin function, which returns True/False for every element, and the fact that I refer directly to df2.ColA, since that is the column that contains the values (referring to df2 instead would search for the values in the column names of df2).
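One caveat worth checking in your real data: index.levels[1] holds the unique values of the level, not one value per row, so it only lines up with the rows here because every key occurs exactly once and in order. A minimal sketch of a variant that should not depend on that, assuming the level is named 'ind2' as in the question's code, uses get_level_values instead (the names row_keys, in_df2 and result are just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    'ind1': [0, 0, 1, 1, 1],
    'ind2': ['key1', 'key2', 'key3', 'key4', 'key5'],
    'col1': [12, 35, 56, 24, 65],
}).set_index(['ind1', 'ind2'])
df2 = pd.DataFrame({'ColA': ['key1', 'else', 'else', 'key3']})

# One label per row of df1 (repeats included), unlike index.levels[1]
row_keys = df1.index.get_level_values('ind2')

# Boolean array with one entry per row: True where the key appears in df2.ColA
in_df2 = row_keys.isin(df2['ColA'])

# Group the rows by that mask and average each group
result = df1.groupby(in_df2).mean()
print(result)
```

With the sample data, the True group averages 12 and 56 to 34.0, and the False group averages 35, 24 and 65 to roughly 41.33.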

tmrlvi