
Problem: Computing the Pearson correlation coefficient depending on the values of a third column.

To start with, I have a dataframe with three columns: A, B and C.

Columns A and B contain float64 values, whereas C contains objects. I want to get the Pearson correlation coefficient for columns A and B.

print(df['A'].corr(df['B'],method='pearson')) --> This works fine for the whole columns.

I struggle with the next step. Column C contains only two values; let's call them c1 and c2. I now want a separate coefficient for each of c1 and c2. I tried:

print(df['A']&df['C']=='c1').corr((df['B']&df['C']=='c1'),method='pearson')

and the same for c2. The error raised is: TypeError: unsupported operand type(s) for &: 'float' and 'str'. How can I get both coefficients without splitting the dataframe?

Thanks in advance

NND
  • Why not create a new column with all of the coefficients, then just select the rows you need with `df['new'][df['C']=='c1']`? – Tim Roberts Jan 06 '22 at 00:04

1 Answer


This should achieve what you're looking for:

print(df[df['C']=='c1']['A'].corr(df[df['C']=='c1']['B'],method='pearson'))

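df[df['C']=='c1'] retrieves the subset of the dataframe where the value in column C is 'c1'; you then select the column you want as usual. The original attempt fails because & binds more tightly than ==, so df['A'] & df['C'] is evaluated first, which tries to combine floats with strings.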
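For reference, here is a minimal sketch of the same idea that avoids repeating the filter and also computes both coefficients in one pass. The sample data and the names mask, r_c1 and r_by_group are made up for illustration; only the columns A, B, C and the values c1 and c2 come from the question.

```python
import pandas as pd

# Hypothetical sample data: A and B are floats, C holds the two labels.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
    "C": ["c1", "c1", "c1", "c2", "c2", "c2"],
})

# Build the boolean mask once, then select rows and column with .loc.
mask = df["C"] == "c1"
r_c1 = df.loc[mask, "A"].corr(df.loc[mask, "B"], method="pearson")

# One coefficient per distinct value in C, without filtering by hand.
r_by_group = {
    label: group["A"].corr(group["B"], method="pearson")
    for label, group in df.groupby("C")
}

print(r_c1)
print(r_by_group)
```

The groupby loop only iterates over views of the rows for each label, so the original dataframe stays in one piece.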

jpk