I have a pandas DataFrame with three columns, Col1, Col2 and Col3. I need to maximize Pearson's correlation coefficient between Col2 and Col3, where Col2 is first modified using the values in Col1 according to the following formula:

df['Col1']=np.power((df['Col1']),B)
df['Col2']=df['Col2']*df['Col1']

where B is the free parameter to be tuned so that Pearson's correlation coefficient between Col3 and the new values of Col2 is as large as possible.

Is there a way to do this in Python and return the value of B? I want to repeat this process for other columns.

Sidhom

1 Answer

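This should work: treat B as the variable to optimize and minimize the negative of the Pearson coefficient with `scipy.optimize.minimize`.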

import pandas as pd
import numpy as np
from scipy.optimize import minimize

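# dataframe with 20 rows; Col1 is made strictly positive because
# np.power(negative, fractional B) returns NaN and breaks the optimizer
df = pd.DataFrame(data=np.random.randn(20,3),
                  columns=['Col1', 'Col2', 'Col3'])
df['Col1'] = df['Col1'].abs() + 0.1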

# cost function
def cost_fun(B_array, df):
    B = B_array[0]
    new_col1 = np.power((df['Col1']), B)
    new_col2 = np.array(df['Col2']) * new_col1
    col3 = np.array(df['Col3'])
    pearson = np.corrcoef(new_col2, col3)[1,0]
    return -1*pearson # multiply by -1 to get max

# initial value
B_0 = 1.1

# run minimizer
res = minimize(cost_fun, [B_0], args=(df,),  # args must be a tuple
               options={"maxiter": 100,
                        "disp": True})
# results
print(res)
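
Since the question also asks about repeating this for other columns, here is a minimal sketch of the same idea wrapped in a reusable function. It is not part of the approach above: the name `best_exponent`, the search interval `bounds=(0.0, 3.0)`, and the synthetic data are all assumptions made for illustration, and `minimize` is swapped for `scipy.optimize.minimize_scalar`, which fits a problem with a single scalar B. It also assumes the weighting column is strictly positive.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize_scalar


def best_exponent(df, weight_col, target_col, ref_col, bounds=(0.0, 3.0)):
    """Hypothetical helper: return (B, r), where B in `bounds` maximizes the
    Pearson correlation r between target_col * weight_col**B and ref_col."""
    w = df[weight_col].to_numpy(dtype=float)  # assumed strictly positive
    x = df[target_col].to_numpy(dtype=float)
    y = df[ref_col].to_numpy(dtype=float)

    def neg_pearson(B):
        # minimizing -r is the same as maximizing r
        return -np.corrcoef(x * np.power(w, B), y)[0, 1]

    res = minimize_scalar(neg_pearson, bounds=bounds, method="bounded")
    return res.x, -res.fun


# Synthetic data (assumption): Col3 is built with exponent 1.5 plus small noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Col1": rng.uniform(0.5, 2.0, 200),
    "Col2": rng.uniform(1.0, 2.0, 200),
})
demo["Col3"] = demo["Col2"] * demo["Col1"] ** 1.5 + rng.normal(0, 0.01, 200)

B_hat, r_max = best_exponent(demo, "Col1", "Col2", "Col3")
print(B_hat, r_max)
```

To cover several columns, call `best_exponent` in a loop over the column names of interest. The bounded search only finds a local optimum inside `bounds`, so the interval should be chosen to match the data.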
Adarsh Chavakula
  • Why do you use the line `return -1*pearson # multiply by -1 to get max`? Would it be better to use `abs(np.corrcoef(new_col2, col3)[1,0])`? – Sidhom Apr 26 '19 at 14:15
  • `minimize` searches for the lowest possible value of the objective function. Since the goal is to maximize the Pearson coefficient, we minimize its negative. We are not trying to maximize its absolute value. – Adarsh Chavakula Apr 26 '19 at 14:18
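
To make the difference between the two objectives in this exchange concrete, here is a small sketch under stated assumptions: the data are synthetic, Col3 is constructed to be perfectly negatively correlated with Col2 * Col1**2, and the names `signed_obj` and `abs_obj` are made up for the example. With such data, maximizing the signed coefficient moves B away from 2, while maximizing the absolute value recovers it.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Synthetic data (assumption): Col3 = -(Col2 * Col1**2), so r = -1 at B = 2
rng = np.random.default_rng(1)
col1 = rng.uniform(0.5, 2.0, 100)
col2 = rng.uniform(1.0, 2.0, 100)
col3 = -(col2 * col1 ** 2)


def pearson(B):
    return np.corrcoef(col2 * np.power(col1, B), col3)[0, 1]


def signed_obj(B):
    # maximizes r itself: prefers the least negative correlation
    return -pearson(B)


def abs_obj(B):
    # maximizes |r|: prefers the strongest correlation of either sign
    return -abs(pearson(B))


B_signed = minimize_scalar(signed_obj, bounds=(0.0, 3.0), method="bounded").x
B_abs = minimize_scalar(abs_obj, bounds=(0.0, 3.0), method="bounded").x
print(B_signed, pearson(B_signed))
print(B_abs, pearson(B_abs))
```

Which objective is appropriate depends on whether a strong negative correlation counts as a good result for the problem at hand.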