0

I am trying to develop a function which allows me to generate a Pearson Correlation Coefficient for every pair of columns in csv data set. The function needs to return: a list of tuples, each tuple containing two column names and then the Pearson correlation coefficient value. However, i cannot use any external libraries for this. So i cannot import things like csv reader or NumPy.

This is what I have attempted so far but I am struggling to understand how to continue this or approach it differently.

def gen_pearson(file_name):
    col_names = data [0]
#create list of tuples of pairs of columns
    col_pairs = [(col_names[i], col_names[j]) for i in range(len(col_names)) for j in range (i+1, len(col_names))]

#calculate pearson correlation coefficient for each pair columns   
coefficients = []
for pair in col_pairs:
    col_1 = [row[pair[0]] for row in data[1:]]
    col_2 = [row[pair[1]] for row in data[1:]]

coefficient = sum((a - mean_col_1) * (b - mean_col_2) for (a,b) in zip (col_1, col_2)) / len (col_1)
stdev_x = (sum((a - mean_col_1) **2 for a in col_1)/len(col_1)) **0.5
stdev_y = (sum((b - mean_col_2) **2 for b in col_2)/len(col_2)) **0.5
pearson_result = cov/(stdev_x * stdev_y)
d_allen
  • 13
  • 1

0 Answers0