My aim is to calculate the t-statistic and p-value for the following comparison:

  • Four lists of data (Core_R, Core_T, Periphery_R, Periphery_T)
  • The correlation is computed within each pair of lists: between Core_R and Core_T, and between Periphery_R and Periphery_T
  • The result is two correlation values

The question I would like to answer is whether the two correlation values (Core and Periphery) differ significantly, by reporting the corresponding t-statistic and p-value for this comparison.

Here is my code:

import numpy as np
import scipy as sp
from scipy import stats

Core_R = [0.472202, 0.685151, 0.287613, 0.518002, 0.675128, 0.462418, 0.618170, 0.692822]
Core_T = [0.816606, 1.168650, 0.492040, 0.782458, 0.648625, 0.885237, 0.798031, 0.950363]
Periphery_R = [0.685151, 0.287613, 0.546364, 0.518002, 0.518002, 0.675128, 0.462418, 0.618170]
Periphery_T = [1.168650, 0.492040, 0.782458, 0.648625, 0.885237, 0.798031, 0.950363, 0.963140]

# Correlation and paired t-Tests
Pearson_core = sp.stats.pearsonr(Core_R, Core_T)
Pearson_periphery = sp.stats.pearsonr(Periphery_R, Periphery_T)
t_Test = sp.stats.ttest_rel([Core_R, Core_T], [Periphery_R, Periphery_T], axis=None, alternative="two-sided")

print(f"Pearson's r Core:      {Pearson_core}")
print(f"Pearson's r Periphery: {Pearson_periphery}")
print(f"t Test: {t_Test}")

This code works. It first computes the correlation between Core_R and Core_T, and then between Periphery_R and Periphery_T.

However, from my understanding, the line

t_Test = sp.stats.ttest_rel([Core_R, Core_T], [Periphery_R, Periphery_T], axis=None, alternative="two-sided")

does not compute the t-statistic and p-value between both correlations. Instead, the computation only compares the values that are provided by the four lists. The t-test here has consequently nothing to do with the previously obtained correlations.
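
To make this concrete, here is a minimal sketch (not part of my original code) of what that call appears to compute. With axis=None, SciPy flattens both inputs, so the 16 values from the two Core lists are paired element by element with the 16 values from the two Periphery lists. The names a, b, d and t_manual are introduced only for this illustration.

```python
import numpy as np
from scipy import stats

Core_R = [0.472202, 0.685151, 0.287613, 0.518002, 0.675128, 0.462418, 0.618170, 0.692822]
Core_T = [0.816606, 1.168650, 0.492040, 0.782458, 0.648625, 0.885237, 0.798031, 0.950363]
Periphery_R = [0.685151, 0.287613, 0.546364, 0.518002, 0.518002, 0.675128, 0.462418, 0.618170]
Periphery_T = [1.168650, 0.492040, 0.782458, 0.648625, 0.885237, 0.798031, 0.950363, 0.963140]

# The call from the question: axis=None makes SciPy work on the flattened arrays.
t_orig, p_orig = stats.ttest_rel([Core_R, Core_T], [Periphery_R, Periphery_T], axis=None)

# Same test written out explicitly on the flattened data (16 paired values).
a = np.ravel([Core_R, Core_T])
b = np.ravel([Periphery_R, Periphery_T])
t_flat, p_flat = stats.ttest_rel(a, b)

# Paired t-statistic by hand: mean difference divided by its standard error.
d = a - b
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

print(t_orig, t_flat, t_manual)
```

All three numbers should agree, and no correlation coefficient enters the calculation at any point.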

Changing the code for the t Test to:

t_Test = sp.stats.ttest_rel(Pearson_core, Pearson_periphery, axis=None, alternative="two-sided")
print(t_Test)

will only take the two results of the Pearson correlations into consideration, am I correct? Hence the resulting t-statistic and p-value do not represent a t-test that takes all values (from all four lists) into consideration.

My question is how I can solve this problem so that the t-Test compares the results from both correlations, but by including all original data points into the computation?

I am afraid that this line of code for the t-Test

t_Test = sp.stats.ttest_rel(Pearson_core, Pearson_periphery, axis=None, alternative="two-sided")
print(t_Test)

does not really compare the combination of Core_R with Core_T against Periphery_R with Periphery_T, but that it only computes the t-stats and p-value between two single values, namely the results of

Pearson_core = sp.stats.pearsonr(Core_R, Core_T)
Pearson_periphery = sp.stats.pearsonr(Periphery_R, Periphery_T)
Philipp
  • We can't run correlations between only two points. What are you trying to achieve/test? – johnjohn Jan 29 '22 at 12:05
  • I updated my question in order to better show and explain what I aim to achieve. – Philipp Jan 29 '22 at 12:43
  • Thanks for the update! If I understand correctly, you want to run a t-test on the two correlation values, yet again this is not possible (you only have 2 values, not a distribution). What you might do is compute correlation between all points in Core and all points in Periphery, and see if the correlation is significant. Is this what you want to do? – johnjohn Jan 29 '22 at 13:09
  • (i.e., correlation between all points of Core_R and Core_T taken together and all points of Periphery_R and Periphery_T taken together) – johnjohn Jan 29 '22 at 13:18
  • You are right, I would like to run a t-test on the two correlation values. As you point out, this is not possible in a straightforward way. You mean that I should combine the data values of core and periphery, respectively, and then run a t-test between them. I did this before. The concern that came up was that this t-test does not really compare the correlations, since correlations are not even computed in this example. Combining both lists of core and both lists of periphery to compute a t-test brings back my initial problem, namely that the t-test would not compare correlations. Thanks so far – Philipp Jan 29 '22 at 17:29

1 Answer

Correlation between two variables is given by comparing their variances (roughly). Hence it is impossible to calculate the correlation between two variables that have just one data point each (two points in total): the variance is zero, which leads to a division by zero in the correlation formula. You can test this by running:

import scipy as sp
import pandas as pd
from scipy import stats

List_1 = [0.472202]
List_2 = [0.816606]

Data_core = {"List1": List_1, "List2": List_2}
DF = pd.DataFrame(data=Data_core)

Pearson = DF.corr()
print(Pearson)

The result will be NaN.
The closest thing to what you could be looking for is a distance metric.

Mario
  • Thank you. I edited my whole question to show the full problem that I am trying to solve. – Philipp Jan 29 '22 at 12:42
  • Most likely you are looking for Fisher's r-to-z transformation of the correlation coefficients in order to compare them. Still, I don't know your data or your research, so I can't really be trusted on this one. All I can do is point you to a useful answer on Cross Validated (https://stats.stackexchange.com/questions/365154/testing-the-significance-between-two-correlations-in-python/427268). The library mentioned there may be of help – Mario Jan 29 '22 at 13:00
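
A minimal sketch of the Fisher r-to-z comparison mentioned in the last comment, assuming the two correlations come from independent samples and the data are roughly bivariate normal. The function name compare_independent_correlations is hypothetical (it is not a SciPy function); only pearsonr, arctanh and the normal distribution come from the libraries. Note that this gives a z-statistic, compared against the standard normal, rather than a t-statistic, and with only 8 observations per group the approximation is rough.

```python
import numpy as np
from scipy import stats

def compare_independent_correlations(x1, y1, x2, y2):
    """Fisher r-to-z test for two correlations from independent samples (sketch)."""
    r1, _ = stats.pearsonr(x1, y1)
    r2, _ = stats.pearsonr(x2, y2)
    n1, n2 = len(x1), len(x2)
    # Fisher transformation: z = arctanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * stats.norm.sf(abs(z))
    return r1, r2, z, p

Core_R = [0.472202, 0.685151, 0.287613, 0.518002, 0.675128, 0.462418, 0.618170, 0.692822]
Core_T = [0.816606, 1.168650, 0.492040, 0.782458, 0.648625, 0.885237, 0.798031, 0.950363]
Periphery_R = [0.685151, 0.287613, 0.546364, 0.518002, 0.518002, 0.675128, 0.462418, 0.618170]
Periphery_T = [1.168650, 0.492040, 0.782458, 0.648625, 0.885237, 0.798031, 0.950363, 0.963140]

r_core, r_periphery, z_stat, p_value = compare_independent_correlations(
    Core_R, Core_T, Periphery_R, Periphery_T
)
print(f"r Core: {r_core:.3f}, r Periphery: {r_periphery:.3f}")
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```

This uses all original data points through the two correlations and their sample sizes. Whether the independence assumption holds for the Core and Periphery data is something only the asker can judge; if the two groups are dependent, a different test is needed.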