
I am trying to use Go for simple statistics.

I am using this package to get the correlation coefficient.

It works well, but it does not give the P value of the correlation. The other functions in this package are listed on the same page: https://godoc.org/gonum.org/v1/gonum/stat

Similarly, this package also has a correlation function that returns the coefficient but not the P value.

How can I find the P value of a correlation coefficient with either of these packages?

Edit: I had posted this question at crossvalidated (stats.stackexchange.com) where it was suggested that it is a programming question.

rnso

1 Answer

It looks like you'll need to calculate it manually, and there are multiple ways to do this, depending on assumptions you can make about your data. If you really go this route, I'd strongly encourage you to also test against existing implementations - for example R's cor.test - to ensure that you're not doing something wrong.

Normality Assumption

If the observed values are each approximately normal, then the value

$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$

where r is the calculated correlation coefficient and n is the number of observations, will follow Student's t distribution with n-2 degrees of freedom. Hence, you can use Student's t distribution as implemented in GoNum to compute the p-value. This is what cor.test in R does.
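As a quick sanity check of the test statistic, here is the arithmetic for the values raised in the comments below (r = 0.75, n = 100). This is only a worked instance of the formula, rounded to two decimals:

```latex
t = r \sqrt{\frac{n - 2}{1 - r^2}}
  = 0.75 \sqrt{\frac{98}{1 - 0.5625}}
  = 0.75 \sqrt{224}
  \approx 11.22
```

That value is then compared against Student's t distribution with n - 2 = 98 degrees of freedom.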

It should go something like (please note I've never used Go):

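import (
    "math"

    "gonum.org/v1/gonum/stat/distuv"
)

// twoSidedPValue returns the two-sided p-value for a Pearson correlation
// coefficient r computed from n observations (requires n > 2 and |r| < 1).
func twoSidedPValue(r float64, n float64) float64 {

    // compute the test stat
    ts := r * math.Sqrt((n-2)/(1-r*r))

    // make a Student's t with (n-2) d.f.
    t := distuv.StudentsT{Mu: 0, Sigma: 1, Nu: n - 2}

    // compute the p-value: twice the lower-tail probability at -|ts|
    pval := 2 * t.CDF(-math.Abs(ts))

    return pval
}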

Testing against R's cor.test seems to match up.

Permutation Test

If your sampled variables are not each normal, then you can use a permutation test. Essentially, randomize your data and see how many times a random correlation matches or exceeds your observed one. If your test is two-sided (i.e., you had no principled assumptions about the correlation outcome), use the absolute values of the correlation for the test.
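A minimal sketch of the two-sided permutation test described above, using only the standard library so it stands alone. The names `pearson` and `permutationPValue` are hypothetical helpers written for this illustration (they are not part of GoNum), and the add-one correction in the final ratio is one common convention that I am assuming here, not the only choice:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// pearson returns the sample Pearson correlation coefficient of x and y.
// It assumes len(x) == len(y) and that neither slice is constant.
func pearson(x, y []float64) float64 {
	n := float64(len(x))
	var sx, sy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
	}
	mx, my := sx/n, sy/n

	var sxy, sxx, syy float64
	for i := range x {
		dx, dy := x[i]-mx, y[i]-my
		sxy += dx * dy
		sxx += dx * dx
		syy += dy * dy
	}
	return sxy / math.Sqrt(sxx*syy)
}

// permutationPValue estimates a two-sided p-value by repeatedly shuffling y
// and counting how often |r| of the shuffled data matches or exceeds the
// observed |r|. The seed makes the run reproducible.
func permutationPValue(x, y []float64, iters int, seed int64) float64 {
	rng := rand.New(rand.NewSource(seed))
	observed := math.Abs(pearson(x, y))

	// Shuffle a copy so the caller's data is left untouched.
	shuffled := append([]float64(nil), y...)
	hits := 0
	for i := 0; i < iters; i++ {
		rng.Shuffle(len(shuffled), func(a, b int) {
			shuffled[a], shuffled[b] = shuffled[b], shuffled[a]
		})
		if math.Abs(pearson(x, shuffled)) >= observed {
			hits++
		}
	}
	// Add-one correction (an assumption of this sketch): the estimate is
	// never exactly zero.
	return float64(hits+1) / float64(iters+1)
}

func main() {
	// Illustrative data only.
	x := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	y := []float64{2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2}
	fmt.Printf("permutation p-value: %.4f\n", permutationPValue(x, y, 10000, 1))
}
```

For a one-sided test you would compare the signed correlations instead of their absolute values. The resolution of the estimate is limited by the number of iterations, so very small p-values need many shuffles.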


Test Details

The Inference section of the Wikipedia entry, "Pearson correlation coefficient", has details.

merv
  • Thanks for very useful information. What would be the code to determine P value from Student's t distribution as implemented in GoNum for, say t=11.2 (which is obtained with n=100 and r=0.75) ? – rnso Sep 13 '19 at 17:30
  • This is exactly what I wanted. Thanks. For 1-sided P value (lesser or greater) are major changes in code required? – rnso Sep 19 '19 at 23:57