
I am using Python 3.6 to run some statistical tests on a data set. What I am trying to accomplish is to run a t-test between the data set and the trend line to determine the statistical significance. I am using SciPy to do this; however, I am not sure which variables I should pass to the test to get the outcome I need.

Here is my code so far:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

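p = np.load('data.npy')

# index 0 corresponds to the year 1901
start = 0
end = 100

# annual mean of each year's data, ignoring NaNs
annualmean = []
for n in range(start, end):
    annualmean.append(np.nanmean(p[n]))

# Trendline plots: a must exist before it is plotted, and must be an array
# (not a range) so that intercept + slope*a works element-wise
a = np.arange(start, end)
year1 = 1901

plt.figure()
plt.plot(a, annualmean, '-')
slope, intercept, r_value, p_value, std_err = stats.linregress(a, annualmean)
plt.plot(a, intercept + slope*a, 'r')

print(stats.ttest_ind(annualmean, a))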

Right now the code runs with no error messages; however, I am getting an incredibly small p-value that I don't think is correct. If anyone knows which variables I should pass to the t-test, that would be very helpful. Thanks!

CPG

2 Answers


I don't have the reputation to comment, but according to your code, you are doing a t-test comparing the means of the annual mean data and an array running from 0 to 99. scipy.stats.ttest_ind takes two arrays whose means you want to compare.

According to the documentation:

scipy.stats.ttest_ind(a, b, axis=0, equal_var=True)

Parameters: 
a, b : array_like
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
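A minimal sketch of why the question's call returns such a small p-value, assuming synthetic data in place of data.npy (the values around 10 are made up for illustration). ttest_ind only asks whether the two samples share the same mean; it ignores the ordering of the points, so it says nothing about a trend. For 1-D inputs the two samples may also have different lengths, since only the dimensions other than axis must match.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the question's annualmean: 100 values scattered
# around 10 (the real data.npy is not available here).
rng = np.random.RandomState(0)
annualmean = 10 + rng.normal(0, 1, size=100)

# The question's second argument: the year indices 0..99, whose mean is 49.5.
a = np.arange(0, 100)

# Two-sample t-test on the means; the p-value is tiny only because the
# sample means (about 10 versus 49.5) are far apart, not because of any trend.
t_stat, p_val = stats.ttest_ind(annualmean, a)
print(t_stat, p_val)

# 1-D samples of different lengths are accepted (only non-axis dims must match).
t2, p2 = stats.ttest_ind(annualmean, annualmean[:50])
print(t2, p2)
```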

As an additional note, it doesn't make sense to do a t-test between a trend line and your raw data, but that is a question for another forum.

Tkanno
Note: the two arrays don't need to have the same length for ttest_ind; see the "except" clause in the docstring. – Josef Jun 24 '17 at 09:53

So it turns out I was confused about how to test the statistical significance. I had already computed a p-value for the trend in the line:

slope, intercept, r_value, p_value, std_err = stats.linregress(a,annualmean)

All I needed to do was: print(p_value)
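A minimal sketch of what that p_value means, assuming synthetic data with a made-up upward trend in place of data.npy. The p_value returned by stats.linregress tests the null hypothesis that the slope is zero, using a two-sided t-test on slope / std_err with n - 2 degrees of freedom; the manual recomputation below is only there to illustrate that relationship.

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for data.npy: 100 years with a weak upward
# trend (0.02 per year, an assumed value) plus Gaussian noise.
rng = np.random.RandomState(1)
a = np.arange(100)
annualmean = 10 + 0.02 * a + rng.normal(0, 1, size=100)

slope, intercept, r_value, p_value, std_err = stats.linregress(a, annualmean)

# Recompute the p-value by hand: t statistic for "slope == 0",
# two-sided tail probability of Student's t with n - 2 degrees of freedom.
t = slope / std_err
p_manual = 2 * stats.t.sf(abs(t), df=len(a) - 2)

print(slope, p_value)
print(p_manual)  # agrees with p_value up to floating-point rounding
```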

CPG