
I am having trouble with this. I am doing a linear regression and want to test the slope. The t-test checks whether the slope is far from 0, and the slope can be negative or positive, but I am only interested in negative slopes.

In this example the slope is positive, which I am not interested in, so the p-value should be large. But it comes out small, because right now the test checks whether the slope is far from 0 in either direction. (I am forcing an intercept of zero, which is what I want.) Can someone help me with the syntax to test only whether the slope is negative? In that case the p-value should be large.

And how can I change that to, say, a 99% confidence level, or 95%, or...?

import statsmodels.api as sm
import matplotlib.pyplot as plt
import numpy

X = [-0.013459134, 0.01551033, 0.007354476, 0.014686473, -0.014274754, 0.007728445, -0.003034186, -0.007409397]
Y = [-0.010202462, 0.003297546, -0.001406498, 0.004377665, -0.009244517, 0.002136552, 0.006877126, -0.001494624]

# No constant is added, so the intercept is forced to zero.
regression_results = sm.OLS(Y, X, missing="drop").fit()

P_value = regression_results.pvalues[0]      # two-sided p-value of the slope
R_squared = regression_results.rsquared
K_slope = regression_results.params[0]
conf_int = regression_results.conf_int()     # default 95% confidence interval
low_conf_int = conf_int[0][0]
high_conf_int = conf_int[0][1]

fig, ax = plt.subplots()
ax.grid(True)
ax.scatter(X, Y, alpha=1, color='orchid')
x_pred = numpy.linspace(min(X), max(X), 40)
y_pred = regression_results.predict(x_pred)
ax.plot(x_pred, y_pred, '-', color='darkorchid', linewidth=2)
plt.show()
Orvar Korvar
  • I'm voting to close this question as off-topic because this isn't a Python question, but a stats one. That belongs on [CrossValidated](http://stats.stackexchange.com/) – MSalters Apr 13 '17 at 11:07
  • It's a question about programming. statsmodels currently does not support a one-sided t_test in model results. – Josef Apr 13 '17 at 11:35

1 Answer


The p-value for the two-sided t-test is calculated by:

import scipy.stats as ss
df = regression_results.df_resid
ss.t.sf(regression_results.tvalues[0], df) * 2 # About the same as (1 - cdf) * 2.
# see @user333700's comment
Out[12]: 0.02903685649821508

Your modification would just be:

ss.t.cdf(regression_results.tvalues[0], df)
Out[14]: 0.98548157175089246

since you are interested in the left tail only.
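
As a minimal end-to-end sketch (assuming the same data and no-intercept fit as in the question, so the variable names match the code above; the abs() in the two-sided line is only there so the calculation stays correct if the estimated slope happens to be negative):

import scipy.stats as ss
import statsmodels.api as sm

X = [-0.013459134, 0.01551033, 0.007354476, 0.014686473, -0.014274754, 0.007728445, -0.003034186, -0.007409397]
Y = [-0.010202462, 0.003297546, -0.001406498, 0.004377665, -0.009244517, 0.002136552, 0.006877126, -0.001494624]

# Regression through the origin, as in the question (no constant term is added).
regression_results = sm.OLS(Y, X, missing="drop").fit()

t_stat = regression_results.tvalues[0]   # t statistic of the slope
df = regression_results.df_resid         # residual degrees of freedom

p_two_sided = ss.t.sf(abs(t_stat), df) * 2   # H1: slope != 0 (what .pvalues reports)
p_left = ss.t.cdf(t_stat, df)                # H1: slope < 0 (negative slope)
p_right = ss.t.sf(t_stat, df)                # H1: slope > 0 (positive slope)

print(p_two_sided, p_left, p_right)
# With this data the slope is positive, so p_left is large (about 0.985)
# and p_right is small (about 0.015).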

For the confidence interval, you just need to pass the alpha parameter:

regression_results.conf_int(alpha=0.01)

for a 99% confidence interval.
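
As a quick sketch of reading the bounds back out (continuing from the fitted regression_results above; with plain list inputs, conf_int() comes back as an array whose first row holds the slope's lower and upper bounds, which is what the question's own code indexes into):

# regression_results is the fitted OLS model from above.
conf_int_99 = regression_results.conf_int(alpha=0.01)   # 99% confidence interval
low_99 = conf_int_99[0][0]    # lower bound for the slope
high_99 = conf_int_99[0][1]   # upper bound for the slope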

ayhan
  • It is better to use t.sf instead of 1 - cdf because it has more precision in the right tail, where the value can be smaller than float epsilon. – Josef Apr 13 '17 at 11:33
  • Great!!!! You are my savior!!! :) ........ A question: if I am interested in the right tail, how do I do that? And what is the left tail, is it a negative slope? And the right tail a positive slope? ....... Just to confirm, `ss.t.cdf(regression_results.tvalues[0], df)` gives me the p-value for a negative slope? ....... Is alpha=0.1 a 10% confidence level for a one-sided test? ...... What is the syntax for t.sf? – Orvar Korvar Apr 13 '17 at 11:52
  • @user333700 Thank you, I didn't know that. Let me update. – ayhan Apr 13 '17 at 12:41
  • @OrvarKorvar Yes, the p-value for the one-sided test. For the right tail, you would change it to `1 - ss.t.cdf`, but as user333700 noted, you can replace that with the survival function (e.g. `ss.t.sf(regression_results.tvalues[0], df)`). – ayhan Apr 13 '17 at 12:44
  • @OrvarKorvar The p-value calculation does not depend on the alpha value; you only use alpha when you compare the p-value against it, so the p-value is the same for alpha=0.05 and alpha=0.1. As for the confidence interval, that is always two-tailed (you can just look at the lower bound, though). – ayhan Apr 13 '17 at 12:48
  • When I do `ss.t.sf(...)` I get a different result than `ss.t.cdf(...)`. The difference is very large, 0.05 instead of 0.98. Why is that? PS. Give me some fries! – Orvar Korvar Apr 13 '17 at 13:49
  • `ss.t.sf(...)` is the equivalent of `1 - ss.t.cdf(...)`; it is for the right tail (see the short sketch at the end of this thread). You want fries? :) – ayhan Apr 13 '17 at 14:02
  • User333700 wrote above that statsmodels does not support one-sided t-tests. Do you know another library that supports them? I would prefer a simple library with everything included, one that does not need cumbersome workarounds. Which library would you use? Of course, give me tomatoes, eggplant, and cucumber! Come, come, I'm just joking. (My Turkish is really bad, if you could not tell by now.) – Orvar Korvar Apr 17 '17 at 08:35
  • @OrvarKorvar As far as I know, statsmodels is the most comprehensive package for inferential statistical analysis in Python. You might get better support in R, though. Your Turkish is quite good, I just wasn't sure. :) – ayhan Apr 19 '17 at 02:23
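
A tiny sketch to make the sf/cdf relationship from the comments concrete (assuming t_stat and df from the fit sketched earlier): the survival function is defined as 1 - cdf, so the two one-sided p-values always sum to one.

import scipy.stats as ss

# t_stat and df are the slope's t statistic and residual degrees of freedom
# from the fitted model, as in the earlier sketch.
left = ss.t.cdf(t_stat, df)        # left tail: evidence for a negative slope
right = ss.t.sf(t_stat, df)        # right tail: evidence for a positive slope
print(left, right, left + right)   # the two tails sum to 1.0 (up to floating point)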