With the following code, taken from the examples, how do I get the p-value and t-stat that you would find in output such as Excel's?

  OLSMultipleLinearRegression regression2 = new OLSMultipleLinearRegression();
  double[] y = { 4, 8, 13, 18 };
  // The first column of x is already the constant term.
  double[][] x = {{ 1, 1, 1 },
                  { 1, 2, 4 },
                  { 1, 3, 9 },
                  { 1, 4, 16 }};

  // Set this before loading the data; otherwise an intercept column is added to x.
  regression2.setNoIntercept(true);
  regression2.newSampleData(y, x);
  double[] beta = regression2.estimateRegressionParameters();

  for (double d : beta) {
     System.out.println("D: " + d);
  }

After posting this question I solved the t-stat part:

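  // Compute the standard errors once, using the same regression2 object as above.
  double[] standardErrors = regression2.estimateRegressionParametersStandardErrors();
  for (int i = 0; i < beta.length; i++) {
     double tstat = beta[i] / standardErrors[i];
     System.out.println("t-stats(" + i + ") : " + tstat);
  }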
Mike Samaras
  • Are you sure that the second code block is correct with regard to calculating the t-stat? I just ran the equivalent for myself and got ridiculously high values. – Peter Kazazes Feb 01 '16 at 17:13
  • Check my edit on my answer below. It does match Excel. Your t-stat is of course highly dependent on the quality of your regression. – Mike Samaras Feb 11 '16 at 19:50

1 Answer

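  import org.apache.commons.math3.distribution.TDistribution;
  import org.apache.commons.math3.util.FastMath;

  // Residual degrees of freedom: observations minus estimated coefficients.
  int residualdf = regression.estimateResiduals().length - beta.length;
  for (int i = 0; i < beta.length; i++) {
     double tstat = beta[i] / regression.estimateRegressionParametersStandardErrors()[i];

     // Two-sided p-value: twice the lower-tail probability at -|t|.
     double pvalue = new TDistribution(residualdf).cumulativeProbability(-FastMath.abs(tstat)) * 2;

     System.out.println("p-value(" + i + ") : " + pvalue);
  }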

This will give you the p-values. It's not optimized in any way, but the values match Excel exactly.
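For the four-observation, three-coefficient example in the question, residualdf is 4 - 3 = 1. With one degree of freedom the Student t distribution is the standard Cauchy distribution, whose CDF has a closed form, so the two-sided p-value can be cross-checked without any library. The sketch below is only that special case; the class and method names are made up for illustration, and for any other degrees of freedom you still need something like TDistribution.

```java
public class TDistOneDf {
    /** Two-sided p-value for a t-statistic when the residual degrees of freedom equal 1. */
    public static double twoSidedPValue(double tstat) {
        // With 1 degree of freedom the t CDF is the Cauchy CDF: F(x) = 1/2 + atan(x) / pi.
        double lowerTail = 0.5 + Math.atan(-Math.abs(tstat)) / Math.PI;
        // Double the lower tail, mirroring cumulativeProbability(-|t|) * 2 above.
        return 2.0 * lowerTail;
    }

    public static void main(String[] args) {
        // |t| = 1 leaves half of the probability mass in the two tails.
        System.out.println(twoSidedPValue(1.0));
        // 12.7062 is the familiar two-sided 5% critical value for 1 degree of freedom.
        System.out.println(twoSidedPValue(12.7062));
    }
}
```

If the library loop and this closed form disagree on a df = 1 problem, the degrees of freedom or the sign handling of the t-stat are the first things to check.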

I've updated my code to the version below to address the comments. It matches Excel.

  final double[] beta = regression.estimateRegressionParameters();
  final double[] standardErrors = regression.estimateRegressionParametersStandardErrors();
  final int residualdf = regression.estimateResiduals().length - beta.length;

  final TDistribution tdistribution = new TDistribution(residualdf);

  // Calculate the t-stat and p-value for each coefficient and store the result.
  // RegressionCoefficient, RegressionCoefficientNames and extensionModelType are
  // application-specific types, not part of Commons Math.
  final Map<RegressionCoefficientNames, RegressionCoefficient> coefficientMap = new HashMap<>(beta.length);
  for (int i = 0; i < beta.length; i++)
  {
     double tstat = beta[i] / standardErrors[i];
     double pvalue = tdistribution.cumulativeProbability(-FastMath.abs(tstat)) * 2;
     final RegressionCoefficient coefficient = new RegressionCoefficient(extensionModelType.getNameByIndex(i),
                                                                         beta[i],
                                                                         standardErrors[i],
                                                                         tstat,
                                                                         pvalue);

     coefficientMap.put(extensionModelType.getNameByIndex(i), coefficient);
  }

Here's the improved code. The RegressionCoefficient class it fills in:

class RegressionCoefficient {
    private final RegressionCoefficientNames valueName;
    private final Double coefficient;
    private final Double standardError;
    private final Double tStat;
    private final Double pValue;
    // The five-argument constructor called in the loop is omitted here.
}
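To sanity-check the coefficients and t-stats independently of the library, the textbook formulas can be evaluated directly: beta = (X'X)^-1 X'y, the error variance is the residual sum of squares divided by (n - p), and each standard error is the square root of that variance times the matching diagonal entry of (X'X)^-1. The following is a minimal sketch using the data from the question; it does not use Commons Math, the class and method names are invented for illustration, and plain Gauss-Jordan inversion is only reasonable for tiny, well-conditioned problems like this one.

```java
public class OlsByHand {
    /** Inverts a small square matrix with Gauss-Jordan elimination and partial pivoting. */
    public static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0; // augment with the identity matrix
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            double p = m[col][col];
            if (p == 0.0) throw new ArithmeticException("singular matrix");
            for (int c = 0; c < 2 * n; c++) m[col][c] /= p;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = m[r][col];
                for (int c = 0; c < 2 * n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }

    /** Returns {beta, standardErrors, tStats}; x must already contain any constant column. */
    public static double[][] fit(double[] y, double[][] x) {
        int n = x.length, p = x[0].length;
        double[][] xtx = new double[p][p];
        double[] xty = new double[p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                xty[j] += x[i][j] * y[i];
                for (int k = 0; k < p; k++) xtx[j][k] += x[i][j] * x[i][k];
            }
        }
        double[][] inv = invert(xtx);
        double[] beta = new double[p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) beta[j] += inv[j][k] * xty[k];
        }
        double rss = 0.0; // residual sum of squares
        for (int i = 0; i < n; i++) {
            double fitted = 0.0;
            for (int j = 0; j < p; j++) fitted += x[i][j] * beta[j];
            rss += (y[i] - fitted) * (y[i] - fitted);
        }
        double sigma2 = rss / (n - p); // error variance with n - p residual degrees of freedom
        double[] se = new double[p];
        double[] t = new double[p];
        for (int j = 0; j < p; j++) {
            se[j] = Math.sqrt(sigma2 * inv[j][j]);
            t[j] = beta[j] / se[j];
        }
        return new double[][] { beta, se, t };
    }

    public static void main(String[] args) {
        double[] y = { 4, 8, 13, 18 };
        double[][] x = { { 1, 1, 1 }, { 1, 2, 4 }, { 1, 3, 9 }, { 1, 4, 16 } };
        double[][] r = fit(y, x);
        for (int j = 0; j < r[0].length; j++) {
            System.out.printf("beta=%.4f se=%.4f t=%.4f%n", r[0][j], r[1][j], r[2][j]);
        }
    }
}
```

For the question's data this gives coefficients of about 0.25, 3.45 and 0.25, with a t-stat of about 2.236 for the squared term, which is a convenient reference point when a library run returns implausibly large values.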
Mike Samaras
  • Where are RegressionCoefficientNames and RegressionCoefficient located? – Yeahia2508 Sep 05 '17 at 17:59
  • public enum RegressionCoefficientNames { INTERCEPT, PROXY_PRICE, PROXY_PRICE_SQUARED, TIME_TO_MATURITY, JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER /* DECEMBER is not used. */ } – Mike Samaras Sep 06 '17 at 12:31