I am comparing two nested linear models in R with anova() and I would like to do the same thing in Java. To simplify things, I took the example code from https://stats.stackexchange.com/questions/48854/why-am-i-getting-different-intercept-values-in-r-and-java-for-simple-linear-regr and modified it a bit below. The models are test_trait ~ geno_A + geno_B and test_trait ~ geno_A + geno_B + geno_A:geno_B. The coefficients of the models fitted in R and Java are the same. In R I use anova(fit, fit2), where the fits are the results of lm(), and in Java I use TestUtils.oneWayAnovaPValue from org.apache.commons.math3. With R I get a p-value of 0.797, while with Java I get a p-value of 0.817, so this is clearly not the right method, but I cannot find how to do it correctly. Is there an equivalent of R's anova.lm in Java?
The full code is below.
R
test_trait <- c(-0.48812477, 0.33458213, -0.52754476, -0.79863471, -0.68544309, -0.12970239, 0.02355622, -0.31890850, 0.34725819, 0.08108851)
geno_A <- c(1, 0, 1, 2, 0, 0, 1, 0, 1, 0)
geno_B <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)
fit <- lm(test_trait ~ geno_A + geno_B)
fit2 <- lm(test_trait ~ geno_A + geno_B + geno_A:geno_B)
which gives the coefficients
> fit
Call:
lm(formula = test_trait ~ geno_A + geno_B)
Coefficients:
(Intercept)    geno_A    geno_B
   -0.03233  -0.10479  -0.60492
> fit2
Call:
lm(formula = test_trait ~ geno_A + geno_B + geno_A:geno_B)
Coefficients:
  (Intercept)         geno_A         geno_B  geno_A:geno_B
    -0.008235      -0.152979      -0.677208       0.096383
And the anova(fit, fit2) comparison:
> anova(fit, fit2) # 0.797
Analysis of Variance Table
Model 1: test_trait ~ geno_A + geno_B
Model 2: test_trait ~ geno_A + geno_B + geno_A:geno_B
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1      7 0.77982
2      6 0.77053  1 0.0092897 0.0723  0.797
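As I understand it, what anova(fit, fit2) reports here is a nested-model (partial) F-test built from the two residual sums of squares:

F = ((RSS1 - RSS2) / (df1 - df2)) / (RSS2 / df2) = ((0.77982 - 0.77053) / 1) / (0.77053 / 6) ≈ 0.0723

and Pr(>F) is the upper tail of an F(1, 6) distribution at that value, i.e. the 0.797 above. This is the calculation I would like to reproduce in Java.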
Java
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.stat.inference.TestUtils;
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

// test_trait values
double [] y = {-0.48812477, 0.33458213,
-0.52754476, -0.79863471,
-0.68544309, -0.12970239,
0.02355622, -0.31890850,
0.34725819, 0.08108851};
// design matrix for the first model: columns are geno_A, geno_B
double [][] x = {{1,0}, {0,0},
{1,0}, {2,1},
{0,1}, {0,0},
{1,0}, {0,0},
{1,0}, {0,0}};
// design matrix for the second model: the third column is the interaction geno_A * geno_B
double [][] xb = {{1,0,0}, {0,0,0},
{1,0,0}, {2,1,2},
{0,1,0}, {0,0,0},
{1,0,0}, {0,0,0},
{1,0,0}, {0,0,0}};
OLSMultipleLinearRegression regr = new OLSMultipleLinearRegression();
regr.newSampleData(y, x);
double[] beta = regr.estimateRegressionParameters();
System.out.printf("First model: y = int + genoA + genoB\n");
System.out.printf("Intercept: %.3f\t", beta[0]);
System.out.printf("beta1: %.3f\t", beta[1]);
System.out.printf("beta2: %.3f\n\n", beta[2]);
regr.newSampleData(y, xb);
double[] betab = regr.estimateRegressionParameters();
System.out.printf("Second model: y = int + genoA + genoB + genoA:genoB\n");
System.out.printf("Intercept: %.3f\t", betab[0]);
System.out.printf("beta1: %.3f\t", betab[1]);
System.out.printf("beta2: %.3f\t", betab[2]);
System.out.printf("beta2: %.3f\n", betab[3]);
This gives the same coefficients as in R:
First model: y = int + genoA + genoB
Intercept: -0.032 beta1: -0.105 beta2: -0.605
Second model: y = int + genoA + genoB + genoA:genoB
Intercept: -0.008 beta1: -0.153 beta2: -0.677 beta3: 0.096
But the one-way ANOVA in Java gives a different result:
List<double[]> classes = new ArrayList<>();
classes.add(beta);
classes.add(betab);
double pvalue = TestUtils.oneWayAnovaPValue(classes);
double fvalue = TestUtils.oneWayAnovaFValue(classes);
System.out.println(pvalue);
System.out.println(fvalue);
0.8165390406874127
0.05979444576790511
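Based on the partial F-test formula above, my current idea is to compute it by hand with Commons Math, along the lines of the sketch below. This assumes OLSMultipleLinearRegression.calculateResidualSumOfSquares() and FDistribution behave the way I expect; I have not verified that this is exactly equivalent to anova.lm.

// Sketch: reproduce anova(fit, fit2) as a partial F-test on the two residual sums of squares.
// Needs org.apache.commons.math3.distribution.FDistribution in addition to the imports above.
OLSMultipleLinearRegression reduced = new OLSMultipleLinearRegression();
reduced.newSampleData(y, x);    // test_trait ~ geno_A + geno_B
OLSMultipleLinearRegression full = new OLSMultipleLinearRegression();
full.newSampleData(y, xb);      // test_trait ~ geno_A + geno_B + geno_A:geno_B

double rssReduced = reduced.calculateResidualSumOfSquares(); // should match 0.77982
double rssFull    = full.calculateResidualSumOfSquares();    // should match 0.77053
int dfReduced = y.length - 3;   // n - (2 predictors + intercept) = 7
int dfFull    = y.length - 4;   // n - (3 predictors + intercept) = 6

double f = ((rssReduced - rssFull) / (dfReduced - dfFull)) / (rssFull / dfFull);
FDistribution fDist = new FDistribution(dfReduced - dfFull, dfFull);
double p = 1.0 - fDist.cumulativeProbability(f);

System.out.println(f);  // hoping for ~0.0723
System.out.println(p);  // hoping for ~0.797

Is this manual approach the right way to get what anova(fit, fit2) computes, or is there a built-in equivalent in Commons Math or another Java library?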