
Possible Duplicate:
Is there a Java library for better linear regression? (E.g., iteratively reweighted least squares)

I have the following code in R, but I need to implement the same thing in Java. I am not very strong at maths, so I need some help.

test_trait <- c( -0.48812477 , 0.33458213, -0.52754476, -0.79863471, -0.68544309, -0.12970239,  0.02355622, -0.31890850,0.34725819 , 0.08108851)

geno_A <- as.factor(c("Sub_0001"=1, "Sub_0002"=0, "Sub_0003"=1, "Sub_0004"=2, "Sub_0005"=0, "Sub_0006"=0, "Sub_0007"=1, "Sub_0008"=0, "Sub_0009"=1, "Sub_0010"=0))

geno_B <- as.factor(c("Sub_0001"=0, "Sub_0002"=0, "Sub_0003"=0, "Sub_0004"=1, "Sub_0005"=1, "Sub_0006"=0, "Sub_0007"=0, "Sub_0008"=0, "Sub_0009"=0, "Sub_0010"=0) )

fit <- lm(test_trait ~ geno_A*geno_B)
res <- anova(fit)
p.value <- res[3,5]

Edit 1: I had checked the Apache Commons Math library before posting this question, and I also looked at "Is there a Java library for better linear regression? (E.g., iteratively reweighted least squares)", but my problem is that I could not identify whether my case is simple or multiple linear regression.

test_trait contains height expressed from genetic trait geno_A and geno_B. geno_A and geno_B are alleles.
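
A note on reading the R formula, offered as a rough sketch rather than a full port: `geno_A*geno_B` expands to `geno_A + geno_B + geno_A:geno_B`, and because both variables are factors, R's default treatment contrasts turn them into 0/1 indicator columns. The model therefore has several predictor columns, which makes it a multiple regression. The class and method names below (`DesignMatrixSketch`, `dummyColumns`, `build`) are hypothetical, and the code only builds the predictor matrix; fitting and the ANOVA p-value are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not from any library): expands two categorical
// predictors into the columns implied by R's geno_A * geno_B formula.
final class DesignMatrixSketch {

    // One 0/1 indicator column per level, skipping the smallest level (the baseline).
    static List<double[]> dummyColumns(int[] factor) {
        TreeSet<Integer> levels = new TreeSet<>();
        for (int v : factor) levels.add(v);
        levels.pollFirst(); // the baseline level is absorbed by the intercept
        List<double[]> cols = new ArrayList<>();
        for (int level : levels) {
            double[] col = new double[factor.length];
            for (int i = 0; i < factor.length; i++) {
                col[i] = (factor[i] == level) ? 1.0 : 0.0;
            }
            cols.add(col);
        }
        return cols;
    }

    // Rows are observations; columns are intercept, A dummies, B dummies, A:B products.
    static double[][] build(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        List<double[]> aCols = dummyColumns(a);
        List<double[]> bCols = dummyColumns(b);
        List<double[]> cols = new ArrayList<>();
        double[] intercept = new double[a.length];
        Arrays.fill(intercept, 1.0);
        cols.add(intercept);
        cols.addAll(aCols);
        cols.addAll(bCols);
        // Interaction columns are element-wise products of an A dummy and a B dummy.
        for (double[] bc : bCols) {
            for (double[] ac : aCols) {
                double[] prod = new double[a.length];
                for (int i = 0; i < a.length; i++) prod[i] = ac[i] * bc[i];
                cols.add(prod);
            }
        }
        // Transpose the column list into a row-major matrix.
        double[][] x = new double[a.length][cols.size()];
        for (int j = 0; j < cols.size(); j++) {
            for (int i = 0; i < a.length; i++) x[i][j] = cols.get(j)[i];
        }
        return x;
    }

    public static void main(String[] args) {
        int[] genoA = {1, 0, 1, 2, 0, 0, 1, 0, 1, 0};
        int[] genoB = {0, 0, 0, 1, 1, 0, 0, 0, 0, 0};
        for (double[] row : build(genoA, genoB)) System.out.println(Arrays.toString(row));
    }
}
```

With the data above this gives six columns. No subject has geno_A = 1 together with geno_B = 1, so that interaction column is all zeros, and a least-squares solver that expects a full-rank matrix may reject it. Some libraries also add the intercept column themselves, in which case the first column should be dropped before passing the matrix in.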

World
  • Try the existing [answer](http://stackoverflow.com/questions/8406305/is-there-a-java-library-for-better-linear-regression-e-g-iteratively-reweigh) – TheWhiteRabbit Jan 29 '13 at 09:54
  • I could not figure out whether this is simple or multiple linear regression. – World Jan 29 '13 at 11:11

2 Answers


Googling for "java linear regression" led me to a number of interesting links, among them this SO question:

Is there a Java library for better linear regression? (E.g., iteratively reweighted least squares)

Paul Hiemstra

Linear regression y = a + b*x can be computed using the following equations:

b = (n*Σ(X*Y) - (ΣX)*(ΣY)) / (n*Σ(X^2) - (ΣX)^2)
a = (ΣY - b*(ΣX)) / n

Here Σ(A) is the sum of all available values of A, and n is the number of these values (the number of X,Y pairs).
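
The two equations translate almost line by line into Java. The following is a minimal sketch that assumes the data sits in two plain `double[]` arrays of equal length; the class name `SimpleLinearFit` and the method `fit` are made up for illustration and do not come from any library.

```java
import java.util.Arrays;

// Minimal sketch of least squares for y = a + b*x (hypothetical class name).
final class SimpleLinearFit {

    // Returns {a, b}: the intercept and the slope of the fitted line.
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        // Denominator of the slope formula: n*Σ(X^2) - (ΣX)^2
        double denom = n * sumXX - sumX * sumX;
        if (denom == 0.0) {
            throw new ArithmeticException("all x values are identical; slope is undefined");
        }
        double b = (n * sumXY - sumX * sumY) / denom;
        double a = (sumY - b * sumX) / n;
        return new double[] {a, b};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3};
        double[] y = {1, 3, 5, 7}; // exactly y = 1 + 2*x
        System.out.println(Arrays.toString(fit(x, y))); // prints [1.0, 2.0]
    }
}
```

Note that this covers a single numeric predictor only. The sums can also lose precision when the x values are large and close together, which is one reason a library routine may be the safer choice.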

Writing your own implementation may be preferable if the regression has to run directly on your custom data structures. A library requires you to pass data in the structures it supports, which may mean either copying a lot of data or designing your data structures in an otherwise suboptimal way.

On the other hand, if the amount of data is not large, or double[] is a good enough structure for your project, SimpleRegression from Apache Commons is probably appropriate for the most common cases.

Audrius Meškauskas
  • There are probably several Java libraries to do this, so probably there is no need to reinvent the wheel. The SO post I linked already mentions some options. – Paul Hiemstra Jan 29 '13 at 10:10
  • After finding a library to compute the regression, one may ask for a library to compute the average. There is a certain threshold of complexity that justifies using a library, especially if that library is written in another language. – Audrius Meškauskas Jan 29 '13 at 10:19
  • Linear regression is certainly library-worthy; it would be a waste of time to re-implement something that has already been programmed. That same library probably also has a function to calculate the mean. In addition, the libraries I link below are Java libraries, not R. Learning how to connect Java and R is in itself interesting, as there are many more complex libraries available in R. – Paul Hiemstra Jan 29 '13 at 10:22
  • [SimpleRegression](http://commons.apache.org/math/apidocs/org/apache/commons/math3/stat/regression/SimpleRegression.html) from Apache Commons could probably be used for cases that do not require a very custom approach. – Audrius Meškauskas Jan 29 '13 at 10:28