
I'm trying to develop an application that computes the same trend lines that Excel does, but for larger datasets.

But I can't find any Java library that calculates such regressions. For the linear model I'm using Apache Commons Math, and for the others there was a great numerical library from Michael Thomas Flanagan, but since January it is no longer available:

http://www.ee.ucl.ac.uk/~mflanaga/java/

Do you know of any other libraries or code repositories that calculate these regressions in Java? Best,

Fgblanch
  • Why not roll out your own? At least math is sorta easy to code, right? IOW `What have you tried?` – Shark Jul 11 '13 at 11:32

3 Answers


Since they're all based on linear fits, OLSMultipleLinearRegression is all you need for linear, polynomial, exponential, logarithmic, and power trend lines.
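To spell out why one linear solver covers all five shapes, here is each model written so that it is linear in its coefficients. The symbols a, b and b_k are just names chosen here for the fitted coefficients; they don't come from Commons Math.

```latex
\begin{align*}
\text{linear:}      \quad & y = a + b x \\
\text{polynomial:}  \quad & y = b_0 + b_1 x + \dots + b_k x^k \\
\text{logarithmic:} \quad & y = a + b \ln x \\
\text{exponential:} \quad & y = a e^{b x} \;\Longrightarrow\; \ln y = \ln a + b x \\
\text{power:}       \quad & y = a x^{b} \;\Longrightarrow\; \ln y = \ln a + b \ln x
\end{align*}
```

Note that for the exponential and power models the least-squares fit is done on ln y, so it minimizes squared error in log space rather than in y itself.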

Your question gave me an excuse to download and play with the commons math regression tools, and I put together some trend line tools:

An interface:

public interface TrendLine {
    public void setValues(double[] y, double[] x); // y ~ f(x)
    public double predict(double x); // get a predicted y for a given x
}

An abstract class for regression-based trendlines:

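import java.util.Arrays;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

public abstract class OLSTrendLine implements TrendLine {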

    RealMatrix coef = null; // will hold prediction coefs once we get values

    protected abstract double[] xVector(double x); // create vector of values from x
    protected abstract boolean logY(); // set true to predict log of y (note: y must be positive)

    @Override
    public void setValues(double[] y, double[] x) {
        if (x.length != y.length) {
            throw new IllegalArgumentException(String.format("The numbers of y and x values must be equal (%d != %d)",y.length,x.length));
        }
        double[][] xData = new double[x.length][]; 
        for (int i = 0; i < x.length; i++) {
            // the implementation determines how to produce a vector of predictors from a single x
            xData[i] = xVector(x[i]);
        }
        if(logY()) { // in some models we are predicting ln y, so we replace each y with ln y
            y = Arrays.copyOf(y, y.length); // user might not be finished with the array we were given
            for (int i = 0; i < x.length; i++) {
                y[i] = Math.log(y[i]);
            }
        }
        OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
        ols.setNoIntercept(true); // let the implementation include a constant in xVector if desired
        ols.newSampleData(y, xData); // provide the data to the model
        coef = MatrixUtils.createColumnRealMatrix(ols.estimateRegressionParameters()); // get our coefs
    }

    @Override
    public double predict(double x) {
        double yhat = coef.preMultiply(xVector(x))[0]; // apply coefs to xVector
        if (logY()) yhat = (Math.exp(yhat)); // if we predicted ln y, we still need to get y
        return yhat;
    }
}

An implementation for polynomial or linear models:

(For linear models, just set the degree to 1 when calling the constructor.)

public class PolyTrendLine extends OLSTrendLine {
    final int degree;
    public PolyTrendLine(int degree) {
        if (degree < 0) throw new IllegalArgumentException("The degree of the polynomial must not be negative");
        this.degree = degree;
    }
    @Override
    protected double[] xVector(double x) { // {1, x, x*x, x*x*x, ...}
        double[] poly = new double[degree+1];
        double xi=1;
        for(int i=0; i<=degree; i++) {
            poly[i]=xi;
            xi*=x;
        }
        return poly;
    }
    @Override
    protected boolean logY() {return false;}
}

Exponential and power models are even easier:

(Note: we're predicting ln y now, and that's important. Both of these are only suitable for positive y; the power model also needs positive x, since it takes ln x.)

public class ExpTrendLine extends OLSTrendLine {
    @Override
    protected double[] xVector(double x) {
        return new double[]{1,x};
    }

    @Override
    protected boolean logY() {return true;}
}

and

public class PowerTrendLine extends OLSTrendLine {
    @Override
    protected double[] xVector(double x) {
        return new double[]{1,Math.log(x)};
    }

    @Override
    protected boolean logY() {return true;}

}
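If you want to sanity-check the log-transform idea without the library, here is a minimal dependency-free sketch for the exponential case. It fits ln y = c0 + c1*x with the closed-form simple-regression formulas and converts back to y = a*exp(b*x). The class name `ExpFitSketch` and method `fitExponential` are made up for this illustration; this is not how Commons Math solves the problem internally.

```java
public class ExpFitSketch {

    /**
     * Fits ln(y) = c0 + c1*x by ordinary least squares (closed form for one predictor)
     * and returns {a, b} such that y ~ a * exp(b * x). Every y[i] must be positive.
     */
    public static double[] fitExponential(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double sumX = 0, sumL = 0, sumXX = 0, sumXL = 0;
        for (int i = 0; i < n; i++) {
            double lnY = Math.log(y[i]); // the transform that makes the model linear
            sumX += x[i];
            sumL += lnY;
            sumXX += x[i] * x[i];
            sumXL += x[i] * lnY;
        }
        double slope = (n * sumXL - sumX * sumL) / (n * sumXX - sumX * sumX);
        double intercept = (sumL - slope * sumX) / n;
        return new double[] { Math.exp(intercept), slope }; // a = e^c0, b = c1
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = 2.0 * Math.exp(0.5 * x[i]); // noise-free data from y = 2 * e^(0.5 x)
        }
        double[] ab = fitExponential(x, y);
        // With noise-free input the fit should recover a close to 2 and b close to 0.5.
        System.out.println("a = " + ab[0] + ", b = " + ab[1]);
    }
}
```

The power model works the same way with ln x in place of x.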

And a log model:

(This takes the log of x but predicts y, not ln y, so x must be positive but y need not be.)

public class LogTrendLine extends OLSTrendLine {
    @Override
    protected double[] xVector(double x) {
        return new double[]{1,Math.log(x)};
    }

    @Override
    protected boolean logY() {return false;}
}

And you can use it like this:

// requires: import java.util.Random;
public static void main(String[] args) {
    TrendLine t = new PolyTrendLine(2);
    Random rand = new Random();
    double[] x = new double[1000*1000];
    double[] err = new double[x.length];
    double[] y = new double[x.length];
    for (int i=0; i<x.length; i++) { x[i] = 1000*rand.nextDouble(); }
    for (int i=0; i<x.length; i++) { err[i] = 100*rand.nextGaussian(); } 
    for (int i=0; i<x.length; i++) { y[i] = x[i]*x[i]+err[i]; } // quadratic model
    t.setValues(y,x);
    System.out.println(t.predict(12)); // when x=12, y should be close to 144, e.g. 143.61380202745192
}

Since you just wanted trend lines, I discarded the OLS models when I was done with them, but you might want to keep some data on goodness of fit, etc.
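As one way to get a goodness-of-fit number without keeping the regression object, here is a small sketch that computes R^2 = 1 - SS_res / SS_tot from any prediction function. The class `GoodnessOfFitSketch` and method `rSquared` are hypothetical names for this illustration, not part of Commons Math.

```java
import java.util.function.DoubleUnaryOperator;

public class GoodnessOfFitSketch {

    /** Returns R^2 = 1 - SS_res / SS_tot for the model's predictions at each x[i]. */
    public static double rSquared(double[] x, double[] y, DoubleUnaryOperator model) {
        if (x.length != y.length || y.length == 0) {
            throw new IllegalArgumentException("x and y must be non-empty and of equal length");
        }
        double mean = 0;
        for (double v : y) {
            mean += v;
        }
        mean /= y.length;

        double ssRes = 0; // squared error of the model
        double ssTot = 0; // squared error of always predicting the mean of y
        for (int i = 0; i < y.length; i++) {
            double resid = y[i] - model.applyAsDouble(x[i]);
            ssRes += resid * resid;
            ssTot += (y[i] - mean) * (y[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        System.out.println(rSquared(x, y, v -> 2 * v)); // perfect fit: prints 1.0
        System.out.println(rSquared(x, y, v -> 5.0));   // always predicts the mean: prints 0.0
    }
}
```

With the classes above you could pass `t::predict` as the model after calling `setValues`. For the log-y models this measures fit on the original y scale, which is not the scale the regression minimized. `OLSMultipleLinearRegression` also has a `calculateRSquared()` method if you keep the `ols` object around; check its Javadoc for how it behaves when `setNoIntercept(true)` is used.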

For implementations using a moving average, moving median, etc., it looks like you can stick with Commons Math. Try DescriptiveStatistics and specify a window. You might also want to do some smoothing, using interpolation as suggested in another answer.
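To show what a windowed statistic amounts to, here is a dependency-free sketch of a trailing moving average. It is a hand-rolled stand-in for the windowed-mean idea, not the Commons Math implementation, and `MovingAverageSketch` / `movingAverage` are names invented here. Averaging over a shorter window at the start of the series is a choice made for this sketch.

```java
import java.util.Arrays;

public class MovingAverageSketch {

    /** Trailing mean over the last `window` values; the first points use however many values exist so far. */
    public static double[] movingAverage(double[] y, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        double[] out = new double[y.length];
        double sum = 0;
        for (int i = 0; i < y.length; i++) {
            sum += y[i];
            if (i >= window) {
                sum -= y[i - window]; // drop the value that just left the window
            }
            out[i] = sum / Math.min(i + 1, window);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(movingAverage(y, 2))); // prints [1.0, 1.5, 2.5, 3.5, 4.5]
    }
}
```

A moving median would need the window contents themselves rather than a running sum, which is where a windowed DescriptiveStatistics is more convenient.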

maybeWeCouldStealAVan
  • Oops - switched the power and exp methods. Fixed now. Also added a logarithmic model. – maybeWeCouldStealAVan Jul 14 '13 at 00:04
  • I have tried your code and it works well, but I wanted to know how to draw a trend line once I have the array of y values. Also, does it only work for equations of the form y = x^2 + const, or can I change it to my own equation, say y = 4x^2 + 3x + 6.4? – Nitish Patel Apr 02 '14 at 11:49
  • Yes, a `TrendLine` created with `PolyTrendLine(2)` will estimate coefficients for `y=b0+b1*x+b2*x^2`. Look closely at `xVector` and you'll see that it uses x^0, x^1, ... x^k as predictors for a k-degree regression. – maybeWeCouldStealAVan Apr 03 '14 at 02:46
  • Will simply calling main() draw the line on a graph? I tried this piece of code in my onCreate() but it does not draw anything in my UI. Please see my question http://stackoverflow.com/questions/22808204/how-to-draw-trade-line-on-scatter-chart-in-android – Nitish Patel Apr 03 '14 at 06:22
  • @maybeWeCouldStealAVan How could you get the absolute value or the square of r, i.e. |r| or r^2? – Dickey Singh Aug 28 '15 at 00:22
  • @maybeWeCouldStealAVan thank you for providing this good answer. Does apache commons support multivariate regression? The problem I'm working on requires having 2 dependent variables, so is this possible through this library? – Dania Feb 22 '16 at 17:39
  • And what will it calculate? The data set [81, 61, 91, 99, 124, 133, 173, 147, 105 ] returns huge, unpredictable results for the polynomial trend -> [ 6561.337862157522, 3721.3825048299277, 8281.316269265779, 9801.299344605724, 15376.248458265314, 17689.23088219612, 29929.157525503473, 21609.20432350775, 11025.286855075132] How do I use it? How do I calculate a logarithmic trend line? – Eugene Shamkin Dec 10 '19 at 07:23

You can use the different kinds of interpolators available in org.apache.commons.math3.analysis.interpolation, such as LinearInterpolator, LoessInterpolator and NevilleInterpolator.

Alexander Serebrenik
  • FYI. Update link. Math4. https://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/analysis/interpolation/package-tree.html – Wpigott Jun 26 '22 at 14:51

In addition to what maybeWeCouldStealAVan said:

The commons-math3 library is also available in the Maven repository.

At the time of writing, the current version is 3.2, and the dependency tag is:

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-math3</artifactId>
        <version>3.2</version>
    </dependency>
rudolph9