9

I have two arrays of equal length. The following function attempts to calculate the slope from these arrays. It returns the average of the slopes between consecutive points. For the following data set, I get a different value than Excel and Google Docs do.

double[] x_values = { 1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968,
        1972, 1976, 1980 };
double[] y_values = { 197, 203, 198, 204, 212, 216, 218, 224, 223, 225,
        236 };



public static double getSlope(double[] x_values, double[] y_values)
        throws Exception {

    if (x_values.length != y_values.length)
        throw new Exception();

    double slope = 0;

    for (int i = 0; i < (x_values.length - 1); i++) {
        double y_2 = y_values[i + 1];
        double y_1 = y_values[i];

        double delta_y = y_2 - y_1;

        double x_2 = x_values[i + 1];
        double x_1 = x_values[i];

        double delta_x = x_2 - x_1;

        slope += delta_y / delta_x;
    }

    System.out.println(x_values.length);
    return slope / (x_values.length);
}

Output

Google: 0.755

getSlope(): 0.962121212121212

Excel: 0.7501

Nyx
  • See the numerical example [here](http://en.wikipedia.org/wiki/Simple_linear_regression) on calculation. This should be trivial to code. – karmanaut Mar 15 '13 at 12:27

4 Answers

5

I bet the other two methods are computing the least-squares fit, whereas you are not.

When I verify this conjecture using R, I too get the slope of about 0.755:

> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.265e+03  1.793e+02  -7.053 5.97e-05 ***
x            7.551e-01  9.155e-02   8.247 1.73e-05 ***

The relevant number is the 7.551e-01. It is also worth noting that the line has an intercept of about -1265.

Here is a picture of the least-squares fit:

[Figure: lm fit]

As for implementing this in your code, see "Compute least squares using java".
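For readers who want to check the 0.755 figure without R, here is a minimal sketch of the least-squares slope in plain Java. The class and method names (`LeastSquaresSketch`, `slope`, `intercept`) are illustrative assumptions, not taken from the linked question, and the code assumes the x values are not all identical.

```java
public class LeastSquaresSketch {

    /** Slope of the least-squares line through the points (x[i], y[i]). */
    public static double slope(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException(
                    "x and y must have the same length, at least 2");
        }
        double xMean = mean(x);
        double yMean = mean(y);
        double sxy = 0.0; // sum of (x - xMean) * (y - yMean)
        double sxx = 0.0; // sum of (x - xMean)^2
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - xMean;
            sxy += dx * (y[i] - yMean);
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    /** Intercept of the same line: yMean - slope * xMean. */
    public static double intercept(double[] x, double[] y) {
        return mean(y) - slope(x, y) * mean(x);
    }

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Data set from the question.
        double[] x = { 1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968,
                1972, 1976, 1980 };
        double[] y = { 197, 203, 198, 204, 212, 216, 218, 224, 223, 225,
                236 };
        System.out.println("slope = " + slope(x, y));
        System.out.println("intercept = " + intercept(x, y));
    }
}
```

On the question's data this should give a slope of about 0.755 and an intercept of about -1265, in line with the R output above.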

NPE
2

This function will not help you much, because it ignores how wide each line segment is along the x-axis. Consider applying it to the points (0,0), (1000,1000), and (1001,2000) versus (0,0), (1,1), and (2,1001). Both cases have successive slopes 1 and 1000, yet the two data sets look very different.

You need to implement the method of least squares: http://en.wikipedia.org/wiki/Least_squares to find the line that best approximates your data set.
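The two point sets above can be compared directly. The following is a sketch only: `SegmentWidthDemo` and its method names are hypothetical, and `averageOfSlopes` divides by the number of segments (n - 1), which is an assumption and differs from the question's code.

```java
public class SegmentWidthDemo {

    /** Unweighted mean of the slopes between consecutive points. */
    public static double averageOfSlopes(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length - 1; i++) {
            sum += (y[i + 1] - y[i]) / (x[i + 1] - x[i]);
        }
        return sum / (x.length - 1);
    }

    /** Slope of the least-squares line through the points. */
    public static double leastSquaresSlope(double[] x, double[] y) {
        double xMean = 0.0;
        double yMean = 0.0;
        for (int i = 0; i < x.length; i++) {
            xMean += x[i] / x.length;
            yMean += y[i] / y.length;
        }
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - xMean) * (y[i] - yMean);
            sxx += (x[i] - xMean) * (x[i] - xMean);
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        double[] xa = { 0, 1000, 1001 };
        double[] ya = { 0, 1000, 2000 };
        double[] xb = { 0, 1, 2 };
        double[] yb = { 0, 1, 1001 };

        // Same segment slopes (1 and 1000), so the averages coincide.
        System.out.println(averageOfSlopes(xa, ya));
        System.out.println(averageOfSlopes(xb, yb));

        // The least-squares slopes differ by orders of magnitude.
        System.out.println(leastSquaresSlope(xa, ya));
        System.out.println(leastSquaresSlope(xb, yb));
    }
}
```

Both averages come out to 500.5, while the least-squares slope is roughly 1.5 for the first set and 500.5 for the second: the long, shallow segment dominates the first fit, which the unweighted average cannot see.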

One more piece of advice: never throw a java.lang.Exception. Always choose a more-specific exception, even if you must write the class yourself. People using your code will need to handle java.lang.Exception, which interferes badly with their other code.
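As an illustration of that advice, the length check could throw the standard unchecked `IllegalArgumentException` instead. This is a sketch: `SlopeInputCheck` and `requireSameLength` are hypothetical names, not part of the question's code.

```java
public class SlopeInputCheck {

    /**
     * Hypothetical helper: rejects mismatched inputs with a specific,
     * unchecked exception rather than java.lang.Exception.
     */
    public static void requireSameLength(double[] xValues, double[] yValues) {
        if (xValues.length != yValues.length) {
            throw new IllegalArgumentException(
                    "x_values and y_values must have the same length, got "
                            + xValues.length + " and " + yValues.length);
        }
    }

    public static void main(String[] args) {
        try {
            requireSameLength(new double[2], new double[3]);
        } catch (IllegalArgumentException e) {
            // Callers can catch exactly this failure without a blanket
            // catch (Exception e) that also swallows unrelated errors.
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With this change, `getSlope` would no longer need `throws Exception` in its signature.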

Eric Jablow
  • The Least Squares method is just part of a broader class of solutions to this problem: http://en.wikipedia.org/wiki/Linear_regression – roim Mar 15 '13 at 18:06
0

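Edit: use the Apache Commons Math class SimpleRegression if that's an option. Otherwise, here is a method that computes the slope and, from it, the intercept. As written it returns the intercept; return `slope` instead if that is the value you need. It should yield the same results as Excel and Apache Commons Math: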

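import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Least-squares fit: computes the slope, then returns the intercept.
private static double intercept(List<Double> yList, List<Double> xList) {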
    if (yList.size() != xList.size())
        throw new IllegalArgumentException("Number of y and x must be the same");
    if (yList.size() < 2)
        throw new IllegalArgumentException("Need at least 2 y, x");

    double yAvg = average(yList);
    double xAvg = average(xList);

    double sumNumerator = 0d;
    double sumDenominator = 0d;
    for (int i = 0; i < yList.size(); i++) {
        double y = yList.get(i);
        double x = xList.get(i);
        double yDiff = y - yAvg;
        double xDiff = x - xAvg;
        double numerator = xDiff * yDiff;
        double denominator = xDiff * xDiff;
        sumNumerator += numerator;
        sumDenominator += denominator;
    }

    double slope = sumNumerator / sumDenominator;
    double intercept = yAvg - (slope * xAvg);
    return intercept;
}

private static double average(Collection<Double> doubles) {
    return doubles.stream().collect(Collectors.averagingDouble(d -> d));
}

Sources: Excel doc for SLOPE, Excel doc for INTERCEPT

Manuel
-1

You should be dividing by x_values.length - 1, because n points give n - 1 pairwise slopes.

Edit: The Wikipedia example linked in my comments shows how to calculate the alpha and beta (intercept and slope) of the linear regression line.
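For reference, the simple linear regression estimates can be written as follows, where $\bar{x}$ and $\bar{y}$ are the sample means, $\hat{\beta}$ is the slope and $\hat{\alpha}$ is the intercept:

```latex
\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                   {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

Note that $\hat{\beta}$ weights each point by its distance from $\bar{x}$, which is not the same as averaging the slopes between neighbouring points.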

karmanaut
  • The output of `x_values.length` is 11. Subtracting 1 would give a higher average slope. – Nyx Mar 15 '13 at 12:06
  • Are you sure you are applying the right average logic in Excel/Google Docs? Could you post the macro? – karmanaut Mar 15 '13 at 12:10
  • `SLOPE(B2:B22, A2:A22)` Here, the B column contains `y_values` and the A column contains `x_values`. – Nyx Mar 15 '13 at 12:12
  • Cool. Give me a few minutes to check this out. – karmanaut Mar 15 '13 at 12:14
  • SLOPE(Y,X) calculates the slope of the linear regression line. A very quick [wiki](http://en.wikipedia.org/wiki/Simple_linear_regression) search shows that the way you calculate slope in java is not the way it should be calculated. – karmanaut Mar 15 '13 at 12:19