
I'm trying two different methods to solve a multiple regression with the following code. The lists are too long to post, but each one contains 5704 values. I get the following errors when I run the code:

The `Fit.MultiDim` call below throws:

An unhandled exception of type 'System.ArgumentException' occurred in MathNet.Numerics.dll
Additional information: Matrix must be positive definite.

    double[] p = Fit.MultiDim(
        new[] { shortRatingList.ToArray(), mediumRatingList.ToArray(), longRatingList.ToArray() },
        weekReturnList.ToArray(),
        intercept: true);

The `MultipleRegression.QR` call below throws:

An unhandled exception of type 'System.ArgumentException' occurred in MathNet.Numerics.dll
Additional information: Matrix dimensions must agree: 3x5705.

    double[] q = MultipleRegression.QR(
        new[] { shortRatingList.ToArray(), mediumRatingList.ToArray(), longRatingList.ToArray() },
        weekReturnList.ToArray(),
        intercept: true);
Asked by DarthVegan, edited by TylerH

1 Answer


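The code snippet in the question computes a regression with 3 sample points (each with 5704 values), because both `Fit.MultiDim` and `MultipleRegression.QR` treat each inner array of the first argument as one sample point. It therefore expects `weekReturnList` to have length 3.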
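The shapes involved account for both error messages. Writing $s$, $m$, $l$ for the short, medium and long ratings, and assuming (as the `3x5705` in the message suggests) that `intercept: true` adds one column of ones to the 3 rows, the question's call builds:

$$
X = \begin{bmatrix}
1 & s_1 & s_2 & \cdots & s_{5704} \\
1 & m_1 & m_2 & \cdots & m_{5704} \\
1 & l_1 & l_2 & \cdots & l_{5704}
\end{bmatrix} \in \mathbb{R}^{3 \times 5705},
\qquad y \in \mathbb{R}^{5704}
$$

Solving $X p \approx y$ needs one entry of $y$ per row of $X$, i.e. 3, hence "Matrix dimensions must agree". The normal equations $X^\top X \, p = X^\top y$ involve

$$
X^\top X \in \mathbb{R}^{5705 \times 5705}, \qquad \operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le 3,
$$

so $X^\top X$ is singular and cannot be positive definite, which is consistent with a Cholesky-based normal-equations solver reporting "Matrix must be positive definite". With the intended layout, $X$ is $5704 \times 4$ with rows $(1, s_i, m_i, l_i)$, and $X^\top X$ is a $4 \times 4$ matrix that is positive definite whenever those four columns are linearly independent.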

However, if `weekReturnList` has 5704 entries as well and your data actually represents 5704 data points with 3 values each (short, medium, long), then you need to transpose the input.
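Transposing here just means regrouping the three per-variable lists into one small array per data point. A minimal sketch in plain C#, assuming the lists hold `double` values; `ToSampleRows` is a hypothetical helper name, not part of MathNet.Numerics, and the data is a tiny stand-in:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns per-variable lists (columns) into per-sample rows,
// i.e. rows[i] = { short[i], medium[i], long[i] }.
static double[][] ToSampleRows(params IReadOnlyList<double>[] columns)
{
    int n = columns[0].Count;
    if (columns.Any(c => c.Count != n))
        throw new ArgumentException("All columns must have the same length.");
    return Enumerable.Range(0, n)
        .Select(i => columns.Select(c => c[i]).ToArray())
        .ToArray();
}

// Tiny stand-in data; in the question each list has 5704 entries.
var shortRatingList = new List<double> { 1.0, 2.0, 3.0, 4.0 };
var mediumRatingList = new List<double> { 0.5, 0.1, 0.9, 0.3 };
var longRatingList = new List<double> { 7.0, 8.0, 6.0, 5.0 };

double[][] rows = ToSampleRows(shortRatingList, mediumRatingList, longRatingList);
Console.WriteLine($"{rows.Length} samples x {rows[0].Length} values"); // 4 samples x 3 values

// With MathNet.Numerics referenced, the rows could then be passed as
// Fit.MultiDim(rows, weekReturnList.ToArray(), intercept: true);
```

The resulting jagged array has one inner array per sample, which is the layout the question's `Fit.MultiDim` and `MultipleRegression.QR` calls read their first argument as.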

I assume that data organized by columns instead of data points is quite common in practice, so we should consider adding a shortcut function for this use case to the Fit class as well.

In the meantime you could use the following, which transposes the input by creating the design matrix from column arrays instead of row arrays:

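    using MathNet.Numerics.LinearAlgebra;
    using MathNet.Numerics.LinearRegression;
    // Each list becomes one column of the 5704x3 design matrix X.
    Vector<double> p = MultipleRegression.NormalEquations(
        Matrix<double>.Build.DenseOfColumnArrays(shortRatingList.ToArray(), mediumRatingList.ToArray(), longRatingList.ToArray()),
        Vector<double>.Build.Dense(weekReturnList.ToArray()));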
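For intuition about what a normal-equations solve computes, here is a minimal hand-rolled sketch in plain C# (no MathNet): it forms XᵀX and Xᵀy and solves the small system with Gaussian elimination and partial pivoting. This is an illustrative substitute, not MathNet's implementation (normal-equations solvers typically use a Cholesky factorization instead); the helper name `SolveNormalEquations` and the synthetic data are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: least-squares coefficients via the normal equations
// (XᵀX)·p = Xᵀy, solved with Gaussian elimination and partial pivoting.
// Illustration only; not MathNet.Numerics' implementation.
static double[] SolveNormalEquations(double[][] x, double[] y)
{
    int n = x.Length, k = x[0].Length;
    var a = new double[k, k + 1]; // augmented matrix [XᵀX | Xᵀy]
    for (int r = 0; r < k; r++)
    {
        for (int c = 0; c < k; c++)
            for (int i = 0; i < n; i++) a[r, c] += x[i][r] * x[i][c];
        for (int i = 0; i < n; i++) a[r, k] += x[i][r] * y[i];
    }
    for (int col = 0; col < k; col++)
    {
        // Partial pivoting: move the row with the largest |entry| in this column up.
        int pivot = Enumerable.Range(col, k - col).OrderByDescending(row => Math.Abs(a[row, col])).First();
        if (Math.Abs(a[pivot, col]) < 1e-12)
            throw new InvalidOperationException("XᵀX is (numerically) singular: too few independent samples for the number of coefficients.");
        for (int c = 0; c <= k; c++) (a[col, c], a[pivot, c]) = (a[pivot, c], a[col, c]);
        for (int r = col + 1; r < k; r++)
        {
            double factor = a[r, col] / a[col, col];
            for (int c = col; c <= k; c++) a[r, c] -= factor * a[col, c];
        }
    }
    // Back substitution on the upper-triangular system.
    var p = new double[k];
    for (int r = k - 1; r >= 0; r--)
    {
        double s = a[r, k];
        for (int c = r + 1; c < k; c++) s -= a[r, c] * p[c];
        p[r] = s / a[r, r];
    }
    return p;
}

// Synthetic data built from y = 2*short + 3*medium - 1*long with no noise,
// so the fit should recover coefficients close to [2, 3, -1].
double[][] samples =
{
    new[] { 1.0, 2.0, 0.0 },
    new[] { 2.0, 1.0, 1.0 },
    new[] { 3.0, 0.0, 1.0 },
    new[] { 4.0, 1.0, 3.0 },
    new[] { 5.0, 3.0, 2.0 },
};
double[] weekReturns = { 8.0, 6.0, 5.0, 8.0, 17.0 };
double[] coeffs = SolveNormalEquations(samples, weekReturns);
Console.WriteLine(string.Join(", ", coeffs.Select(v => v.ToString("F3"))));
```

Each row of `samples` is one data point and each column one rating, matching the 5704x3 design matrix built by `DenseOfColumnArrays` above.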
Answered by Christoph Rüegg
  • Will this generate a coefficient for each sample point? I think I'm using that term correctly. – DarthVegan Jan 11 '15 at 16:14
  • This will find the 3 coefficients which minimize the squared error over the 5704 data samples such that p1*short + p2*medium + p3*long ~= weekReturn. Does that make any sense? You did not actually specify the model you want a regression for ;) – Christoph Rüegg Jan 11 '15 at 16:30
  • Just noticed that you asked for an intercept term (and thus 4 coefficients). The simplest way to do this with that function is to insert an all-ones vector first (e.g. `Generate.Repeat(5704, 1.0)`), before the other 3. – Christoph Rüegg Jan 11 '15 at 16:32
  • I'm using multiple regression for a school project and, to be honest, I barely know what it is or how to do it. I'm much better with linear regression. Actually, I only had the intercept term in there because of the example I saw. My project is supposed to come up with 3 different ways to score a stock based on different data and then try to come up with an expected return. I hope I'm coming across clearly. – DarthVegan Jan 11 '15 at 16:33
  • Seems we're on the right track then, with `expectedWeekReturn = p1*shortRating + p2*mediumRating + p3*longRating` where the p-parameters are the 3 results returned by the NormalEquations function. – Christoph Rüegg Jan 11 '15 at 17:19
  • I know I already marked it as answered but is there a way to see the error or calculate the error amount from the multiple regression? – DarthVegan Jan 13 '15 at 16:08
  • GoodnessOfFit.RSquared may help to get an indication of the error, e.g. `GoodnessOfFit.RSquared(Enumerable.Range(0, weekReturnList.Length).Select(i => f(shortRatingList[i], mediumRatingList[i], longRatingList[i])), weekReturnList);` with `Func<double, double, double, double> f = (rs, rm, rl) => p[0]*rs + p[1]*rm + p[2]*rl`. See also http://stackoverflow.com/questions/27940868/linear-fit-with-math-net-error-in-data-and-error-in-fit-parameters – Christoph Rüegg Jan 17 '15 at 17:49
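Following up on the last two comments, a minimal sketch in plain C# (no MathNet) that evaluates the three-coefficient model `f` on every sample and reports two common error measures: the root-mean-square error (RMSE) of the residuals, and R² = 1 − SS_res/SS_tot. The coefficients and data below are made up for illustration; in the real program `p` would come from the regression call. Libraries define R² in slightly different ways (for example, as the squared correlation between predicted and observed values); the definitions coincide for a least-squares fit that includes an intercept, but not in general, so this number may not match `GoodnessOfFit.RSquared` exactly.

```csharp
using System;
using System.Linq;

// Hypothetical coefficients, standing in for the values returned by the fit.
double[] p = { 2.0, 3.0, -1.0 };

// The model from the comments: expectedWeekReturn = p1*short + p2*medium + p3*long.
Func<double, double, double, double> f = (rs, rm, rl) => p[0] * rs + p[1] * rm + p[2] * rl;

// Hypothetical observations, one entry per data point (same layout as the question's lists).
double[] shortRating = { 1.0, 2.0, 3.0, 4.0 };
double[] mediumRating = { 2.0, 1.0, 0.0, 1.0 };
double[] longRating = { 0.0, 1.0, 1.0, 3.0 };
double[] weekReturn = { 8.5, 5.5, 5.0, 8.0 };

double[] predicted = Enumerable.Range(0, weekReturn.Length)
    .Select(i => f(shortRating[i], mediumRating[i], longRating[i]))
    .ToArray();
double[] residuals = weekReturn.Zip(predicted, (obs, pred) => obs - pred).ToArray();

// Root-mean-square error: typical size of a prediction error, in the units of the return.
double rmse = Math.Sqrt(residuals.Average(r => r * r));

// Coefficient of determination: R² = 1 - SS_res / SS_tot.
double mean = weekReturn.Average();
double ssRes = residuals.Sum(r => r * r);
double ssTot = weekReturn.Sum(v => (v - mean) * (v - mean));
double rSquared = 1.0 - ssRes / ssTot;

Console.WriteLine($"RMSE = {rmse:F4}, R² = {rSquared:F4}");
```

RMSE answers "how far off is a typical prediction", while R² answers "what fraction of the variation in weekly returns the model explains"; looking at the `residuals` array directly is also useful for spotting individual outliers.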