1

I'm attempting to add a component that will do regression analysis on data in our application. I wrote a service that uses MathNet Numerics to perform a regression to compute coefficients for various characteristics. I'm testing it by passing it the same data as a regression analysis in Excel. The data I'm using is below in CSV format:

Accuracy,Age,Boundary Lines,Distance,Unit,County,Similarity,Size,Quality
0.894556528,0.301,1,1,1,1,0.753,0.76,0.625
0.870692849,0.234,1,0.737,1,1,0.598,0.804,0.625
1.044444444,0.49,1,1,1,0,0.714,0.933,0.833
1.04120514,0.165,1,1,1,1,0.224,0.268,0.833
1.157635468,0.179,1,1,1,1,0.224,0.268,0.8
1.080750408,1,1,1,1,1,0.826,0.536,0.833
1.100041511,1,1,1,1,1,0.569,0.5,0.833
0.932113964,1,1,1,1,1,0.78,0.25,0.625
0.966800438,0.267,1,1,1,1,0.916,0.408,0.625
0.941051136,0.212,1,0.713,1,1,0.755,0.667,0.625
0.932835821,0.429,1,1,1,1,1,0.999,1
1.689189189,0.12,1,1,1,1,1,0.941,1
1.199451679,0.292,1,1,1,1,1,0.6,1
0.7,0.188,1,0.981,1,1,1,1,0.333
0.795454545,1,1,1,1,1,1,0.364,0.5

"Accuracy" is the dependent value, and I'm calculating coefficients for the other columns using them as independent values. When I do a Regression in Excel, I get these results:

enter image description here

When I test my data in code, I get the same values except for the Boundary Lines and Unit coefficients, while those are 0 in Excel, in my code they are -0.581 and 0.876, respectively.

I'm doing the calculations by creating a Matrix of the ratings columns and a Vector of the Accuracy column and using DirectRegressionMethod.NormalEquations.

Matrix<double> ratingsMatrix = DenseMatrix.OfRowVectors(ratingsVectors);
Vector<double> accuracyVector = DenseVector.OfEnumerable(accuracyRatings);
Vector<double> coefficientVector = MultipleRegression.DirectMethod(ratingsMatrix, accuracyVector, DirectRegressionMethod.NormalEquations);

I'm trying to figure out the reason for the difference. The results from Excel worked well for what I needed them for, so I'd like to be able to duplicate it. I suspect it could be from the fact that both of those columns are all 1s?

GSD
  • 1,252
  • 1
  • 10
  • 12
Valuator
  • 3,262
  • 2
  • 29
  • 53
  • 1
    Try changing format of answer in excel to show more decimal places. Excel cells default to General Format that does a lot of funny things. I believe the result you are getting from c# are accurate and do not believe the Excel results. – jdweng May 16 '19 at 23:55

0 Answers0