I'm plotting graphs in a Windows application in C#. I used least squares fitting with perpendicular offsets to find the best fit line, but my data ranges from a vertical line to an (almost) horizontal one.

Then I read about PCA and the Accord.NET libraries. I've written some code, but I don't know what to do next.

I have a list of DataPoints of a graph.

// Copy the chart points into a two-column table (X, Y).
DataTable dt = new DataTable();
dt.Columns.Add("X", typeof(double));
dt.Columns.Add("Y", typeof(double));

foreach (DataPoint dp in listOfPoints)
{
    DataRow dr = dt.NewRow();
    dr["X"] = dp.XValue;
    dr["Y"] = dp.YValues[0];
    dt.Rows.Add(dr);
}

// Convert the table to a matrix: one row per point, columns X and Y.
string[] columnNames;
double[,] sourceMatrix = dt.ToMatrix(out columnNames);
DescriptiveAnalysis sda = new DescriptiveAnalysis(sourceMatrix, columnNames);
sda.Compute();

// Center the data (subtract the column means) before running PCA.
AnalysisMethod method = AnalysisMethod.Center;

PrincipalComponentAnalysis pca = new PrincipalComponentAnalysis(sda.Source, method);
pca.Compute();

// Mean point of the data and the eigenvectors found by PCA.
double[] mean = sourceMatrix.Mean();
double[,] eigenVectors = pca.ComponentMatrix;

After getting the eigenvectors, how do I use them to plot the best fit line?

KSK

1 Answer

Yes, PCA will find the line with the smallest total squared perpendicular distance from the data set.

PCA starts by calculating the covariance matrix C. For any unit vector v, the quadratic form v^T C v is the variance of the data set in the direction of v.

If you draw a line through the mean point of your data, the total squared distance of all the points from that line is proportional to the variance along the unit vector perpendicular to that line, so you want to find the line with the smallest perpendicular variance.
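
The claim above can be checked with a short derivation. The notation here is an assumption of this sketch, not part of the original answer: $p_i$ are the $N$ data points, $\bar p$ is their mean, $n$ is a unit vector perpendicular to the candidate line, $d_i$ is the signed distance of $p_i$ from the line, and $C$ is the sample covariance matrix (normalized by $N-1$).

$$
d_i = n^\top (p_i - \bar p), \qquad
\sum_{i=1}^{N} d_i^2
= n^\top \Big[ \sum_{i=1}^{N} (p_i - \bar p)(p_i - \bar p)^\top \Big] n
= (N-1)\, n^\top C\, n .
$$

Since $N-1$ is a constant, minimizing the total squared distance is the same as minimizing the variance $n^\top C\, n$ along the normal.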

The covariance matrix is symmetric. What this means in visual terms is that it has two orthogonal eigenvectors, and if you move your axes to these eigenvectors, then it becomes a simple diagonal matrix.

The principal eigenvector of the covariance matrix is the direction of the largest variance in the data, and the other eigenvector is the direction of smallest variance. Since the eigenvectors are perpendicular, and the best fit line is perpendicular to the direction of smallest variance...
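
The step left implicit here can be written out as follows. The symbols are assumptions of this sketch: $\lambda_1 \ge \lambda_2 \ge 0$ are the eigenvalues of the covariance matrix $C$, $v_1, v_2$ are the matching orthonormal eigenvectors, and $n$ is any unit vector, written at angle $\theta$ from $v_1$.

$$
C = \lambda_1 v_1 v_1^\top + \lambda_2 v_2 v_2^\top, \qquad
n = \cos\theta\, v_1 + \sin\theta\, v_2
\;\Longrightarrow\;
n^\top C\, n = \lambda_1 \cos^2\theta + \lambda_2 \sin^2\theta \;\ge\; \lambda_2 .
$$

The minimum is reached at $n = \pm v_2$, so the normal of the best fit line is $v_2$ and the line itself runs along $v_1$. If $\lambda_1 = \lambda_2$, every direction gives the same variance and the best fit line is not unique.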

The principal eigenvector of the covariance matrix, which PCA finds, is the direction of the best fit line. Draw a line in that direction through the mean point and you're done.
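
For two-dimensional data the principal eigenvector has a closed form, so the idea can be sketched in plain C# without any library. This is a minimal illustration, not the Accord.NET API: the class `BestFitLine` and the method `Fit` are hypothetical names, and the inputs are assumed to be two arrays of equal length holding the X and Y values.

```csharp
using System;

public static class BestFitLine
{
    // Returns the mean point and a unit direction vector of the line that
    // minimizes the total squared perpendicular distance to the points.
    public static (double MeanX, double MeanY, double DirX, double DirY) Fit(
        double[] xs, double[] ys)
    {
        if (xs == null || ys == null || xs.Length != ys.Length || xs.Length < 2)
            throw new ArgumentException("Need two arrays of equal length with at least 2 points.");

        int n = xs.Length;

        // Mean point: the fitted line passes through it.
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += xs[i]; meanY += ys[i]; }
        meanX /= n;
        meanY /= n;

        // Entries of the scatter matrix [[sxx, sxy], [sxy, syy]].
        // Dividing by (n - 1) would give the covariance matrix, but the
        // scale factor does not change the eigenvector directions.
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = xs[i] - meanX;
            double dy = ys[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }

        // Angle of the principal eigenvector of a symmetric 2x2 matrix.
        // If all directions have equal variance, this falls back to the X axis.
        double theta = 0.5 * Math.Atan2(2 * sxy, sxx - syy);
        return (meanX, meanY, Math.Cos(theta), Math.Sin(theta));
    }
}
```

To draw the line, take two endpoints `(MeanX - t * DirX, MeanY - t * DirY)` and `(MeanX + t * DirX, MeanY + t * DirY)`, with `t` chosen large enough to span the data. Because the result is a direction rather than a slope, vertical data needs no special case.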

Matt Timmermans
  • Thanks @Matt. Could you please explain this in the Accord.NET and C# context? Your answer is more mathematical. – KSK Apr 26 '16 at 05:16
  • Ah, you did not ask what exactly to do :) Sorry, I'm not familiar with accord.net, so that is not a question I can answer, although I'm sure it will be very easy using the PCA class in Accord.Statistics. – Matt Timmermans Apr 26 '16 at 17:50
  • He probably meant to take the first column (maybe first row?) of the `pca.ComponentMatrix` property and interpret it as a mathematical vector. You can then create a line from this vector using the `Line.FromPoints(mean, mean + vector)` named constructor of the [Line class](http://accord-framework.net/docs/html/T_Accord_Math_Geometry_Line.htm). – Cesar Jul 08 '17 at 21:30