I need an implementation of PCA in Java. I am interested in finding something that's well documented, practical and easy to use. Any recommendations?
5 Answers
There are now a number of Principal Component Analysis implementations for Java.
Apache Spark: https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html#principal-component-analysis-pca
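    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.feature.PCA;
    import org.apache.spark.mllib.feature.PCAModel;
    import org.apache.spark.mllib.linalg.Matrix;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.rdd.RDD;

    SparkConf conf = new SparkConf().setAppName("PCAExample").setMaster("local");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
        // Create points as Spark Vectors
        List<Vector> vectors = Arrays.asList(
                Vectors.dense(-1.0, -1.0),
                Vectors.dense(-1.0, 1.0),
                Vectors.dense(1.0, 1.0));

        // Create Spark MLlib RDD
        JavaRDD<Vector> distData = sc.parallelize(vectors);
        RDD<Vector> vectorRDD = distData.rdd();

        // Execute PCA projection to 2 dimensions
        PCA pca = new PCA(2);
        PCAModel pcaModel = pca.fit(vectorRDD);

        // Columns of this matrix are the principal components
        Matrix matrix = pcaModel.pc();
    }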
ND4J: https://javadoc.io/doc/org.nd4j/nd4j-api/latest/org/nd4j/linalg/dimensionalityreduction/PCA.html
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dimensionalityreduction.PCA;
    import org.nd4j.linalg.factory.Nd4j;

    // Create matrix of points (rows are observations; columns are features).
    // Nd4j.create is used here instead of the backend-specific NDArray constructors.
    INDArray matrix = Nd4j.create(new float[][] {
            {-1.0F, -1.0F},
            {-1.0F, 1.0F},
            {1.0F, 1.0F}});

    // Execute PCA - again to 2 dimensions (false = do not normalize)
    INDArray factors = PCA.pca_factor(matrix, 2, false);
Apache Commons Math (single threaded; no framework)
    import org.apache.commons.math3.linear.EigenDecomposition;
    import org.apache.commons.math3.linear.MatrixUtils;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.stat.correlation.Covariance;

    // Create points in a double array (rows are observations; columns are features)
    double[][] pointsArray = new double[][] {
            new double[] { -1.0, -1.0 },
            new double[] { -1.0, 1.0 },
            new double[] { 1.0, 1.0 } };

    // Create real matrix
    RealMatrix realMatrix = MatrixUtils.createRealMatrix(pointsArray);

    // Create covariance matrix of points, then find eigenvectors
    // see https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
    Covariance covariance = new Covariance(realMatrix);
    RealMatrix covarianceMatrix = covariance.getCovarianceMatrix();
    EigenDecomposition ed = new EigenDecomposition(covarianceMatrix);
Note that Singular Value Decomposition (SVD), which can also be used to find principal components, has equivalent implementations.
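For readers who want to see what the covariance-plus-eigenvector route does without any library, here is a minimal plain-Java sketch on the same three points. The class and method names (`PcaSketch`, `covariance`, `firstComponent`) are made up for illustration, and power iteration is swapped in for a full eigendecomposition, so it only recovers the first principal component and assumes the largest eigenvalue is distinct and the covariance matrix is non-zero.

```java
import java.util.Arrays;

/** Library-free PCA sketch: sample covariance plus power iteration (illustrative only). */
final class PcaSketch {

    /** Sample covariance of the columns; rows are observations. The data is mean-centered here. */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);
                }
            }
        }
        return cov;
    }

    /** Approximates the dominant unit eigenvector of a symmetric matrix by power iteration. */
    static double[] firstComponent(double[][] cov, int iterations) {
        int d = cov.length;
        double[] v = new double[d];
        for (int i = 0; i < d; i++) {
            v[i] = 1.0 + i; // arbitrary non-zero start vector (an assumption of this sketch)
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[d];
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    next[i] += cov[i][j] * v[j];
                }
            }
            double norm = 0.0;
            for (double x : next) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < d; i++) {
                v[i] = next[i] / norm; // renormalize so the vector keeps unit length
            }
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] points = { {-1.0, -1.0}, {-1.0, 1.0}, {1.0, 1.0} };
        double[][] cov = covariance(points);
        double[] pc1 = firstComponent(cov, 100);
        System.out.println("covariance = " + Arrays.deepToString(cov));
        System.out.println("first component = " + Arrays.toString(pc1));
    }
}
```

For real work one of the libraries above is the better choice; the sketch is only meant to make the centering, covariance and eigenvector steps visible.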

Remember to mean-center your data before PCA. Notice LotiLotiLoti did this implicitly in the example. – George Forman Sep 29 '19 at 17:54
Here is one: PCA Class.
This class contains the methods necessary for a basic Principal Component Analysis with a varimax rotation. Options are available for an analysis using either the covariance or the correlation matrix. A parallel analysis, using Monte Carlo simulations, is performed. Extraction criteria based on eigenvalues greater than unity, greater than a Monte Carlo eigenvalue percentile or greater than the Monte Carlo eigenvalue means are available.
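To illustrate the simplest of those extraction criteria (keep components whose eigenvalue is greater than unity), here is a small plain-Java sketch. It does not use the linked class's API; the class name `KaiserCriterion`, its methods and the eigenvalues are hypothetical, and the rule is only meaningful for a correlation-matrix analysis, where the eigenvalues average 1.

```java
import java.util.Arrays;

/** Sketch of the "eigenvalue greater than unity" extraction rule (illustrative only). */
final class KaiserCriterion {

    /** Counts the eigenvalues that are strictly greater than 1.0. */
    static int componentsToKeep(double[] eigenvalues) {
        int kept = 0;
        for (double ev : eigenvalues) {
            if (ev > 1.0) {
                kept++;
            }
        }
        return kept;
    }

    /** Fraction of the total variance carried by each component. */
    static double[] explainedVarianceRatio(double[] eigenvalues) {
        double total = 0.0;
        for (double ev : eigenvalues) {
            total += ev;
        }
        double[] ratio = new double[eigenvalues.length];
        for (int i = 0; i < eigenvalues.length; i++) {
            ratio[i] = eigenvalues[i] / total;
        }
        return ratio;
    }

    public static void main(String[] args) {
        // Made-up eigenvalues of a 4-variable correlation matrix (they sum to 4).
        double[] eigenvalues = { 2.1, 1.2, 0.5, 0.2 };
        System.out.println("components to keep = " + componentsToKeep(eigenvalues));
        System.out.println("explained variance = " + Arrays.toString(explainedVarianceRatio(eigenvalues)));
    }
}
```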

Check http://weka.sourceforge.net/doc.stable/weka/attributeSelection/PrincipalComponents.html. Weka in fact has many other algorithms that can be used along with PCA, and it adds more algorithms from time to time. So I think that if you are working in Java, you should switch to the Weka API.
Invalid link, please try to avoid answering questions with only a link, as they can expire and be unreliable in the future. – Iancovici Dec 17 '13 at 13:12
Smile is a full-fledged ML library for Java. You could give its PCA implementation a try. Please see: https://haifengl.github.io/smile/api/java/smile/projection/PCA.html
There is also a PCA tutorial for Smile, but the tutorial uses Scala.

It's Apache 2.0 licensed. It appears to auto-center the data, provides for projections, and switches to an SVD implementation when appropriate and permitted. https://haifengl.github.io/smile/feature.html#dimension-reduction – George Forman Oct 08 '19 at 20:42
You can see a few implementations of PCA in the DataMelt project:
https://jwork.org/dmelt/code/index.php?keyword=PCA
(they are rewritten in Jython). They include some graphical examples for dimensionality reduction. They show the usage of several Java packages, such as JSAT, DatumBox and others.
