Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used alongside clustering or factor analysis. Given a set of possibly correlated variables, PCA constructs new variables (the principal components) as linear combinations of the originals and ranks them by how much of the variation in the data they explain. It is this property that allows PCA to be used for dimension reduction, i.e. to keep the few components that capture most of the variation in a large set of possible influences.

Overview

Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into uncorrelated variables called principal component vectors.
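
As a rough illustration of the transformation described above, here is a minimal NumPy sketch, not a reference implementation. The function name pca, the synthetic data, and the choice to obtain the components from an SVD of the centered data are assumptions made for this example; none of them come from the tag description.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its leading principal components."""
    # Center each variable; PCA is defined on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are orthonormal principal axes,
    # already sorted by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each observation in the new, uncorrelated variables.
    scores = X_centered @ components.T
    # Variance carried by each retained component (sample variance, ddof=1).
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


# Hypothetical data: two strongly correlated variables plus one independent one.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

scores, components, explained_variance = pca(X, n_components=2)
```

Because the principal axes are orthonormal, the columns of scores are uncorrelated: np.cov(scores, rowvar=False) should be diagonal up to floating-point error, with explained_variance on its diagonal.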

Tag usage

Questions tagged pca should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the software environment for statistical computing and graphics, the functions princomp and prcomp compute PCA.

2728 questions

1 answer

How to convert spark DataFrame to RDD mllib LabeledPoints?

I tried to apply PCA to my data and then apply RandomForest to the transformed data. However, PCA.transform(data) gave me a DataFrame but I need a mllib LabeledPoints to feed my RandomForest. How can I do that? My code: import…

scala apache-spark rdd pca apache-spark-mllib

asked Mar 13 '16 at 05:35

Tianyi Wang

2 answers

Incremental PCA on big data

I just tried using the IncrementalPCA from sklearn.decomposition, but it threw a MemoryError just like the PCA and RandomizedPCA before. My problem is that the matrix I am trying to load is too big to fit into RAM. Right now it is stored in an hdf5…

python scikit-learn bigdata hdf5 pca

asked Jul 15 '15 at 11:00

KrawallKurt

2 answers

Scikit-Learn PCA

I am using input data from here (see Section 3.1). I am trying to reproduce their covariance matrix, eigenvalues, and eigenvectors using scikit-learn. However, I am unable to reproduce the results as presented in the data source. I've also seen this…

scikit-learn statistics linear-algebra pca

asked Dec 30 '14 at 04:21

slaw

5 answers

How to implement ZCA Whitening? Python

I'm trying to implement ZCA whitening and found some articles on how to do it, but they are a bit confusing. Can someone shed some light on this for me? Any tip or help is appreciated! Here are the articles I read…

python pca correlated image-preprocessing

asked Jul 21 '15 at 01:14

user2136049

1 answer

scikit-learn TruncatedSVD's explained variance ratio not in descending order

The TruncatedSVD's explained variance ratio is not in descending order, unlike sklearn's PCA. I looked at the source code and it seems they use a different way of calculating the explained variance ratio: TruncatedSVD: U, Sigma, VT = randomized_svd(X,…

python scikit-learn pca svd

asked Feb 09 '16 at 18:06

Xiangyu

1 answer

What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?

Suppose there is a matrix B of size 500*1000 double (here, 500 represents the number of observations and 1000 represents the number of features). sigma is the covariance matrix of B, and D is a diagonal matrix whose diagonal elements are…

matlab machine-learning pca data-analysis

asked Aug 07 '15 at 19:08

Shawn

1 answer

Sklearn.KMeans() : Get class centroid labels and reference to a dataset

Scikit-learn KMeans and PCA dimensionality reduction: I have a dataset, 2M rows by 7 columns, with different measurements of home power consumption with a date for each…

python date svm k-means pca

asked Dec 16 '14 at 12:38

flow

2 answers

Hotelling's T^2 scores in python

I applied PCA to a data set using matplotlib in Python. However, matplotlib does not provide T-squared scores the way Matlab does. Is there a way to compute Hotelling's T^2 score like Matlab? Thanks.

python matplotlib statistics scipy pca

asked Aug 20 '14 at 19:29

YC.Chui

6 answers

Principal Component Analysis (PCA) on huge sparse dataset

I have about 1000 vectors x_i of dimension 50000, but they are very sparse; each has only about 50-100 nonzero elements. I want to do PCA on this dataset (in MATLAB) to reduce the unneeded extreme dimensionality of the data. Unfortunately, I don't…

matlab machine-learning pca sparse-matrix

asked Nov 16 '12 at 23:32

Sean

5 answers

PCA Implementation in Java

I need implementation of PCA in Java. I am interested in finding something that's well documented, practical and easy to use. Any recommendations?

java pca

asked May 15 '12 at 15:57

Trup

4 answers

How to whiten matrix in PCA

I'm working with Python and I've implemented PCA using this tutorial. Everything works great: I got the covariance, did a successful transform, and brought it back to the original dimensions with no problem. But how do I perform whitening? I tried…

python pca scikits

asked Jul 04 '11 at 18:09

mabounassif

2 answers

pca.inverse_transform in sklearn

After fitting my data: X = my data; pca = PCA(n_components=1); pca.fit(X); X_pca = pca.fit_transform(X). Now X_pca has one dimension. When I perform the inverse transformation, isn't it by definition supposed to return the original data, that is X, 2-D…

python scikit-learn pca

asked Apr 05 '19 at 10:12

haneulkim

1 answer

Principal component analysis (PCA) of time series data: spatial and temporal pattern

Suppose I have yearly precipitation data for 100 stations from 1951 to 1980. In some papers, I find people apply PCA to the time series and then plot the spatial loadings map (with values from -1 to 1), and also plot the time series of the PCs. For …

r spatial pca temporal

asked Dec 07 '16 at 16:45

Yang Yang

2 answers

Python PCA on Matrix too large to fit into memory

I have a csv that is 100,000 rows x 27,000 columns that I am trying to do PCA on to produce a 100,000 rows X 300 columns matrix. The csv is 9GB large. Here is currently what I'm doing: from sklearn.decomposition import PCA as RandomizedPCA import…

python pandas machine-learning scikit-learn pca

asked Aug 24 '15 at 20:30

mt88

1 answer

PCA Analysis in PySpark

Looking at http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html. The examples seem to only contain Java and Scala. Does Spark MLlib support PCA analysis for Python? If so please point me to an example. If not, how to combine…

python apache-spark apache-spark-mllib pca apache-spark-ml

asked Aug 02 '15 at 17:01

lapolonio
