Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used alongside clustering or factor analysis. Given any number of explanatory variables, PCA constructs linear combinations of them (the principal components) and ranks those components by how much of the variation in the data each one explains. It is this property that allows PCA to be used for dimension reduction, i.e. to keep the few components that capture most of the variation in a large set of possible influences.
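The ranking by explained variation described above can be sketched with NumPy alone. This is a minimal illustration, not a reference implementation: the helper names explained_variance_ratio and n_components_for, the 0.95 threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                    # centre each column (variable)
    # Singular values of the centred data, returned in descending order
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / (X.shape[0] - 1)              # variance along each component
    return var / var.sum()

def n_components_for(X, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cum = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, len(cum))                    # guard against rounding at the top end

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 5 variables driven by 2 latent factors plus small noise
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))

ratio = explained_variance_ratio(X)            # sorted from largest to smallest
k = n_components_for(X)                        # components needed for 95% of the variance
print(ratio, k)
```

Because the synthetic data has only two underlying factors, almost all of the variance should land in the first two components, which is what makes dropping the rest reasonable.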

Overview

Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into uncorrelated variables called principal component vectors.
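As a rough sketch of that statement, the snippet below computes PCA through the singular value decomposition and checks the two properties it names: the transformation is orthogonal, and the transformed variables are uncorrelated. The function name pca_svd and the random test data are assumptions for illustration only.

```python
import numpy as np

def pca_svd(X):
    """Return (scores, components) for data X with samples in rows."""
    Xc = X - X.mean(axis=0)                        # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                             # data expressed in the principal axes
    return scores, Vt                              # rows of Vt are the principal axes

rng = np.random.default_rng(1)
# Hypothetical correlated data: independent noise mixed by a random 3x3 matrix
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))

scores, components = pca_svd(X)

# Orthogonal transformation: the principal axes are orthonormal
print(np.allclose(components @ components.T, np.eye(3)))

# Uncorrelated outputs: the covariance matrix of the scores is diagonal
cov_scores = np.cov(scores, rowvar=False)
print(np.round(cov_scores, 6))
```

The signs of the rows in components are arbitrary: flipping a row and the matching score column gives an equally valid decomposition.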

Tag usage

Questions tagged pca should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the software environment for statistical computing and graphics, the functions princomp and prcomp compute PCA.

2728 questions

3 answers

How to use scikit-learn PCA for features reduction and know which features are discarded

I am trying to run a PCA on a matrix of dimensions m x n where m is the number of features and n the number of samples. Suppose I want to preserve the nf features with the maximum variance. With scikit-learn I am able to do it in this way: from…

python machine-learning scikit-learn pca feature-selection

asked Apr 25 '14 at 13:30

gc5

2 answers

PCA on word2vec embeddings

I am trying to reproduce the results of this paper: https://arxiv.org/pdf/1607.06520.pdf Specifically this part: To identify the gender subspace, we took the ten gender pair difference vectors and computed its principal components (PCs). As Figure…

python scikit-learn nlp pca word2vec

asked Dec 29 '17 at 08:45

user2969402

4 answers

Using Numpy (np.linalg.svd) for Singular Value Decomposition

I'm reading Abdi & Williams (2010) "Principal Component Analysis", and I'm trying to redo the SVD to attain values for further PCA. The article states the following SVD: X = P D Q^t. I load my data in a np.array X: X = np.array(data) P, D, Q =…

python numpy pca

asked Jul 23 '14 at 14:27

dms_quant

3 answers

How to solve prcomp.default(): cannot rescale a constant/zero column to unit variance

I have a data set of 9 samples (rows) with 51608 variables (columns) and I keep getting the error whenever I try to scale it. This works fine: pca = prcomp(pca_data). However, pca = prcomp(pca_data, scale = T) gives > Error in…

r matrix pca prcomp

asked Oct 29 '16 at 01:39

Brian Jackson

5 answers

Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)

I saw this tutorial in R w/ autoplot. They plotted the loadings and loading labels: autoplot(prcomp(df), data = iris, colour = 'Species', loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, loadings.label.size =…

python machine-learning scikit-learn pca dimensionality-reduction

asked Aug 29 '16 at 23:55

O.rka

4 answers

Pyspark and PCA: How can I extract the eigenvectors of this PCA? How can I calculate how much variance they are explaining?

I am reducing the dimensionality of a Spark DataFrame with a PCA model in pyspark (using the spark ml library) as follows: pca = PCA(k=3, inputCol="features", outputCol="pca_features") model = pca.fit(data) where data is a Spark DataFrame with one…

apache-spark apache-spark-sql pyspark pca apache-spark-ml

asked Oct 30 '15 at 04:19

nanounanue

2 answers

Performing PCA on large sparse matrix by using sklearn

I am trying to apply PCA to a huge sparse matrix. The following link says that randomizedPCA of sklearn can handle sparse matrices in scipy sparse format: Apply PCA on very large sparse matrix. However, I always get an error. Can someone point out…

python scikit-learn sparse-matrix pca svd

asked Nov 09 '15 at 06:46

khassan

1 answer

Finding the dimension with highest variance using scikit-learn PCA

I need to use pca to identify the dimensions with the highest variance of a certain set of data. I'm using scikit-learn's pca to do it, but I can't identify from the output of the pca method what are the components of my data with the highest…

python scikit-learn pca variance

asked Mar 12 '13 at 18:17

Alberto A

3 answers

Adding ellipses to a principal component analysis (PCA) plot

I am having trouble adding grouping variable ellipses on top of an individual site PCA factor plot which also includes PCA variable factor arrows. My code: prin_comp<-rda(data[,2:9], scale=TRUE) pca_scores<-scores(prin_comp) #sites=individual site…

r plot pca ggbiplot

asked Dec 18 '12 at 15:20

Lew

5 answers

PCA first or normalization first?

When doing regression or classification, what is the correct (or better) way to preprocess the data? (1) Normalize the data -> PCA -> training; (2) PCA -> normalize PCA output -> training; (3) Normalize the data -> PCA -> normalize PCA output -> training. Which…

machine-learning normalization classification regression pca

asked Apr 12 '12 at 08:20

AlanS

4 answers

Difference between PCA (Principal Component Analysis) and Feature Selection

What is the difference between Principal Component Analysis (PCA) and Feature Selection in Machine Learning? Is PCA a means of feature selection?

machine-learning pca feature-selection

asked Apr 27 '13 at 07:41

AbhinavChoudhury

3 answers

How is the complexity of PCA O(min(p^3,n^3))?

I've been reading a paper on Sparse PCA, which is: http://stats.stanford.edu/~imj/WEBLIST/AsYetUnpub/sparse.pdf And it states that, if you have n data points, each represented with p features, then, the complexity of PCA is O(min(p^3,n^3)). Can…

matrix machine-learning time-complexity pca

asked Dec 10 '13 at 23:36

GrowinMan

4 answers

In sklearn.decomposition.PCA, why are components_ negative?

I'm trying to follow along with Abdi & Williams - Principal Component Analysis (2010) and build principal components through SVD, using numpy.linalg.svd. When I display the components_ attribute from a fitted PCA with sklearn, they're of the exact…

python python-3.x numpy scikit-learn pca

asked Jun 26 '17 at 17:53

Brad Solomon

2 answers

How to get "proportion of variance" vector from princomp in R

This should be very basic and I hope someone can help me. I ran a principal component analysis with the following call: pca <- princomp(....) summary(pca) Summary pca returns this description: PC1 PC2 PC3 Standard…

r pca variance princomp

asked Mar 14 '15 at 01:38

Neeraj Bhatnagar

2 answers

Matlab - PCA analysis and reconstruction of multi dimensional data

I have a large dataset of multidimensional data (132 dimensions). I am a beginner at performing data mining and I want to apply Principal Components Analysis by using Matlab. However, I have seen that there are a lot of functions explained on the web…

matlab data-mining pca

asked Oct 02 '12 at 10:06

Simon
