Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used alongside clustering or factor analysis. Given any number of possibly correlated variables, PCA constructs new variables, the principal components, as linear combinations of the originals and ranks them by how much of the variation in the data each one explains. It is this property that allows PCA to be used for dimension reduction, i.e. to summarize a large set of variables by the few components that capture most of the variance.

Overview

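Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables into linearly uncorrelated variables called principal components. The first principal component has the largest possible variance, and each subsequent component has the largest variance possible while being orthogonal to all preceding components.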
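
A minimal NumPy sketch of this transformation, assuming the observations are the rows of a dense array; the function name `pca_svd` and the toy data are hypothetical. It centers the data, takes the singular value decomposition, and checks the two properties described above: the component directions are orthonormal, and the resulting scores are uncorrelated with variances sorted largest first.

```python
import numpy as np

def pca_svd(X):
    """Return (components, scores, explained_variance) for the rows of X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                              # data expressed in component coordinates
    explained_variance = s**2 / (len(X) - 1)    # variance of each component, largest first
    return Vt, scores, explained_variance

rng = np.random.default_rng(0)
# Three correlated variables: the second and third both depend on the first.
x1 = rng.normal(size=500)
X = np.column_stack([x1,
                     2 * x1 + rng.normal(scale=0.5, size=500),
                     rng.normal(size=500) - x1])
components, scores, var = pca_svd(X)

# Rows of `components` are orthonormal directions.
print(np.allclose(components @ components.T, np.eye(3)))          # True
# The scores are uncorrelated: their covariance matrix is diagonal.
print(np.allclose(np.cov(scores, rowvar=False), np.diag(var)))    # True
```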

Tag usage

Questions tagged pca should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the software environment for statistical computing and graphics, the functions princomp and prcomp compute PCA.
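
The two R functions take different numerical routes: prcomp uses a singular value decomposition of the centered (and optionally scaled) data matrix, while princomp uses an eigendecomposition of the covariance (or correlation) matrix with divisor n. The NumPy sketch below is not R code; it only illustrates, on hypothetical random data, that both routes give the same component variances (once the n versus n - 1 divisor is accounted for) and the same directions up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated columns
n = len(X)
Xc = X - X.mean(axis=0)

# Route 1 (the approach prcomp takes): SVD of the centered data matrix.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = s**2 / (n - 1)

# Route 2 (the approach princomp takes): eigendecomposition of the covariance
# matrix with divisor n. eigh returns eigenvalues in ascending order, so reverse.
cov_n = Xc.T @ Xc / n
eigvals, eigvecs = np.linalg.eigh(cov_n)
order = np.argsort(eigvals)[::-1]
var_eig, vecs = eigvals[order], eigvecs[:, order]

print(np.allclose(var_svd * (n - 1) / n, var_eig))             # True
# Same directions up to sign: |cosine| between matched components is 1.
print(np.allclose(np.abs(np.sum(Vt.T * vecs, axis=0)), 1.0))   # True
```

The SVD route avoids forming the covariance matrix explicitly, which is generally the more numerically accurate choice.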

2728 questions

5 answers

Apply PCA on very large sparse matrix

I am doing a text classification task with R, and I obtain a document-term matrix of size 22,490 by 120,000 (only 4 million non-zero entries, i.e. less than 1% of the entries). Now I want to reduce the dimensionality by utilizing PCA (Principal Component…

language-agnostic machine-learning sparse-matrix pca

asked May 23 '12 at 10:51

Ensom Hodder
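
One way to approach this is to avoid forming the dense centered matrix at all. The sketch below, assuming SciPy is available, wraps the sparse matrix in a `scipy.sparse.linalg.LinearOperator` that subtracts the column means on the fly and passes it to `scipy.sparse.linalg.svds`, so only the leading `k` components are computed. It is written in Python for illustration rather than R; the function name `sparse_pca` and the toy matrix are hypothetical. Skipping the centering step instead gives a plain truncated SVD (latent semantic analysis), a different but frequently used alternative for text.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, svds

def sparse_pca(A, k):
    """Top-k PCA of sparse A (rows = documents) without densifying it."""
    A = csr_matrix(A, dtype=float)
    n, p = A.shape
    mu = np.asarray(A.mean(axis=0)).ravel()   # column means, shape (p,)

    def matvec(v):
        v = np.ravel(v)
        return A @ v - mu @ v                 # (A - 1 mu^T) v, centering done implicitly

    def rmatvec(u):
        u = np.ravel(u)
        return A.T @ u - mu * u.sum()         # (A - 1 mu^T)^T u

    centered = LinearOperator((n, p), matvec=matvec, rmatvec=rmatvec, dtype=float)
    U, s, Vt = svds(centered, k=k)
    order = np.argsort(s)[::-1]               # sort singular values largest first
    U, s, Vt = U[:, order], s[order], Vt[order]
    return U * s, Vt, s                       # scores, components, singular values

# Hypothetical toy data: about 10% of entries are non-zero.
rng = np.random.default_rng(0)
dense = rng.random((200, 50))
dense[dense < 0.9] = 0.0
scores, components, s = sparse_pca(dense, k=5)
```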

8 answers

What is the fastest way to calculate first two principal components in R?

I am using princomp in R to perform PCA. My data matrix is huge (10K x 10K with each value up to 4 decimal places). It takes ~3.5 hours and ~6.5 GB of physical memory on a Xeon 2.27 GHz processor. Since I only want the first two components, is…

r pca eigenvector eigenvalue

asked Nov 28 '11 at 17:03

384X21
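
When only the first two components are needed, a full decomposition of the 10K x 10K matrix is unnecessary. One option is a randomized SVD: project the centered data onto a few random directions, refine with power iterations, and decompose the small projected matrix. The NumPy sketch below is a language-neutral illustration rather than R code; the function name `top_components`, the oversampling of 10 and the 4 power iterations are assumptions chosen for illustration.

```python
import numpy as np

def top_components(X, k=2, oversample=10, n_iter=4, seed=0):
    """Approximate the leading k principal components with a randomized SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    rng = np.random.default_rng(seed)
    # Sketch the column space of Xc with a small random test matrix.
    Q, _ = np.linalg.qr(Xc @ rng.normal(size=(Xc.shape[1], k + oversample)))
    # Power iterations sharpen the sketch toward the leading singular vectors;
    # re-orthonormalizing at each step keeps it numerically stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    # Exact SVD of the small (k + oversample) x p projected matrix.
    Ub, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    scores = (Q @ Ub[:, :k]) * s[:k]
    return scores, Vt[:k], s[:k]

# Hypothetical data with two strong directions plus a little noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2)) * [10.0, 5.0]
X = latent @ rng.normal(size=(2, 200)) + 0.1 * rng.normal(size=(500, 200))
scores, comps, s = top_components(X, k=2)
```

The cost is dominated by a few matrix products with a thin (p x 12) matrix, instead of a full O(p^3) decomposition.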

2 answers

Using mca package in Python

I am trying to use the mca package to do multiple correspondence analysis in Python. I am a bit confused as to how to use it. With PCA I would expect to fit some data (i.e. find principal components for those data) and then later I would be able to…

python-3.x pandas scikit-learn pca

asked Jan 30 '18 at 12:40

Dan

5 answers

How to find the closest 2 points in a 100 dimensional space with 500,000 points?

I have a database with 500,000 points in a 100-dimensional space, and I want to find the closest 2 points. How do I do it? Update: the space is Euclidean, sorry. And thanks for all the answers. BTW this is not homework.

algorithm performance nearest-neighbor pca approximate-nn-searching

asked Oct 10 '10 at 05:06

Edwin Jose Palathinkal

3 answers

is it possible Apply PCA on any Text Classification?

I'm trying a classification with Python. I'm using the Naive Bayes MultinomialNB classifier for web pages (retrieving data from the web as text, then classifying this text: web classification). Now I'm trying to apply PCA to this data, but Python is…

python scikit-learn pca naivebayes

asked Jan 11 '16 at 15:52

zer03

1 answer

Pass PCA preprocessing arguments to train()

I'm trying to build a predictive model in caret using PCA as pre-processing. The pre-processing would be as follows: preProc <- preProcess(IL_train[,-1], method="pca", thresh = 0.8) Is it possible to pass the thresh argument directly to caret's…

r machine-learning pca r-caret

asked Apr 14 '15 at 08:11

Timm S.
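
In caret, `thresh` is the cutoff for the cumulative proportion of variance that the PCA step retains. The NumPy sketch below only illustrates what such a cutoff computes and is not caret's own code: it picks the smallest number of components whose cumulative explained variance reaches the threshold. The function name `n_components_for` and the data are hypothetical.

```python
import numpy as np

def n_components_for(X, thresh=0.8):
    """Smallest number of components whose cumulative explained variance >= thresh."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)              # proportion of variance per component
    cumulative = np.cumsum(ratio)
    # searchsorted returns the first index where cumulative >= thresh;
    # clip in case rounding leaves the last cumulative value just below 1.
    k = min(int(np.searchsorted(cumulative, thresh)) + 1, len(s))
    return k, cumulative

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) * [5, 3, 1, 0.5, 0.1]   # hypothetical data
k, cumulative = n_components_for(X, thresh=0.8)
```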

1 answer

How to compare predictive power of PCA and NMF

I would like to compare the output of an algorithm with differently preprocessed data: NMF and PCA. In order to get a somewhat comparable result, instead of choosing just the same number of components for each PCA and NMF, I would like to pick the…

scikit-learn pca dimensionality-reduction matrix-factorization nmf

asked Jan 08 '18 at 10:46

Phil D

2 answers

Basic example for PCA with matplotlib

I am trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes of the class I can't get a clean solution to my problem. Here's an example: Get some dummy data in 2D and start PCA: from matplotlib.mlab import…

python matplotlib pca

asked Aug 18 '13 at 13:19

Tyrax

1 answer

R - how to make PCA biplot more readable

I have a set of observations with 23 variables. When I use prcomp and biplot to plot the results I run into several problems: the actual plot only occupies half of the frame (x < 0), but the plot is centered on 0, so half of space is wasted two…

r plot pca

asked Jun 11 '13 at 23:07

Jakub Bochenski

1 answer

PCA inverse transform manually

I am using scikit-learn. The nature of my application is such that I do the fitting offline, and then can only use the resulting coefficients online (on the fly), to manually calculate various objectives. The transform is simple, it is just data *…

python numpy scikit-learn pca

asked Sep 23 '15 at 23:09

Baron Yugovich
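
For a scikit-learn `PCA` fitted with the default `whiten=False`, the forward transform is `(X - mean_) @ components_.T` and the inverse is `Z @ components_ + mean_`, so only `mean_` and `components_` need to be exported for the online step. The NumPy sketch below reproduces both; to keep it self-contained, the two arrays are computed by a hypothetical offline SVD step instead of being read from a fitted scikit-learn object.

```python
import numpy as np

# --- offline: fit once and keep only these two arrays -----------------------
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
mean_ = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean_, full_matrices=False)
components_ = Vt[:3]                       # keep 3 of 6 components, shape (3, 6)

# --- online: plain matrix arithmetic, no fitted object needed ----------------
def transform(X):
    return (X - mean_) @ components_.T     # shape (n, 3)

def inverse_transform(Z):
    return Z @ components_ + mean_         # back to shape (n, 6)

x_new = rng.normal(size=(4, 6))
x_rec = inverse_transform(transform(x_new))
# The reconstruction is the projection onto the kept components, so
# transforming it again returns the same scores.
print(np.allclose(transform(x_rec), transform(x_new)))   # True
```

With all components kept, `inverse_transform(transform(X))` returns `X` exactly (up to rounding); with fewer, it returns the closest point in the retained subspace.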

2 answers

PCA with missing values in Python

I'm trying to do a PCA analysis on a masked array. From what I can tell, matplotlib.mlab.PCA doesn't work if the original 2D matrix has missing values. Does anyone have recommendations for doing a PCA with missing values in Python? Thanks.

python numpy pca

asked Apr 02 '15 at 19:15

Emily
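
One workaround, sketched below under the assumption that missing entries are marked with `np.nan`, is iterative SVD imputation: fill the gaps with column means, fit a rank-`k` PCA, replace only the missing entries with the reconstruction, and repeat until they stop changing. The function name `pca_impute`, the rank and the iteration limit are hypothetical choices, and this plain version has no regularization, so it can overfit when many values are missing. A `numpy.ma` masked array can be converted first with `.filled(np.nan)`.

```python
import numpy as np

def pca_impute(X, k, n_iter=500, tol=1e-10):
    """Fill np.nan entries of X using a rank-k PCA reconstruction, iteratively."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)    # start from column means
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean           # rank-k reconstruction
        change = np.max(np.abs(recon[missing] - filled[missing]), initial=0.0)
        filled[missing] = recon[missing]                     # observed values stay as-is
        if change < tol:
            break
    return filled

# Hypothetical data: an exactly rank-2 matrix with about 10% of entries removed.
rng = np.random.default_rng(3)
truth = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
X = truth.copy()
X[rng.random(X.shape) < 0.1] = np.nan
imputed = pca_impute(X, k=2)
```

Once the gaps are filled, any ordinary PCA routine can be run on `imputed`.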

4 answers

Test significance of clusters on a PCA plot

Is it possible to test the significance of clustering between 2 known groups on a PCA plot? To test how close they are or the amount of spread (variance) and the amount of overlap between clusters etc.

r statistics pca

asked Nov 28 '13 at 07:46

mindlessgreen

2 answers

PCA and KNN algorithm

I am using KNN to classify handwritten digits. I have also now implemented PCA to reduce the dimensionality. From 256 I went to 200. But I only notice ~0.10% loss of information, even though I deleted 56 dimensions. Shouldn't the loss be bigger? Only when…

algorithm pca knn

asked Apr 16 '12 at 23:20

Test Test
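
The loss PCA reports is the share of total variance held by the discarded components, which equals the relative squared reconstruction error. When most of the variance sits in the leading components (as with images whose neighboring pixels are strongly correlated), dropping the trailing ones costs little. The NumPy sketch below checks that identity on hypothetical synthetic data with 256 features whose variances decay geometrically; it does not use the asker's digit data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 256 features whose spread decays geometrically.
X = rng.normal(size=(1000, 256)) * 0.98 ** np.arange(256)
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 200
lost_variance = np.sum(s[k:] ** 2) / np.sum(s ** 2)       # share of variance dropped
X_rec = (U[:, :k] * s[:k]) @ Vt[:k]                       # rank-k reconstruction
relative_error = np.sum((Xc - X_rec) ** 2) / np.sum(Xc ** 2)
print(np.isclose(lost_variance, relative_error))           # True
print(f"{100 * lost_variance:.3f}% of the variance is lost")
```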

4 answers

PCA with several time series as features of one instance with sklearn

I want to apply PCA on a data set where I have 20 time series as features for one instance. I have some 1000 instances of this kind and I am looking for a way to reduce dimensionality. For every instance I have a pandas DataFrame, like: import…

python scikit-learn time-series pca

asked Sep 21 '18 at 18:29

Mina L.
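
Assuming every instance has the same 20 series sampled at the same time points, one straightforward approach is to flatten each instance into a single feature vector, stack the instances into a matrix, and run PCA on that matrix. The sketch below uses random hypothetical values in a NumPy array of shape (instances, series, time steps); with one pandas DataFrame per instance, `df.to_numpy()` gives each instance's block, and any flattening order works as long as every instance uses the same one. The helper name `pca_scores` and the sizes are hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Project rows of X onto their first k principal components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(0)
n_instances, n_series, n_steps = 1000, 20, 30
data = rng.normal(size=(n_instances, n_series, n_steps))    # hypothetical values

# One row per instance: the 20 series laid end to end (20 * 30 = 600 features).
flat = data.reshape(n_instances, n_series * n_steps)
# Optional: standardize each feature so series with large units do not dominate.
flat = (flat - flat.mean(axis=0)) / flat.std(axis=0)
scores = pca_scores(flat, k=10)
print(scores.shape)   # (1000, 10)
```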

3 answers

PCA memory error in Sklearn: Alternative Dim Reduction?

I am trying to reduce the dimensionality of a very large matrix using PCA in Sklearn, but it produces a memory error (RAM required exceeds 128GB). I have already set copy=False and I'm using the less computationally expensive randomised PCA. Is…

python multidimensional-array scikit-learn pca

asked Apr 11 '17 at 22:53

Chris Parry
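
When the number of features is moderate but there are too many rows to hold at once, one option is to accumulate the column sums and the p x p cross-product matrix batch by batch, then eigendecompose the resulting covariance matrix, so memory scales with p squared rather than with the number of rows. scikit-learn's `IncrementalPCA` (with `partial_fit`) is the library route for streaming PCA. The NumPy sketch below, with a hypothetical `streaming_pca` function and batch generator, shows the accumulation idea; it does not help if p itself is huge, and forming X^T X can lose some precision compared with an SVD.

```python
import numpy as np

def streaming_pca(batches, k):
    """PCA from an iterable of row batches without holding all rows in memory."""
    n, total, cross = 0, None, None
    for B in batches:
        B = np.asarray(B, dtype=float)
        if total is None:
            total = np.zeros(B.shape[1])
            cross = np.zeros((B.shape[1], B.shape[1]))
        n += len(B)
        total += B.sum(axis=0)
        cross += B.T @ B                       # running X^T X
    mean = total / n
    # Sum of (x - mean)(x - mean)^T equals X^T X - n * mean mean^T.
    cov = (cross - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order].T, eigvals[order]   # mean, components, variances

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 30)) @ rng.normal(size=(30, 30))   # hypothetical data
batches = (X[i:i + 1000] for i in range(0, len(X), 1000))
mean, components, variances = streaming_pca(batches, k=5)
```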
