Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used alongside clustering or factor analysis. Given any number of possibly correlated variables, PCA constructs new variables, the principal components, each a linear combination of the originals, and orders them by how much of the variation in the data they explain. It is this property that allows PCA to be used for dimension reduction, i.e. to summarize a large set of possible influences with a few components that retain most of the variance.

Overview

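Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into linearly uncorrelated variables called principal components. The first component has the largest possible variance, and each subsequent component has the largest variance possible while being orthogonal to the components before it.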
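For two variables, the orthogonal transformation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions (two variables, sample covariance with an n - 1 denominator), not how any particular library implements PCA: it eigendecomposes the 2×2 sample covariance matrix in closed form, projects the centered data onto the two orthogonal eigenvectors, and reports each component's share of the total variance. The helper names `covariance` and `pca_2d` and the toy data are hypothetical. With more than two variables you would use a general eigendecomposition or, as R's `prcomp` does, a singular value decomposition of the centered data matrix.

```python
import math
from statistics import mean


def covariance(u, v):
    """Sample covariance (n - 1 denominator) of two equal-length sequences.
    Hypothetical helper written for this sketch."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def pca_2d(x, y):
    """Hypothetical two-variable PCA via the closed-form eigendecomposition
    of the 2x2 covariance matrix [[a, b], [b, c]].
    Returns (variances, (pc1, pc2), scores), largest variance first."""
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    # Eigenvalues of a symmetric 2x2 matrix: mid +/- half_gap.
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    variances = (mid + half_gap, mid - half_gap)
    # Angle of the first principal axis (direction of greatest variance);
    # the second axis is the first rotated by 90 degrees, so they are orthogonal.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    # Scores: the centered data expressed in the new orthonormal basis.
    mx, my = mean(x), mean(y)
    scores = [
        ((xi - mx) * pc1[0] + (yi - my) * pc1[1],
         (xi - mx) * pc2[0] + (yi - my) * pc2[1])
        for xi, yi in zip(x, y)
    ]
    return variances, (pc1, pc2), scores


# Toy data, made up for illustration: two strongly correlated variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
variances, (pc1, pc2), scores = pca_2d(x, y)
explained_ratio = [v / sum(variances) for v in variances]
print("explained variance ratio:", explained_ratio)
print("cov(PC1, PC2):", covariance([s[0] for s in scores], [s[1] for s in scores]))
```

The covariance between the two score columns is zero up to floating-point error, which is the "uncorrelated variables" property, and the first entry of the ratio list is the share of the total variance captured by the first principal component.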

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the software environment for statistical computing and graphics, the functions princomp and prcomp compute PCA.

2728 questions
0
votes
1 answer

How to visualize a stepwise movement of PCA variables calculated repeatedly using different subsets of the same data to gain insights?

Imagine I have the dataset below for 34 subjects (I randomly sampled 100 observations from it because of the length limit on the question body). Each subject has multiple observations for different time points and regions. I would like to compare two…
doctorate
  • 1,381
  • 1
  • 19
  • 43
0
votes
2 answers

Adding lines to connect separate clusters in a chart

I saw this neat principal component analysis graph online, where lines connected each cluster to a center point. I used an example data set to show that I have got as far as adding the ellipses, but after looking online, I think this PCA…
dkcongo
  • 227
  • 1
  • 9
0
votes
1 answer

How do I find the direction of greatest variance in a matrix?

I have rainfall data (latitude, longitude, and the amount of rainfall at that latitude and longitude). I plotted this data using a 2D matrix, with matrix(i,j) corresponding to the i'th and j'th (sorted) latitudes and longitudes in my…
requiemman
  • 61
  • 6
0
votes
0 answers

Meaning Of High Variance With Few Components In PCA

I am new to PCA and am currently applying it to a dataset where each sample consists of 500 measurement points. When applying PCA, the cumulative explained variance of the top 5 components is ~99%, which puzzles me. Could this dataset be described with 5…
Mark wijkhuizen
  • 373
  • 3
  • 10
0
votes
0 answers

Error for k-fold cross-validation and PCR in R with simulated data

For my thesis I am testing whether 5-fold cross-validation can be used to find the optimal number of principal components in PCR for time series data. I am using a 3-factor model. However, when I try to run the PCR code I get an error as the data…
Mieska
  • 1
  • 1
0
votes
0 answers

Clustering on mixed data with related variables

I'm working with a mixed dataset (unique at the firm-year level) with related variables that look something like the following (but with many more variables of a similar nature), where: "sec" is the sector the firm belongs to and doesn't change…
0
votes
1 answer

What is the line in a 3D PCA plot, and what does it mean?

Recently I have been focusing on 3D PCA. I know how to produce 3D PCA plots in R with different packages, such as plotly, rgl and so on, but I have a small question about the picture below: I don't know how to add vertical lines in R just as the picture…
花落思量错
  • 352
  • 1
  • 11
0
votes
0 answers

Using functional principal components to make predictions

I would like to use FPCA to reconstruct a partial curve. I have 20 temperature curves and each curve contains 365 days. I would like to do FPCA on 15 curves and extract the functional principal components. The other 5 curves only have data up to 100 days. I would like…
bayoote
  • 1
  • 1
0
votes
1 answer

Getting pca.explained_variance_ratio_ for all components without doing PCA twice

I understand that explained_variance_ratio_ is easy to obtain from PCA, but it only covers the contribution of the first n_components components. I was wondering if explained_variance_ratio_ can be obtained for all components without doing PCA…
xinit
  • 147
  • 9
0
votes
1 answer

How to display Prince PCA Eigenvectors

I am looking for a way to display the eigenvectors in the prince library. Could you please tell me what the command is? I cannot find it in the documentation (there is eigenvalues_ for the eigenvalues) :)
zazoupile
  • 142
  • 1
  • 15
0
votes
1 answer

What is the region produced by the ggforce package (geom_mark_ellipse)?

Here is my reproducible data and code: dd<-structure(list(chr1_1005501 = c(0.597222222222222, 0.75, 0.775, 0.732456140350877, 0.860696517412935, 0.777777777777778, 0.654545454545455, …
花落思量错
  • 352
  • 1
  • 11
0
votes
0 answers

PCA of a vector

I have time series data with 4 columns (time, radial, axial and temp) and 900 data points for each column. I have 4 classes with 10 samples in each class. I want to convert each sample to a vector of dimension 1*2700, perform PCA, then plot…
Zlatan
  • 35
  • 5
0
votes
0 answers

How to Adjust Weights Using PCA?

I have a data set composed of 5 indicators D1, D2, D3, D4 and D5, and their weighted sum DS, which I use to create the binary variable EP. library(tidyverse) weight <- rep(1/5,5) names(weight) <- c("D1","D2","D3","D4","D5") data <- data.frame(D1 =…
Saïd Maanan
  • 511
  • 4
  • 14
0
votes
1 answer

extract principal components from PCA in missMDA

I'm performing a multiple imputation PCA on a dataset that has several missing values in one variable, and I want to extract the first principal component to use in another model, but I can't figure out how to extract it from the results. #…
tnt
  • 1,149
  • 14
  • 24
0
votes
1 answer

Get values of red arrows created by biplot()

I wanted to determine the values of the red arrows created by biplot() when used with the prcomp() function. I would like to determine the arrows' lengths and positions to compare them more accurately. Here is the code that I use: df <- read.csv(file…
Axton
  • 83
  • 5