Questions tagged [pca]

Overview

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used in clustering or factor analysis. Given a set of possibly correlated explanatory variables, PCA constructs new variables, the principal components, as linear combinations of the original ones and orders them by how much of the variation in the data each one explains. It is this property that allows PCA to be used for dimension reduction: keeping only the first few components retains most of the variation in a large set of variables.

Mathematically, PCA amounts to an orthogonal transformation of possibly correlated variables (vectors) into linearly uncorrelated variables called principal components.
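As an illustration of that transformation, here is a minimal NumPy sketch that computes principal components from the eigendecomposition of the covariance matrix. The function name `pca_eig` and the toy data are hypothetical; libraries such as scikit-learn compute the same components via a singular value decomposition, so the signs of individual components may differ.

```python
import numpy as np

def pca_eig(X, n_components):
    """Hypothetical helper: PCA by eigendecomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each variable (column)
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # orthonormal principal axes (columns)
    scores = Xc @ components                   # the principal components (projected data)
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained_ratio

# Toy data: two strongly correlated variables plus one independent variable.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=200), rng.normal(size=200)])

scores, components, ratio = pca_eig(X, n_components=2)
print(ratio)
# The component scores are uncorrelated: off-diagonal covariances are ~0.
print(np.round(np.cov(scores, rowvar=False), 6))
```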

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the functions princomp and prcomp (from the stats package) compute PCA.

2728 questions
0 votes, 0 answers

Handling survey type data for clustering analysis

I have at my disposal categorical answers retrieved for a job application. I transformed those into dummy variables. In total I have around 60 columns and 2000 rows. I have a target column which describes the status (hired, rejected, hired by other…
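One possible workflow for data like this, sketched in Python under assumptions: the column names and random answers below are hypothetical stand-ins for the real survey. The categorical answers are one-hot encoded with `pd.get_dummies`, reduced with scikit-learn's `PCA`, and clustered with `KMeans`; the target column is kept out of the clustering and only used afterwards to compare clusters with outcomes.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical survey answers (the real data has ~60 dummy columns and ~2000 rows).
rng = np.random.default_rng(0)
n = 300
survey = pd.DataFrame({
    "education": rng.choice(["bachelor", "master", "phd"], size=n),
    "experience": rng.choice(["0-2", "3-5", "6+"], size=n),
    "remote": rng.choice(["yes", "no"], size=n),
})
status = pd.Series(rng.choice(["hired", "rejected"], size=n), name="status")

# One-hot encode the categorical answers into 0/1 dummy columns.
dummies = pd.get_dummies(survey, dtype=float)

# Reduce the dummies to a few components, then cluster in that reduced space.
reduced = PCA(n_components=3, random_state=0).fit_transform(dummies)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Compare the clusters with the target column afterwards.
table = pd.crosstab(pd.Series(labels, name="cluster"), status)
print(table)
```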
0 votes, 0 answers

How do I use PCA for prediction?

I'm trying to predict GDP per capita from about 50 variables, and the PCA shows that the two principal components explain about 40% of the variance. But I want to know how much of the variance in GDP per capita specifically they explain. Because…
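The 40% figure describes variance in the predictors captured by the components, not variance in GDP per capita. A minimal principal component regression sketch in scikit-learn, on hypothetical random data, shows how to obtain the latter as an R² from a linear regression on the retained components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for ~50 country-level predictors and GDP per capita.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 50))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=150)

# Principal component regression: standardize, keep 2 PCs, regress y on them.
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)

# Share of the variance in X captured by the 2 PCs (what the PCA output reports).
x_share = pcr.named_steps["pca"].explained_variance_ratio_.sum()
# Share of the variance in y explained by the regression on those PCs (in-sample R^2).
r2_in_sample = pcr.score(X, y)
print(x_share, r2_in_sample)
# Cross-validated R^2 is a fairer measure of predictive power.
print(cross_val_score(pcr, X, y, cv=5, scoring="r2").mean())
```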
0 votes, 1 answer

Why does eig give an inverted PC1 value compared to eigh, but PC2 is the same?

I am comparing eig and eigh and I would like clarification of why the results are partly inverted! f1 and f2 are my 2 features. pc1 and pc2 are the principal components using eig. pc1h and pc2h are using eigh. As can be seen, pc2 and pc2h appear…
ManInMoon
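Eigenvectors are only defined up to sign, so `eig` and `eigh` can return the same principal axis pointing in opposite directions, which inverts that component's scores while the others may agree. A minimal NumPy sketch on hypothetical features `f1` and `f2`; the helper `fix_signs` is an illustrative convention (make each vector's largest-magnitude entry positive), not a NumPy function.

```python
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.normal(size=100)
f2 = 0.8 * f1 + rng.normal(scale=0.3, size=100)
cov = np.cov(np.column_stack([f1, f2]), rowvar=False)

# eig (general matrices): eigenvalue order is not guaranteed, so sort descending.
w, v = np.linalg.eig(cov)
order = np.argsort(w)[::-1]
w, v = w[order], v[:, order]

# eigh (symmetric matrices): eigenvalues come back ascending, so reverse them.
wh, vh = np.linalg.eigh(cov)
wh, vh = wh[::-1], vh[:, ::-1]

# Same eigenvalues; each eigenvector agrees only up to a factor of +1 or -1.
print(np.allclose(w, wh))
print(np.sign(np.sum(v * vh, axis=0)))   # sign of the dot product, per column

def fix_signs(vecs):
    """Hypothetical helper: flip each column so its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(vecs), axis=0)
    signs = np.sign(vecs[idx, np.arange(vecs.shape[1])])
    return vecs * signs

# After a deterministic sign convention the two results match.
print(np.allclose(fix_signs(v), fix_signs(vh)))
```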
0 votes, 1 answer

PCA "could not convert string to float"

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
data = pd.read_csv("tfidf_smogon.csv")
data.drop(['Categoría'], axis=1, inplace=True)
data.drop(data.columns[0], axis=1, inplace=True)
print(data)
pca =…
yoryi
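This error typically means a column with text values is still in the frame passed to `PCA`. A minimal sketch with a hypothetical stand-in for the CSV (the column names and values are invented), using pandas `select_dtypes` to find and exclude non-numeric columns before fitting:

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the CSV: numeric TF-IDF columns plus a leftover text column.
data = pd.DataFrame({
    "label": ["a", "b", "c", "d"],
    "tfidf_1": [0.10, 0.00, 0.30, 0.20],
    "tfidf_2": [0.00, 0.50, 0.10, 0.40],
    "tfidf_3": [0.70, 0.20, 0.00, 0.10],
})

# List the non-numeric columns; these are what PCA cannot convert to float.
text_columns = data.select_dtypes(exclude="number").columns.tolist()
print(text_columns)  # ['label']

# Keep only numeric columns before fitting PCA.
numeric = data.select_dtypes(include="number")
reduced = PCA(n_components=2).fit_transform(numeric)
print(reduced.shape)  # (4, 2)
```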
0 votes, 0 answers

Applying eigenvectors to another dataframe

I want to apply the eigenvectors I have obtained from one data set to another. My question is: before projecting the second data frame onto the eigenvectors to get its principal components, should I standardize it first? I have used…
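A common approach, sketched here with scikit-learn on hypothetical random arrays: standardize the second data set with the means and standard deviations of the first (not its own), then project it onto the first data set's eigenvectors. The manual line shows the same projection written out with the fitted attributes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df1 = rng.normal(loc=5.0, scale=2.0, size=(100, 4))   # hypothetical first data set
df2 = rng.normal(loc=5.0, scale=2.0, size=(40, 4))    # hypothetical second data set

# Fit the scaling and the PCA on the first data set only.
scaler = StandardScaler().fit(df1)
pca = PCA(n_components=2).fit(scaler.transform(df1))

# Standardize the second data set with the FIRST data set's statistics,
# then project it onto the same eigenvectors.
scores2 = pca.transform(scaler.transform(df2))

# Same projection by hand: subtract the fitted mean, multiply by the eigenvectors.
manual = (scaler.transform(df2) - pca.mean_) @ pca.components_.T
print(np.allclose(scores2, manual))   # True
```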
0 votes, 0 answers

How to specify overall plot size for fviz_pca_biplot of factoextra

I have created a PCA biplot using the code below. I need to specify the plot as a certain size (9 cm x 9 cm), but have not figured out how to do so. I tried defining variables for the width and height and adding them using theme(plot.width..., but no…
0 votes, 0 answers

Using a for loop to reconstruct exchange rate returns using a factor model formula

I have a dataset (nrow = 10,000, ncol = 29) called random_draws for a factor model. The first 27 columns are exchange rate returns; the 28th and 29th columns are the factor observations. I want to calculate exchange rate returns with the following…
0 votes, 0 answers

How to perform a Principal Component Analysis (PCA) inverse transform using ML.NET

I'm trying to convert the Python code at PCA-Based Fraud Detection into C# using ML.NET as an example of how to perform untrained anomaly detection using PCA. I can perform the transform and get my new principal components, but I am struggling to…
Chris99
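The math that would need porting is language-independent: for a non-whitened PCA, the inverse transform is the component scores multiplied by the component matrix, plus the column means removed during fitting. The sketch below shows this with scikit-learn on hypothetical data. It makes no claims about ML.NET's API, and the per-row squared reconstruction error used as an anomaly score is an assumption about what the referenced Python code computes.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical transactions: 500 rows, 6 features driven by 2 latent factors plus noise.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)                          # principal component scores

# Inverse transform by hand: scores times components, plus the removed mean.
X_hat = Z @ pca.components_ + pca.mean_
print(np.allclose(X_hat, pca.inverse_transform(Z)))   # True

# Per-row reconstruction error, usable as an anomaly score.
errors = np.sum((X - X_hat) ** 2, axis=1)
print(errors[:5])
```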
0 votes, 0 answers

Is it the best choice to use PCA when some values do not change?

I need to do a PCA on a big dataset with multiple variables. But for some of these variables, I have only one constant value for each location. My question is: will these repeated values change my PCA or give those variables more influence? Is there a better way than…
Mimosa
0 votes, 0 answers

N component problem of PCA with low and high cumulative variance

I am working on a dataset whose shape is (2020, 1000) and I am trying to apply PCA to it. When I plot the cumulative variance against the number of components, I observe a break (elbow) point at the 0.2 cumulative variance level and it is…
ned
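Instead of reading an elbow off the plot, a cumulative-variance threshold is one common rule for choosing the number of components. A sketch on a smaller hypothetical matrix; the 0.90 threshold is an arbitrary example, and scikit-learn's float `n_components` option performs the same selection.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in (smaller than the real 2020 x 1000 data): low-rank signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 100)) + 0.1 * rng.normal(size=(300, 100))

pca = PCA(svd_solver="full").fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 90%.
n_90 = int(np.searchsorted(cumvar, 0.90) + 1)
print(n_90)

# scikit-learn selects the count itself when n_components is a float in (0, 1).
pca_90 = PCA(n_components=0.90, svd_solver="full").fit(X)
print(pca_90.n_components_ == n_90)
```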
0 votes, 0 answers

Is it possible/meaningful to split PCA results into facets by stimulus (in R)?

I am trying to quantify the reaction of animals to three different stimuli. For this I have noted the number of occurrences of different behaviors per 10-second bin for three minutes before and three minutes after stimulus onset. Therefore I have…
lframond
0 votes, 0 answers

How can I remove arrow labels on a PCA biplot in Python and place them outside the figure?

I am plotting pcaplots and biplots and I would like to remove the labels on the arrow and place them outside the figure. I would appreciate any help with this. This is what I have tried so far: cluster.biplot(cscore=pca_scores, loadings=loadings,…
maks
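A sketch using plain matplotlib rather than `cluster.biplot` (whose options are not covered here): the arrows are drawn without text at their tips, and each one gets a legend entry placed outside the axes with `bbox_to_anchor`. The iris data set is a stand-in for the question's data, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)
loadings = pca.components_.T            # one row per original variable

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(scores[:, 0], scores[:, 1], s=10, alpha=0.5, color="grey")

colors = plt.cm.tab10.colors
scale = np.abs(scores).max()            # stretch arrows to the range of the scores
for i, name in enumerate(iris.feature_names):
    # No text at the arrow tip; the variable name goes into the legend instead.
    ax.arrow(0, 0, loadings[i, 0] * scale, loadings[i, 1] * scale,
             color=colors[i], width=0.02, label=name)

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
# Anchor the legend just right of the axes, i.e. outside the plotting area.
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1), borderaxespad=0)
fig.savefig("biplot.png", bbox_inches="tight")  # "tight" keeps the outside legend in the file
```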
0 votes, 0 answers

Linear discriminant analysis based on principal components in R

I am trying to perform an LDA on the first two PCs of the PCA. I want to split the data into 80% reference and 20% test. I am wondering how I can specify the grouping (by Species). This is my data: eq_ph1 Species Sex Name X1 X2 X3 X4 …
Azy
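The question is about R, but the workflow can be sketched in Python with scikit-learn. The iris data set stands in for the question's data, and its target plays the role of `Species`; the grouping is simply the label vector passed to `fit`. Fitting the PCA inside a pipeline on the training split keeps the test rows out of the component estimation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris stands in for the question's data; its target plays the role of "Species".
X, species = load_iris(return_X_y=True)

# 80% reference (training) / 20% test, stratified so each species keeps its share.
X_train, X_test, y_train, y_test = train_test_split(
    X, species, test_size=0.2, stratify=species, random_state=0)

# Standardize, keep the first two PCs, then fit LDA on those scores,
# grouped by the species labels passed to fit().
model = make_pipeline(StandardScaler(), PCA(n_components=2), LinearDiscriminantAnalysis())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)   # test-set classification accuracy
print(accuracy)
```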
0 votes, 0 answers

How to draw time trajectory arrows in a RDA plot in R?

The mite dataset was sampled across five sites for 12 months. I have a PCA plot as follows. I want to draw a "time trajectory" such that for every site, there are 11 points that are connected starting from 1 and ending at month 12. I know I can do…
Share
0 votes, 0 answers

Is it possible to alter the font on a PCA plot created by plot.PCA() in R ggplot2?

How could I change the font in a plot made with R's plot.PCA() function? I am plotting the PCA of a given dataset, and I am trying to change the font of the plot, but it seems to be impossible. plot.PCA(pca1, col.quali = 8, habillage = 8, label = "none",…