Exploratory Data Analysis on Datasets with too many variables

Question

My question is a little bit theoretical.

I have a dataset with 100+ columns. Every EDA method I use produces a cluttered, unreadable plot. How can I get more interpretable plots and tables from such data?

score 1 · Accepted Answer · answered Sep 18 '22 at 09:29

@Zine

Try using only the variables you need in the visualizations.
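One possible way to pick "the variables you need", as a rough sketch: rank the numeric columns by absolute correlation with a column you care about and keep only the top few for plots and tables. The data below is synthetic, and `target`, `top_k`, and the `x0`...`x99` column names are hypothetical placeholders, not anything from your dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a wide dataset: 200 rows, 100 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(200, 100)),
    columns=[f"x{i}" for i in range(100)],
)
# Hypothetical column of interest, built to depend on x0 and x1.
df["target"] = 2.0 * df["x0"] - 1.5 * df["x1"] + rng.normal(scale=0.5, size=200)

top_k = 5  # how many variables to keep (an arbitrary choice for this sketch)

# Absolute correlation of every other column with the target.
corr = df.corr()["target"].drop("target").abs()
selected = corr.sort_values(ascending=False).head(top_k).index.tolist()

# A summary table of 6 columns is readable; one of 100+ columns is not.
summary = df[selected + ["target"]].describe().T
print(selected)
print(summary)
```

The same `selected` list can be passed to a pair plot or correlation heatmap, so the figure covers a handful of columns rather than all of them. Correlation only captures linear relationships, so treat this as a first filter rather than a definitive ranking.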

You can also use Principal Component Analysis (PCA) to reduce the number of variables. It combines correlated columns into a smaller set of components that retain most of the variance in the original data. For reference, some links for learning PCA:

1) https://www.sartorius.com/en/knowledge/science-snippets/what-is-principal-component-analysis-pca-and-how-it-is-used-507186

2) https://www.geeksforgeeks.org/ml-principal-component-analysispca/

3) https://www.machinelearningplus.com/machine-learning/principal-components-analysis-pca-better-explained/
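As a minimal sketch of what that looks like in practice, assuming scikit-learn is available and all columns are numeric: standardize the data, then project it onto the first two principal components so it can be drawn as a single scatter plot. The array `X` here is synthetic (100 columns generated from 3 hidden factors) and only stands in for a real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 300 rows, 100 columns driven by 3 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 100))

# PCA is sensitive to scale, so standardize each column first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# X_2d[:, 0] and X_2d[:, 1] can be used as the x and y axes of a scatter plot.
print(X_2d.shape)
print(pca.explained_variance_ratio_)
```

Each row of `pca.components_` holds the weights of the original columns in one component, which can help with interpreting what the axes of the plot mean.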

score 0 · Answer 2 · answered Sep 19 '22 at 15:22

Try dimensionality reduction methods like PCA and t-SNE. If your goal is to visualize the data, go with t-SNE. PCA, however, can also tell you how much of the data's variance you preserve while reducing dimensions.
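A rough sketch of both ideas, assuming scikit-learn: the cumulative `explained_variance_ratio_` from PCA shows how many components are needed to keep a given share of the variance, and `TSNE` produces a 2-D embedding for plotting. The three-cluster data, the 90% threshold, and `perplexity=30` are illustrative assumptions, not recommendations for any particular dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 3 clusters of 100 points each in 100 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 100))
labels = np.repeat(np.arange(3), 100)
X = centers[labels] + rng.normal(size=(300, 100))
X_scaled = StandardScaler().fit_transform(X)

# PCA: cumulative share of variance preserved by the first k components.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative share reaches 90%.
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_keep, cumulative[n_keep - 1])

# t-SNE: non-linear 2-D embedding intended for visualization only.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X_scaled)
print(X_tsne.shape)
```

Unlike PCA, t-SNE does not report preserved variance, and distances between clusters in its output are not reliable, so it is better suited to spotting groups than to measuring them.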

Here's a link that explains the different dimensionality reduction techniques and their results.
