How to handle massive outliers in PCA Bi-Plots?

Question

I have the following code, which computes a bi-plot:

countries.coord <- predict(res.pca, newdata = test_data_PCA)    #Countries
p <- fviz_pca_ind(res.pca, repel = TRUE)
fviz_add(p, countries.coord, color ="blue")
cos2 <- function(countries.coord, d2){return(countries.coord^2/d2)}
ind.cos2 <- apply(countries.coord, 2, cos2, d2)
ind.cos2[, 1:3]

The problem is that the massive outlier (USA) stretches the axes so much that the other countries are squeezed together, which will hurt readability in my report. Is there a way to keep the outlier visible while still showing all the other countries clearly?

From what I gathered, the code used to actually create the plot is missing, correct? Maybe you could post the final dataset used in the plotting (by pasting the output of `dput(df)`) and the code for the plot - Ricardo Semião e Castro, Oct 24 '22 at 20:02

Ricardo Semião e Castro · Accepted Answer · 2022-10-24T21:45:54.823

This isn't a complete answer, since we don't have the data to run your code, but below are the most common solutions to this kind of problem, and you can decide which of them you prefer. People who study data visualization can give better commentary on the pros and cons of each.

1. Use a discontinuous axis:

You remove the blank space by cutting an empty stretch out of your y (and x) axis, so that the outlier and the main cluster of points sit close together.

Cons: it's quite arbitrary, and it can be used to misrepresent the data
Pros: makes a very concise graph, without changing the scale

How to do it:

How can I make a discontinuous axis in R with ggplot2?
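
As a rough illustration of the idea, here is a minimal sketch that assumes the `ggbreak` package is installed and uses made-up scores instead of your PCA results (the `scores` data frame, its values and the cut points are all hypothetical). I have not tested it on a `fviz_pca_ind()` plot, so treat it as a starting point only:

```r
library(ggplot2)
library(ggbreak)  # assumption: the ggbreak package is installed

# Hypothetical PCA scores: nine ordinary points and one extreme outlier
scores <- data.frame(
  name = c(paste0("C", 1:9), "USA"),
  PC1  = c(-1.2, -0.8, -0.5, -0.1, 0.2, 0.4, 0.7, 1.0, 1.3, 25),
  PC2  = c(0.3, -0.6, 0.9, -0.2, 0.5, -1.1, 0.1, 0.8, -0.4, 0.6)
)

p_break <- ggplot(scores, aes(PC1, PC2, label = name)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  scale_x_break(c(2, 24))  # cut the empty stretch between 2 and 24 on PC1

p_break
```

The cut points should be chosen so that no observation falls inside the removed interval, otherwise points disappear from the plot.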

2. Apply a transformation to the axis:

You can modify your y (and x) axis to squish together the values, using, for example, a log transformation.

Cons: makes the interpretation a little harder, as the axes are not linear anymore.
Pros: it's a continuous transformation

How to do it:

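p <- fviz_pca_ind(res.pca, repel = TRUE)
p <- fviz_add(p, countries.coord, color = "blue")
# Note: a plain log scale (scale_x_log10() / scale_y_log10()) cannot show the
# negative coordinates that PCA produces, so a signed pseudo-log scale is used
p + scale_x_continuous(trans = "pseudo_log") +
  scale_y_continuous(trans = "pseudo_log")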
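
One thing to keep in mind with this option: PCA scores are centred on zero, so a plain logarithm is undefined for roughly half of the points. The base R sketch below uses simulated data (not your dataset; the matrix `x` and the outlier row are made up) to show the problem and one sign-preserving alternative, `asinh()`, which is the function the pseudo-log transformation in the `scales` package is built on:

```r
# Hypothetical data: 20 ordinary observations plus one extreme row
set.seed(1)
x <- matrix(rnorm(60), ncol = 3)
x <- rbind(x, c(40, 35, 30))

pca    <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# PCA scores are centred, so PC1 has values on both sides of zero
range(scores[, 1])

# A plain log is undefined for the negative scores (NaN, with a warning)
log_scores <- suppressWarnings(log10(scores[, 1]))
sum(is.nan(log_scores))

# asinh() is roughly linear near zero and logarithmic far from it, and it
# keeps the sign, so every point stays plottable while the outlier is pulled in
squashed <- asinh(scores[, 1])
range(squashed)
```

The compressed range still orders the points the same way, but distances between them are no longer proportional to the original scores, which is the interpretation cost mentioned above.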

3. Create a different facet for the outliers:

You can create a secondary panel just for the outlier, next to the panel with the rest of the points.

Cons: has the same problems as option 1, and is less compact.
Pros: also doesn't affect the linearity, and is less arbitrary than option 1.

How to do it:

https://www.j4s8.de/post/2018-01-15-broken-axis-with-ggplot2/
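
For a simpler variant than the one in the link, here is a minimal sketch with plain `ggplot2` facets. It uses made-up scores rather than your PCA output, and the rule used to flag the outlier (`abs(PC1) > 10`) is only a hypothetical threshold that you would need to adapt:

```r
library(ggplot2)

# Hypothetical PCA scores: nine ordinary points and one extreme outlier
scores <- data.frame(
  name = c(paste0("C", 1:9), "USA"),
  PC1  = c(-1.2, -0.8, -0.5, -0.1, 0.2, 0.4, 0.7, 1.0, 1.3, 25),
  PC2  = c(0.3, -0.6, 0.9, -0.2, 0.5, -1.1, 0.1, 0.8, -0.4, 0.6)
)

# Flag the outlier with a simple, hypothetical rule: |PC1| above 10
scores$panel <- ifelse(abs(scores$PC1) > 10, "Outlier", "Other countries")

p_facet <- ggplot(scores, aes(PC1, PC2, label = name)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  facet_wrap(~ panel, scales = "free")  # each panel gets its own axis range

p_facet
```

Because each panel has its own axis range, the axis labels are the only cue that the outlier is far away, so it is worth pointing this out in the figure caption.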
