
I have the following code, which produces the biplot shown below:

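library(factoextra)   # fviz_pca_ind(), fviz_add()

countries.coord <- predict(res.pca, newdata = test_data_PCA)    # Countries (supplementary individuals)
p <- fviz_pca_ind(res.pca, repel = TRUE)
fviz_add(p, countries.coord, color = "blue")
# cos2 = squared coordinate / squared distance to the PCA centre;
# d2 (one value per country) is not defined in this snippet and must already exist
cos2 <- function(countries.coord, d2){return(countries.coord^2/d2)}
ind.cos2 <- apply(countries.coord, 2, cos2, d2)
ind.cos2[, 1:3]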
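The snippet relies on a `d2` vector that is computed elsewhere. A minimal sketch of how `d2` and the cos2 values could be obtained, assuming (this is not stated in the question) that `res.pca` is a `prcomp()` fit with centring and scaling. The data frames `train` and `test_data_PCA` below are hypothetical toy data standing in for the real country data, which was not posted.

```r
# Hypothetical toy data standing in for the real (unavailable) country data
set.seed(1)
vars <- c("v1", "v2", "v3", "v4")
train <- as.data.frame(matrix(rnorm(50 * 4), ncol = 4, dimnames = list(NULL, vars)))
test_data_PCA <- as.data.frame(matrix(rnorm(5 * 4), ncol = 4,
                                      dimnames = list(paste0("country", 1:5), vars)))

# Assumption: res.pca is a prcomp() fit with centring and scaling
res.pca <- prcomp(train, center = TRUE, scale. = TRUE)
countries.coord <- predict(res.pca, newdata = test_data_PCA)

# d2: squared distance of each country to the centre of the PCA space,
# using the same centring/scaling that prcomp applied to the training data
scaled <- scale(test_data_PCA, center = res.pca$center, scale = res.pca$scale)
d2 <- rowSums(scaled^2)

# cos2 per component: squared coordinate divided by the squared distance
cos2 <- function(coord, d2) coord^2 / d2
ind.cos2 <- apply(countries.coord, 2, cos2, d2)
ind.cos2[, 1:3]

# With all components kept, each country's cos2 values sum to 1
rowSums(ind.cos2)
```

Because the rotation matrix is orthogonal, the squared coordinates over all components add up to `d2`, which is why each row of `ind.cos2` sums to 1 when every component is kept.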

The problem is that one massive outlier (the USA) squeezes all the other countries into a small area, which will hurt readability in my report. Is there a way to make the plot readable so that all the other countries are clearly visible as well?

[Image: the resulting biplot]

Kash
From what I gathered, the code used to actually create the plot is missing, correct? Maybe you could post the final dataset used for plotting (by pasting the output of `dput(df)`) and the code for the plot. – Ricardo Semião e Castro Oct 24 '22 at 20:02

1 Answer


This isn't a complete answer, since we don't have the data to run your code, but below are the most common solutions to this kind of problem; you can decide which one you prefer. People who study data visualization can give better commentary on the pros and cons of each.

1. Use a discontinuous axis:

You remove the empty space by cutting (breaking) your y (and x) axis, getting something like this:

[Image: example of a plot with a broken axis]

Cons: it's very arbitrary, and can be used to misrepresent the data.
Pros: makes a very compact graph without changing the scale.

How to do it:

See the Stack Overflow question "How can I make a discontinuous axis in R with ggplot2?"
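A minimal sketch of one ggplot2-only way to get this effect: emulate a broken x axis with `facet_grid(scales = "free_x", space = "free_x")`, so each panel only spans its own data and the empty stretch between the clusters disappears. The `scores` data frame, its labels and the cut-off of 10 are hypothetical stand-ins for the real PCA coordinates; with the real data you would build the plot from the coordinates yourself rather than from the `fviz_pca_ind()` output. The y axis could be cut the same way with a second (row) facet variable.

```r
library(ggplot2)

# Hypothetical toy scores standing in for the real PCA coordinates:
# 20 countries near the origin plus one extreme outlier on PC1
set.seed(42)
scores <- data.frame(
  label = c(paste0("C", 1:20), "Outlier"),
  PC1 = c(rnorm(20), 40),
  PC2 = c(rnorm(20), 1.5)
)

# Cut the x axis: points beyond an (arbitrary, hypothetical) threshold get their own panel
cut_at <- 10
scores$x_part <- factor(ifelse(scores$PC1 > cut_at, "far", "main"),
                        levels = c("main", "far"))

p_broken <- ggplot(scores, aes(PC1, PC2, label = label)) +
  geom_point() +
  geom_text(vjust = -0.8, size = 3) +
  # free x scales + free horizontal space: each panel's width follows its own range
  facet_grid(. ~ x_part, scales = "free_x", space = "free_x") +
  theme(strip.text = element_blank())  # hide strip labels so it reads as one cut axis
p_broken
```

The threshold is exactly the arbitrary choice the cons above warn about, so it is worth stating it in the figure caption.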

2. Apply a transformation to the axis:

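You can transform your y (and x) axis to squish the values together, for example with a log transformation. Since PCA coordinates can be negative, a signed log-like transformation (such as a pseudo-log) is needed rather than a plain log.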

Cons: makes interpretation a little harder, as the axes are no longer linear.
Pros: it's a continuous transformation.

How to do it:

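library(factoextra)  # fviz_pca_ind(), fviz_add(); also attaches ggplot2
p <- fviz_pca_ind(res.pca, repel = TRUE)
p <- fviz_add(p, countries.coord, color = "blue")
# ggplot2 has no scale_x_log(); a plain log10 scale would also drop negative PCA scores,
# so use a signed pseudo-log scale (the argument is named `transform` in ggplot2 >= 3.5.0)
p + scale_x_continuous(trans = "pseudo_log") + scale_y_continuous(trans = "pseudo_log")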
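To see what a signed log-like axis does without needing `res.pca`, here is a self-contained sketch on hypothetical toy scores (not the real data). `trans = "pseudo_log"` uses the pseudo-log transformation from the scales package, asinh(x / 2) with the default `sigma = 1`, which is roughly linear near zero and logarithmic far from it, and, unlike `log10`, is defined for negative values.

```r
library(ggplot2)

# Hypothetical toy scores: signed coordinates with one extreme point
set.seed(7)
scores <- data.frame(
  PC1 = c(rnorm(20), 40),
  PC2 = c(rnorm(20), -25)
)

# pseudo_log squishes large |values| while keeping negatives and zero
p_squished <- ggplot(scores, aes(PC1, PC2)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_point() +
  scale_x_continuous(trans = "pseudo_log") +  # `transform =` in ggplot2 >= 3.5.0
  scale_y_continuous(trans = "pseudo_log")
p_squished
```

The tick labels stay in the original units, but equal distances on the axis no longer mean equal differences in score, which is the interpretation cost listed above.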

3. Create a different facet for the outliers:

You can create a secondary graph just for the outlier, getting something like this:

[Image: example with the outlier shown in a separate panel]

Cons: has the same problems as option 1, and is less compact.
Pros: also keeps the axes linear, and is less arbitrary than option 1.

How to do it:

https://www.j4s8.de/post/2018-01-15-broken-axis-with-ggplot2/
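A minimal ggplot2-only sketch of this idea, with `facet_wrap(scales = "free")` giving the outlier its own panel with its own x and y ranges. The `scores` data frame and the "Outlier" label are hypothetical stand-ins (the real outlier would be the USA); it is not taken from the linked post.

```r
library(ggplot2)

# Hypothetical toy scores: 20 countries near the origin plus one far outlier
set.seed(42)
scores <- data.frame(
  label = c(paste0("C", 1:20), "Outlier"),
  PC1 = c(rnorm(20), 40),
  PC2 = c(rnorm(20), 30)
)

# Put the outlier in its own panel
scores$panel <- factor(ifelse(scores$label == "Outlier", "Outlier", "Other countries"),
                       levels = c("Other countries", "Outlier"))

p_split <- ggplot(scores, aes(PC1, PC2, label = label)) +
  geom_point() +
  geom_text(vjust = -0.8, size = 3) +
  # scales = "free": each panel gets its own x and y range, both still linear
  facet_wrap(~ panel, scales = "free")
p_split
```

Because the two panels have different ranges, the tick labels should be read panel by panel; the strip titles make that split explicit to the reader.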