I have a question about visualizing data with ggplot in R, specifically about scaling the y-axis when there are outliers.
Let's start with a sample dataset with observations from 31 IDs. 30 IDs are in an expected range and there is one outlier:
# Load libraries
library(tidyverse)
library(ggbeeswarm)
library(data.table)
# Set seed
set.seed(123)
# Create dataset
ID <- sprintf("ID-%s", 1:30)
baseline <- rnorm(30, mean = 50, sd = 3)
df <- data.frame(ID, baseline) %>%
  mutate(`1` = baseline - rnorm(1, mean = 5, sd = 4),
         `2` = `1` - rnorm(1, mean = 3, sd = 5),
         `3` = `2` - rnorm(1, mean = 1, sd = 3))
# Add outlier
df <- as.data.frame(rbindlist(list(df, list("ID-31", 0.01, 0.02, 0.03, 1))))
# Reshape to long format: one row per ID and time point
df <- df %>%
  pivot_longer(-ID) %>%
  rename(time = name) %>%
  mutate(time = as.factor(time))
# Plot
ggplot(data = df, aes(x = time, y = value)) +
  geom_quasirandom() +
  theme_classic() +
  scale_x_discrete(limits = c("baseline", "1", "2", "3")) +
  labs(x = "Time", y = "Value")
Expected output
Since the variation in the upper part of the graph is hard to see, I would like to scale the y-axis in a way that shows all values but focusses on a certain part of the plot (in this case values between 20 and 50).
Question
Is it possible to scale the y-axis in such a way?
Additional info
I am specifically not looking for a data transformation solution. Furthermore, I am aware of the scale_y_continuous function in ggplot and its limits argument, but this omits part of the data.
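To illustrate that last point, here is a minimal sketch of what I mean. It uses a small toy data frame with made-up values (not the dataset above) and plain geom_point instead of geom_quasirandom, and the limits of 20 and 60 are just an example. As far as I understand, values outside the limits are replaced by NA when the plot is built, so the outlier is not drawn at all:

```r
library(ggplot2)

# Toy data (hypothetical values): three in-range points and one outlier
toy <- data.frame(
  time  = factor(rep("baseline", 4)),
  value = c(48, 50, 52, 0.01)
)

# Restrict the y-axis via the limits argument
p_limited <- ggplot(toy, aes(x = time, y = value)) +
  geom_point() +
  scale_y_continuous(limits = c(20, 60))

# Inspect the data the layer will actually draw:
# out-of-range values have been censored to NA
built <- layer_data(p_limited)
n_censored <- sum(is.na(built$y))
n_censored
```

Printing p_limited then gives a warning about removed rows, which is exactly the behaviour I would like to avoid.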