
I'm trying to replicate the image below in R (Original post). I have seen similar posts (Post 1 and Post 2), but none that match this plot. Does anyone know how to do something similar in R? A few observations:

  1. Bubbles do not overlap
  2. Smaller bubbles tend to be closer to the axis (but not always!)
  3. Bubbles are in two categories

I'm sure that data from Post 1 would be helpful!

https://docs.google.com/spreadsheets/d/11nq5AK3Y1uXQJ8wTm14w9MkZOwAxHHTyPyEvUxCeGVc/edit?usp=sharing

Thank you so much,

NYT plot

CRP

1 Answer


OK, so this is just a starting point that people could use to formulate a better answer to the question. It uses the packcircles package to (unsurprisingly) pack circles. It doesn't satisfy all of your criteria, but it can serve as a useful starting point. We're just going to pretend that the eruptions column from the faithful dataset is your time variable.

library(packcircles)
#> Warning: package 'packcircles' was built under R version 4.0.2
library(ggplot2)
library(scales)
library(ggrepel)

# Setup some data, suppose we'd like to label 5 samples
set.seed(0)
faith2 <- faithful
faith2$label <- ""
faith2$label[sample(nrow(faith2), 5)] <- LETTERS[1:5]

# Initialise circle pack data
init <- data.frame(
  x = faith2$eruptions,
  y = runif(nrow(faith2)),
  areas = rescale(faith2$waiting, to = c(0.01, 0.1))
)

# Use the repelling layout
res <- circleRepelLayout(
  init,
  xlim = range(init$x) + c(-1, 1),
  ylim = c(0, Inf),
  xysizecols = c(1, NA, 3),
  sizetype = "radius",
  weights = 0.1
)

# Prepare for ggplot2
df <- circleLayoutVertices(res$layout)
df <- cbind(df, faith2[df$id,])

The first plot shows that the circles are placed reasonably with respect to our fake time variable.

# Plot
ggplot(df, aes(x, y, group = id)) +
  geom_polygon(aes(fill = eruptions,
                   colour = I(ifelse(nzchar(label), "black", NA)))) +
  scale_fill_viridis_c() +
  coord_equal()

The second plot shows that the circle size corresponds reasonably well to a different variable (waiting).

ggplot(df, aes(x, y, group = id)) +
  geom_polygon(aes(fill = waiting,
                   colour = I(ifelse(nzchar(label), "black", NA)))) +
  scale_fill_viridis_c() +
  coord_equal()

Created on 2020-07-11 by the reprex package (v0.3.0)

There are a few flaws in this. Most notably, it doesn't satisfy the 2nd criterion (the circles aren't hugging the axis). Also, for reasons beyond my understanding, the packcircles layout couldn't place about 12% of the data points, which are assigned NaN in df. Anyway, hopefully somebody smarter than me will do a better job at this.
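As a rough sketch of how one might count and drop the unplaced circles before plotting: the data frame below is a hypothetical toy stand-in for the layout output (its values and the name layout_toy are made up for illustration), not the actual result from packcircles.

```r
# Hypothetical toy stand-in for a layout result; values are invented.
layout_toy <- data.frame(
  x      = c(1.5, NaN, 3.2, 4.1),
  y      = c(0.2, NaN, 0.4, 0.1),
  radius = c(0.05, 0.03, 0.08, 0.02)
)

# Flag rows whose centre could not be computed (is.na() is TRUE for NaN too)
failed <- is.na(layout_toy$x) | is.na(layout_toy$y)

# Share of circles that were not placed
share_failed <- mean(failed)

# Keep only the circles that were placed
layout_ok <- layout_toy[!failed, ]
```

The same check could be run on the real layout to see which observations are affected, assuming the failed rows show up as NaN in the x and y columns.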

teunbrand
  • Thanks for this reproducible and working example. However, the OP asked for a timeline. This does not seem possible with the example provided here because, if I understand correctly, the x and y coordinates are used to plot the size of the bubbles? – eylemyap Dec 20 '20 at 17:04
  • Well yes, the x and y are vertices of circle-like polygons. However, I don't see how this won't apply to timelines, as the datetime class is also a numeric under the hood and circles should be roughly centered at the x-coordinate coinciding with the original data. – teunbrand Dec 20 '20 at 17:43
  • So I tried two versions. I replaced the eruptions variable with a random date, but I get the error message "Error in as.Date.numeric(value) : 'origin' must be supplied". This seems to be an age-old problem in R when dealing with dates as numeric variables. I also tried random numeric year values, and the resulting plot was very odd as well. Here is the code: https://drive.google.com/file/d/1Hm6bEvHHuv99qoFsgHKO59UtLlxfmQhl/view?usp=sharing – eylemyap Dec 22 '20 at 11:17
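
For the 'origin' error mentioned in the comments, one possible workaround is to convert the dates to plain numbers before the layout step and convert them back afterwards with an explicit origin. This is a minimal sketch in base R. The dates are hypothetical example values, and it has not been checked against the linked code.

```r
# Hypothetical example dates (not taken from the question's data)
dates <- as.Date(c("2020-01-15", "2020-03-01", "2020-07-11"))

# A Date is stored as the number of days since 1970-01-01
x_num <- as.numeric(dates)

# ... x_num could now be used as the numeric x column for the layout ...

# Convert back to Date; older R versions need the origin spelled out
back <- as.Date(x_num, origin = "1970-01-01")
```

Keeping the x column numeric throughout the layout and only converting back for axis labels should avoid the error, assuming the rest of the pipeline treats x as an ordinary number.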