0

I am trying to reproduce a graphic similar to what is used on the California Dashboard for School Accountability in R Shiny. For sake of this example, consider the following data frame:

student <- c("1234", "4321", "5678", "8765")
schools <- c("ABC", "ABC", "XYZ", "XYZ")
DFS_20 <- c(-34.2, -1.5, 2.8, 8.9)
DFS_21 <- c(-13.5, 27.8, 5.4, 3.9)
school_data <- data.frame("student_id" = student, "school_id" = schools, "DFS_2020" = DFS_20, "DFS_2021" = DFS_21, "Delta_DFS" = DFS_21 - DFS_20)

I would like to somehow plot this data on a grid like this: 5x5 matrix

where you would have a data point in at [x, y] = [4, 1] (with lower left being (0,0) representing student 1234 because their DFS_2021 score is LOW (-13.5) but their year-over-year growth increased significantly (20.7); a point in [x, y] = [4, 3] for student 4321 because their DFS_2021 score is HIGH (27.8) and their year-over-year growth increased significantly (29.3) etc. I want a Bubble chart so that the point increases in size relative to the number of data points within each cell, but I have no idea where to start creating the canvas (with colors) to overlay the data points onto. I know I can translate their scores into lattice points to plot on a 5x5 grid, but to make the grid with the colors is beyond my skillset.

1 Answers1

1

In short you can recode the valus into factors and count each combination in your dataset. With this new table (containing current DFS level, DFS difference level and number of students in each category) you can easily create a point-plot.

To color code your points you may need an extra column in your table with the color. Therefore I created a meta color table (all DFS combinations and the associated color) and joined the tables.

Code

# load packages
library(tidyverse)

# create color table
df_col <- crossing(DFS_current_status = factor(c("very low", "low", "medium",
                                                 "high", "very high"),
                                               levels = c("very high", "high", 
                                                              "medium", "low", "very low")), 
                   DFS_diff = factor(c("declined significantly", "declined",
                                "maintained" ,"increased", 
                                "increased significantly"), 
                                levels = c("declined significantly",
                                           "declined", "maintained",
                                           "increased", "increased significantly"))) %>%
  add_column(color = c("green", "green", "blue", "blue","blue",
                       "green", "green", "green", "green", "blue",
                       "yellow", "yellow", "yellow", "green", "green", 
                       "orange", "orange", "orange", "yellow", "yellow", 
                       "red", "red", "red", "orange", "orange"))


# transform data
df <- school_data %>%  
  mutate(DFS_current_status = case_when(DFS_2021 >=  45 ~ "very high",
                                        between(DFS_2021, 10, 44.9) ~ "high",
                                        between(DFS_2021, -5, 9.9) ~ "medium",
                                        between(DFS_2021, -70, -5.1) ~ "low",
                                        DFS_2021 < -70 ~ "very low",
                                        TRUE ~ NA_character_),
         DFS_diff = case_when(Delta_DFS < -15 ~ "declined significantly",
                              between(Delta_DFS, -15, -3) ~ "declined",
                              between(Delta_DFS, -2.9, 2.9) ~ "maintained",
                              between(Delta_DFS, 3, 14.9) ~ "increased",
                              Delta_DFS >= 15 ~ "increased significantly",
                              TRUE ~ NA_character_)) %>%
  count(DFS_current_status, DFS_diff) %>%
  left_join(df_col) %>%
  mutate(DFS_current_status = factor(DFS_current_status,
                                     levels = rev(c("very high", "high", 
                                                "medium", "low", "very low"))),
         DFS_diff = factor(DFS_diff, 
                           levels = c("declined significantly",
                                      "declined", "maintained",
                                      "increased", "increased significantly")))


# create plot
p <- ggplot(df) +
  geom_point(aes(x = DFS_diff,
                 y = DFS_current_status, 
                 size = n,
                 color = color)) +
  scale_y_discrete(drop = F) +
  scale_x_discrete(drop = F, position = "top") +
  scale_color_identity()

# display plot in plotly
ggplotly(p) %>% 
  layout(xaxis = list(side ="top")) 

Plot enter image description here

Edit: Comment - color the grid instead of point

df_col %>% 
  ggplot() +
  geom_raster(aes(x = DFS_diff, 
                  y = rev(DFS_current_status),
                  fill= color)) +
  scale_fill_identity() +
  scale_x_discrete(position = "top") +
  geom_point(data = df, aes(x = DFS_diff,
                 y = DFS_current_status,
                 size = n))

enter image description here

tamtam
  • 3,541
  • 1
  • 7
  • 21
  • thank you for the work on this. What you did was awesome and I will take it into account as I build this. I have converted all of the points to a continuous scale so that they do not all end up on a lattice point. I am wondering if there is a way to color the grid you have above rather than denoting each point as a color? – Adam Caudell Mar 30 '21 at 19:09
  • @tamtam...you are awesome! This is perfect! THANK YOU! – Adam Caudell Mar 31 '21 at 18:06
  • UPDATE: Thank you @tamtam for your work. Instead of using a bubble chart, I converted the points to a continuous scale from [0, 5] in both x- and y- directions so that I can plot all points. However, I cannot adjust the color grid such that it overlaps with the major axis lines. For ex: currently a point a (2, 2) would be right in the middle of an orange box rather than on the boundary of an orange and yellow box. Is there a way to adjust the color grid to align at the origin and each box is 1 x 1? – Adam Caudell Apr 20 '21 at 02:03