
For last week's TidyTuesday challenge, I want to make a stream graph showing the yearly mean of the average rating for the top 5 board game categories between 1990 and 2022. To this end, I did the following data wrangling:

library(tidyverse)    # readr, dplyr, tidyr, stringr and the %>% pipe
library(streamgraph)  # used for the plot further down

ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/ratings.csv')
details <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/details.csv')

board_games <-
  ratings %>%
  left_join(details, by = "id")

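# boardgamecategory holds strings like "['Card Game', 'Fantasy']":
# drop the leading "['" and trailing "']", then remove the remaining quotes
board_games$boardgamecategory <- substring(board_games$boardgamecategory,3,nchar(board_games$boardgamecategory)-2)
board_games$boardgamecategory <- str_replace_all(board_games$boardgamecategory, c("'" = ""))
# split into up to 14 category columns; with sep = "," every category
# after the first keeps a leading space (e.g. " Wargame")
splitted_data <- separate(board_games, col = boardgamecategory, 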
                          into = c("categories1","categories2","categories3",
                                   "categories4","categories5","categories6",
                                   "categories7","categories8","categories9",
                                   "categories10","categories11","categories12",
                                   "categories13","categories14"), sep=",") 

top_categories <- splitted_data %>%  
  pivot_longer(cols = categories1:categories14, names_to = "topcategories", values_to = "categoriestype", values_drop_na = TRUE) %>%
  select(-c(topcategories)) %>%
  group_by(categoriestype) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

top_categories_data <- splitted_data %>%
  pivot_longer(cols = categories1:categories14, names_to = "topcategories", values_to = "categoriestype", values_drop_na = TRUE) %>%
  select(-c(topcategories)) %>%
  filter(categoriestype %in% c("Card Game", " Wargame", " Fantasy", " Party Game", "Abstract Strategy")) %>%
  select(categoriestype, average, yearpublished) %>%
  group_by(yearpublished, categoriestype) %>%
  mutate(mean_average = mean(average)) %>%
  select(-c(average)) %>%
  distinct(categoriestype, .keep_all = TRUE) %>%
  as.data.frame() %>%
  filter(yearpublished > 1989) %>%
  arrange(desc(yearpublished), categoriestype)

top_categories_data$categoriestype <- trimws(top_categories_data$categoriestype)
top_categories_data$mean_average <- round(top_categories_data$mean_average, 2)
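For reference, a minimal base R sketch of the same reshaping step. It assumes, as the `substring()` call above implies, that each `boardgamecategory` value looks like `['Card Game', 'Fantasy']`, and it uses a made-up three-game toy table rather than the real data. Trimming the names right after splitting matters: with `sep = ","` every category after the first keeps a leading space, so `"Card Game"` and `" Card Game"` would otherwise be counted and filtered as two different categories.

```r
# Hypothetical toy data: category strings in the "['A', 'B']" format,
# plus a rating and publication year per game (not the real data).
games <- data.frame(
  yearpublished = c(2021, 2021, 2022),
  average = c(7, 9, 6),
  boardgamecategory = c("['Card Game', 'Fantasy']",
                        "['Fantasy']",
                        "['Card Game']"),
  stringsAsFactors = FALSE
)

# Strip the brackets and quotes, split on commas, then trim whitespace
# so "Fantasy" and " Fantasy" end up as the same category.
clean <- gsub("^\\[|\\]$|'", "", games$boardgamecategory)
cats <- lapply(strsplit(clean, ","), trimws)

# One row per (game, category) pair.
long <- data.frame(
  yearpublished = rep(games$yearpublished, lengths(cats)),
  average = rep(games$average, lengths(cats)),
  categoriestype = unlist(cats),
  stringsAsFactors = FALSE
)

# Mean rating per year and category.
means <- aggregate(average ~ yearpublished + categoriestype, data = long, FUN = mean)
print(means)
```

Here `aggregate()` plays the role of the `group_by()`/`mutate()`/`distinct()` chain: it returns one row per year and category.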

After this cleaning, the first rows of my final data look like this:

      categoriestype yearpublished mean_average
1            Fantasy          2022         7.81
2         Party Game          2022         7.86
3            Wargame          2022         8.27
4  Abstract Strategy          2022         8.12
5          Card Game          2022         7.81
6            Fantasy          2021         7.66
7         Party Game          2021         7.03
8            Wargame          2021         8.13
9  Abstract Strategy          2021         7.00
10         Card Game          2021         7.27
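A stream graph is a stacked area chart: each category's band sits on top of the others, so the total height in a given year is the sum of the five category means, not any single mean. A small base R check using only the ten printed rows above:

```r
# Values copied from the ten rows printed above.
shown <- data.frame(
  categoriestype = rep(c("Fantasy", "Party Game", "Wargame",
                         "Abstract Strategy", "Card Game"), times = 2),
  yearpublished = rep(c(2022, 2021), each = 5),
  mean_average = c(7.81, 7.86, 8.27, 8.12, 7.81,
                   7.66, 7.03, 8.13, 7.00, 7.27)
)

# Total stack height per year: the sum of the five stacked means.
totals <- tapply(shown$mean_average, shown$yearpublished, sum)
print(totals)
# 2021: 37.09, 2022: 39.87
```

So the vertical extent of the stream reflects sums of means (about 37 to 40 for these two years), which is likely why the y axis does not stay within the 5 to 9 range of the individual means.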

However, when I plot a stream graph with the following code

pp <- streamgraph(top_categories_data, key="categoriestype", value="mean_average", date="yearpublished", 
                  height="300px", width="1000px")

the resulting plot looks wrong:

[Image: the resulting stream graph]

I can't see where the problem is or why the graph takes this shape. Any help would be appreciated.

mzkrc
  • What are you expecting to get? A stream graph is a kind of stacked area graph and that's what you get. And I'm afraid a stacked chart is not the best way to visualize a time series of mean ratings for different categories. A stream graph is IMO more suited to display e.g. the time series of counts by category. – stefan Jan 31 '22 at 11:48
  • With the years on the x axis and the categories' mean averages on the y axis, I expected to get something like the example on this page: https://www.r-graph-gallery.com/154-basic-interactive-streamgraph-2.html. One problem with my graph, for instance, is that the y axis extends to 30, while the mean averages mostly lie between 5 and 9. I hope that makes it clear. – mzkrc Jan 31 '22 at 12:10
  • By the way, I also tried counting newly published board games in these 5 categories, and, as you suggested, that graph looks much more like the example I linked. But I still don't understand why the same can't be done with the means. In the end, why aren't the means shown on the y axis? – mzkrc Jan 31 '22 at 12:21
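Counts, as suggested in the first comment, stack meaningfully: the sum of the bands in a year is the number of (game, category) pairs published that year, a real quantity, whereas a sum of mean ratings has no direct interpretation. A minimal base R sketch of a per-year count table, using hypothetical toy rows rather than the real data; `table()` also fills in zero counts for year/category combinations that never occur, so every category has a value in every year.

```r
# Hypothetical long-format data: one row per (game, category) pair.
long <- data.frame(
  yearpublished = c(2021, 2021, 2021, 2022, 2022),
  categoriestype = c("Card Game", "Fantasy", "Card Game", "Card Game", "Wargame"),
  stringsAsFactors = FALSE
)

# Cross-tabulate year by category (missing combinations become 0),
# then turn the table back into a long data frame.
counts <- as.data.frame(table(long$yearpublished, long$categoriestype),
                        stringsAsFactors = FALSE)
names(counts) <- c("yearpublished", "categoriestype", "n")
counts$yearpublished <- as.integer(counts$yearpublished)
print(counts)
```

Summing `n` within a year gives the height of the stack for that year, which is exactly what a stream graph draws.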
