For last week's TidyTuesday challenge, I want to make a stream graph showing the yearly mean of the average rating for the top 5 board game categories between 1990 and 2022. To this end, I did the following data wrangling:
library(dplyr)
library(tidyr)
library(stringr)

ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/ratings.csv')
details <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/details.csv')

# Combine ratings and details into one table, matched on the game id
board_games <-
  ratings %>%
  left_join(details, by = "id")

# Drop the first two and last two characters of the category string,
# then remove the remaining single quotes
board_games$boardgamecategory <- substring(board_games$boardgamecategory, 3, nchar(board_games$boardgamecategory) - 2)
board_games$boardgamecategory <- str_replace_all(board_games$boardgamecategory, c("'" = ""))

# Split the comma-separated categories into up to 14 columns
# (every piece after the first keeps its leading space)
splitted_data <- separate(board_games, col = boardgamecategory,
                          into = paste0("categories", 1:14), sep = ",")

# Count how often each category occurs, most frequent first
top_categories <- splitted_data %>%
  pivot_longer(cols = categories1:categories14, names_to = "topcategories",
               values_to = "categoriestype", values_drop_na = TRUE) %>%
  select(-c(topcategories)) %>%
  group_by(categoriestype) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

# Mean rating per year for the five chosen categories, from 1990 onwards
top_categories_data <- splitted_data %>%
  pivot_longer(cols = categories1:categories14, names_to = "topcategories",
               values_to = "categoriestype", values_drop_na = TRUE) %>%
  select(-c(topcategories)) %>%
  # Values are matched exactly, including any leading space left by separate()
  filter(categoriestype %in% c("Card Game", " Wargame", " Fantasy", " Party Game", "Abstract Strategy")) %>%
  select(categoriestype, average, yearpublished) %>%
  group_by(yearpublished, categoriestype) %>%
  mutate(mean_average = mean(average)) %>%
  select(-c(average)) %>%
  # Keep one row per year and category
  distinct(categoriestype, .keep_all = TRUE) %>%
  as.data.frame() %>%
  filter(yearpublished > 1989) %>%
  arrange(desc(yearpublished), categoriestype)

# Trim the category labels and round the means for display
top_categories_data$categoriestype <- trimws(top_categories_data$categoriestype)
top_categories_data$mean_average <- round(top_categories_data$mean_average, 2)
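
In case a smaller reproduction is useful, here is a minimal sketch of the same split-and-average step on a toy data frame. The `toy_games` data and its values are made up for illustration, and `separate_rows()` is swapped in for the `separate()` plus `pivot_longer()` combination used above, so this is not the exact code that produced my data.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for board_games, using the same column names
toy_games <- data.frame(
  id = 1:4,
  yearpublished = c(2020, 2020, 2021, 2021),
  average = c(7, 8, 6, 9),
  boardgamecategory = c(
    "['Card Game', 'Fantasy']",
    "['Wargame']",
    "['Fantasy', 'Card Game']",
    "['Card Game']"
  )
)

category_means <- toy_games %>%
  # Remove the square brackets and single quotes around the categories
  mutate(boardgamecategory = str_remove_all(boardgamecategory, "[\\[\\]']")) %>%
  # One row per game/category pair; the separator also consumes the space after each comma
  separate_rows(boardgamecategory, sep = ",\\s*") %>%
  # One mean per year and category
  group_by(yearpublished, boardgamecategory) %>%
  summarise(mean_average = round(mean(average), 2), .groups = "drop")

print(category_means)
```

Because the separator pattern consumes the space after each comma, the category labels come out without leading spaces, so no `trimws()` step is needed in this version.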
After this cleaning, the first rows of top_categories_data look like this:
      categoriestype yearpublished mean_average
1            Fantasy          2022         7.81
2         Party Game          2022         7.86
3            Wargame          2022         8.27
4  Abstract Strategy          2022         8.12
5          Card Game          2022         7.81
6            Fantasy          2021         7.66
7         Party Game          2021         7.03
8            Wargame          2021         8.13
9  Abstract Strategy          2021         7.00
10         Card Game          2021         7.27
Now, when I try to plot a stream graph with the following code:
library(streamgraph)

pp <- streamgraph(top_categories_data, key = "categoriestype", value = "mean_average",
                  date = "yearpublished", height = "300px", width = "1000px")
the resulting plot looks completely wrong.
I cannot work out where the problem is or why the graph takes this shape. Any help would be appreciated.