I am running the following code to generate a graph in which each point is coloured differently for each company per financial year according to a specific sequence. In the background, there is also a boxplot for each financial year.

library(ggplot2); library(dplyr); library(forcats)
# df1 is my full dataset; the example data below builds a small version called df
df1 %>% arrange(financial_year != "2018-19", -production) %>%
  mutate(company_code = fct_inorder(as.factor(company_code))) %>%
  arrange(company_code, financial_year) %>%
  ggplot(., mapping = aes(x = financial_year, y = production)) +
  geom_point(aes(colour = company_code), position = position_jitter(height = 0, width = 0.2),
             size = 1.1, alpha = 0.6) +
  geom_boxplot(aes(colour = financial_year))

This works as I envisioned, except that it seems to disregard the first three steps (arrange, mutate, arrange) and does not colour company_code in the correct sequence. The colours should give the impression of a gradient, but right now the order looks random. Any ideas?

The sequence is correct if I remove geom_boxplot(aes(colour=financial_year)).

Example data below:

company_code <- c(1,1,1,1,2,2,2,2,3,3,3,3)
financial_year <- c("2018-19","2019-20","2020-21","2018-21","2018-19","2019-20","2020-21","2018-21","2018-19","2019-20","2020-21","2018-21")
production <- c(2000,2500,3000,7500,1000,1500,1000,3500,5000,5500,4000,14500)

df <- data.frame(company_code,financial_year,production)
df$company_code <- as.factor(df$company_code)
df$financial_year <- as.factor(df$financial_year)

1 Answer

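Draw geom_boxplot before geom_point so the boxes sit in the background, and give it a constant colour = "black" outside aes() instead of mapping colour to financial_year. Both layers share one colour scale, so mapping financial_year in the boxplot adds its levels to that scale, which is most likely what scrambles the company_code sequence. Setting alpha = 0 keeps the boxes unfilled: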

company_code <- c(1,1,1,1,2,2,2,2,3,3,3,3)
financial_year <- c("2018-19","2019-20","2020-21","2018-21","2018-19","2019-20","2020-21","2018-21","2018-19","2019-20","2020-21","2018-21")
production <- c(2000,2500,3000,7500,1000,1500,1000,3500,5000,5500,4000,14500)

df <- data.frame(company_code,financial_year,production)
df$company_code <- as.factor(df$company_code)
df$financial_year <- as.factor(df$financial_year)

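library(ggplot2); library(dplyr); library(forcats)
df %>%
  # Put the 2018-19 rows first, sorted by descending production
  arrange(financial_year != "2018-19", -production) %>%
  # Fix the factor levels in that order of first appearance
  mutate(company_code = fct_inorder(as.factor(company_code))) %>%
  arrange(company_code, financial_year) %>%
  ggplot(., aes(x = financial_year, y = production)) +
  # Background layer: black outline, transparent fill, no colour mapping
  geom_boxplot(colour = "black", alpha = 0) +
  # Foreground layer: only the points map colour, so the scale holds company_code alone
  geom_point(aes(colour = company_code), position = position_jitter(height = 0, width = 0.2),
             size = 1.1, alpha = 0.6)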

Created on 2022-08-29 with reprex v2.0.2
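
To check that the dplyr steps are not being ignored, it may help to look at the factor levels and at the colours ggplot2 actually assigns to the points. The following is only a rough sketch using the example data: the names ordered_df, p_base, pts, p_mapped and p_fixed are made up for illustration, and layer_data() is used purely to inspect the computed colours.

```r
library(ggplot2); library(dplyr); library(forcats)

# Same example data as above, built in one call
df <- data.frame(
  company_code   = factor(rep(1:3, each = 4)),
  financial_year = factor(rep(c("2018-19", "2019-20", "2020-21", "2018-21"), times = 3)),
  production     = c(2000, 2500, 3000, 7500, 1000, 1500, 1000, 3500, 5000, 5500, 4000, 14500)
)

# The ordering steps from the question, kept as a separate object (illustrative name)
ordered_df <- df %>%
  arrange(financial_year != "2018-19", -production) %>%
  mutate(company_code = fct_inorder(as.factor(company_code))) %>%
  arrange(company_code, financial_year)

# Levels follow descending 2018-19 production (5000, 2000, 1000): "3" "1" "2"
levels(ordered_df$company_code)

p_base <- ggplot(ordered_df, aes(x = financial_year, y = production))
pts <- geom_point(aes(colour = company_code),
                  position = position_jitter(height = 0, width = 0.2),
                  size = 1.1, alpha = 0.6)

# Original version: the boxplot also maps colour, so both variables feed one scale
p_mapped <- p_base + pts + geom_boxplot(aes(colour = financial_year))

# Fixed version: constant boxplot colour, so only company_code trains the scale
p_fixed <- p_base + geom_boxplot(colour = "black", alpha = 0) + pts

# Compare the colours assigned to the point layer in each plot
unique(layer_data(p_mapped, 1)$colour)  # points are layer 1 here
unique(layer_data(p_fixed, 2)$colour)   # points are layer 2 here
```

If the levels print in the expected order but the two sets of point colours differ, the ordering itself is fine and the difference comes from the boxplot's colour mapping sharing the scale.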

Quinten
  • Thanks @Quinten but I would like box plots to be empty (not coloured) in the background and points coloured for each company_code on top of the boxplot hence why I put it first in my code. In this example dataset, I only created 3 companies, but I have many more in my real dataset. Do you know how to achieve this? – Daniela Rodrigues Aug 29 '22 at 09:23
  • @DanielaRodrigues, I added some code, is this what you mean? – Quinten Aug 29 '22 at 09:30
  • Thanks @Quinten - I would like all boxplots to have a black line and be in the background and points coloured by company_code in the sequence specified in the first 3 lines of code in dplyr set. Do you know how to do this? It seems that what's only missing is the boxplot to have a black line. – Daniela Rodrigues Aug 29 '22 at 09:39
  • Oh, my bad: if I use geom_boxplot(colour="black") then it works. Could you update your code so I can mark your response as correct? Thanks @Quinten – Daniela Rodrigues Aug 29 '22 at 09:59
  • @DanielaRodrigues, Ah I see what you meant. Added some code. Glad it is solved! – Quinten Aug 29 '22 at 10:13