I'm struggling to get a total of four geom_hlines on my plot.

I want LDL cholesterol to have its own mean hline. Here's my code - any suggestions? I think it has to do with my errorbar but I can't figure out how to add LDL cholesterol in.

    GAchol <- ggplot(data = df, aes(x=Responder, y=Ncholest, color = "Cholesterol", na.rm = TRUE)) + 
      geom_jitter() +
      geom_jitter(data = df, aes(y=Nldl_cho, color = "LDL Cholesterol")) +
      geom_errorbar(
        data = df%>% group_by(Responder) %>% summarise(Ncholest = mean(Ncholest)),
        aes(x = Responder, ymin = Ncholest, ymax = Ncholest)
      ) + geom_hline(aes(yintercept = mean(Ncholest)), lty = 2) +
      geom_jitter(data = df, aes(y=Nldl_cho, color = "LDL Cholesterol")) +
      geom_hline(aes(yintercept = mean(Nldl_cho)), lty = 2) +
      theme_bw() +
      stat_summary(fun = mean,
        geom = "errorbar",
        aes(ymax = ..y.., ymin = ..y..),
        position = position_dodge(width = 0.8),
        width = 0.8
      )
Phil

1 Answer

I don't quite follow your code so I might be way off here, but are you after something like this?

    # Packages needed for the pipe, bind_rows/mutate, and plotting
    library(dplyr)
    library(ggplot2)

    # Set up sample data in long format: one row per measurement
    df <- data.frame(biomarker='Cholesterol', response='Responders', lipids=rnorm(n=100, mean=-0.3, sd=1)) %>%
      bind_rows(
        data.frame(biomarker='Cholesterol', response='Non-responders', lipids=rnorm(n=100, mean=0.5, sd=1)),
        data.frame(biomarker='LDL cholesterol', response='Responders', lipids=rnorm(n=100, mean=-0.4, sd=1)),
        data.frame(biomarker='LDL cholesterol', response='Non-responders', lipids=rnorm(n=100, mean=0.6, sd=1))
      ) %>%
      mutate(
        biomarker = as.factor(biomarker),
        response = as.factor(response)
      )

    # Preview the first two rows for each of the four groups
    # (no seed is set, so the values will differ between runs)
    df[c(1:2, 101:102, 201:202, 301:302), ]
    #>           biomarker       response     lipids
    #> 1       Cholesterol     Responders -1.1312455
    #> 2       Cholesterol     Responders  0.5153858
    #> 101     Cholesterol Non-responders  1.4085121
    #> 102     Cholesterol Non-responders -0.3848261
    #> 201 LDL cholesterol     Responders -0.3880410
    #> 202 LDL cholesterol     Responders -0.8081946
    #> 301 LDL cholesterol Non-responders -0.3934018
    #> 302 LDL cholesterol Non-responders  0.4481896

    # Simplified plotting code: stat_summary draws one mean line
    # per biomarker within each response group (four lines in total)
    ggplot(data=df, aes(x=response, y=lipids, col=biomarker)) +
      geom_jitter() +
      stat_summary(fun=mean, geom="crossbar", width=0.5, aes(color=biomarker)) +
      theme_bw()

Scatterplot

I used the stat_summary code from this answer.
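
For comparison, the four mean lines can also be drawn from an explicitly summarised data frame, which is closer to the geom_errorbar approach attempted in the question. This is only a sketch under my own assumptions: it rebuilds sample data with the same column names as above, and the seed, `group_means`, and `mean_lipids` are names I made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample data in the same long layout (seed chosen arbitrarily)
set.seed(42)
df <- data.frame(
  biomarker = rep(c("Cholesterol", "LDL cholesterol"), each = 200),
  response  = rep(rep(c("Responders", "Non-responders"), each = 100), times = 2),
  lipids    = rnorm(400)
)

# One mean per biomarker/response combination: four rows in total
group_means <- df %>%
  group_by(biomarker, response) %>%
  summarise(mean_lipids = mean(lipids), .groups = "drop")

# ymin == ymax collapses each error bar into a single horizontal line
p <- ggplot(df, aes(x = response, y = lipids, colour = biomarker)) +
  geom_jitter() +
  geom_errorbar(
    data = group_means,
    aes(x = response, ymin = mean_lipids, ymax = mean_lipids, colour = biomarker),
    inherit.aes = FALSE,
    width = 0.5
  ) +
  theme_bw()

print(p)
```

Computing the means up front makes it easy to inspect `group_means` and check that each line sits where you expect, at the cost of a little more code than stat_summary.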

Stewart Macdonald
  • Thank you so much! That is exactly what I wanted; you nailed it, and I appreciate it. And yes, my code was an accumulation of things I found online, and I think a lot of it was redundant, so thanks for dissecting that. – Molly Delzio Oct 09 '22 at 23:16
  • I do need more help, though. Based on your analysis, I see that my issue is with my data set. How can I create a data frame like yours from my data set (df)? My variables are: cholesterol is df$Ncholest, LDL cholesterol is df$Nldl_chol, and response status is df$Responder, coded as 1 for non-responders and 2 for responders. – Molly Delzio Oct 09 '22 at 23:36
  • Sounds like you have what's called a 'wide' dataset, and you want to transform it to a 'narrow' dataset. You can do that with the `pivot_longer` function in the `tidyr` package. If you still need help, feel free to create another question and post a sample of your dataframe and then let me know where the new question is. – Stewart Macdonald Oct 11 '22 at 04:03
  • Also, if the answer here has helped, mark the answer as accepted so that the question is closed. – Stewart Macdonald Oct 11 '22 at 04:05
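
The `pivot_longer` suggestion in the comments could look roughly like the sketch below. The real data frame was never posted, so `wide` is a made-up stand-in that only borrows the column names from the thread. The LDL column is spelled `Nldl_cho` in the question's code and `Nldl_chol` in the comment; the sketch uses the former, so adjust it to whatever the actual column is called. The names `long`, `biomarker`, `lipids`, and `response` are chosen to match the answer's sample data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one row per subject, one column per biomarker.
# Responder is coded 1 = non-responder, 2 = responder, as described in the comment.
set.seed(1)
wide <- data.frame(
  Responder = rep(c(1, 2), each = 50),
  Ncholest  = rnorm(100),
  Nldl_cho  = rnorm(100)
)

# Reshape to long format: one row per subject per biomarker
long <- wide %>%
  pivot_longer(
    cols      = c(Ncholest, Nldl_cho),
    names_to  = "biomarker",
    values_to = "lipids"
  ) %>%
  mutate(
    # Turn the numeric codes into readable labels
    response  = factor(Responder, levels = c(1, 2),
                       labels = c("Non-responders", "Responders")),
    # Replace the column names with display names
    biomarker = recode(biomarker,
                       Ncholest = "Cholesterol",
                       Nldl_cho = "LDL cholesterol")
  )

# Same plot as in the answer, now driven by the reshaped data
p <- ggplot(long, aes(x = response, y = lipids, colour = biomarker)) +
  geom_jitter() +
  stat_summary(fun = mean, geom = "crossbar", width = 0.5) +
  theme_bw()

print(p)
```

Each subject contributes two rows after reshaping, so `long` has twice as many rows as `wide`.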