1

I am new to R and I am practicing trying to create some basic graphs. In the below code I am getting an average line out of 2 variables. How can I add ±Standard Deviation to the average line? Any help would be much appreciated.

dat <- structure(list(distance = c(0, 0.2174741, 0.4349482, 0.6520882, 0.8695623, 1.0870364, 1.3045105, 1.5216505, 1.7391246, 1.9565987, 2.1740728, 2.3912128), grey1 = c(-300.364, -296.963, -292.887, -290.31, -285.777, -279.921, -274.418, -272.005, -273.666, -270.381, -270.273, -270.705), distance1 = c(0, 0.2114969, 0.4229937, 0.6341657, 0.8456625, 1.0571594, 1.2686562, 1.4798282, 1.6913251, 1.9028219, 2.1143188, 2.3254907), grey2 = c(-135.219, -132.601, -131.959, -133.514, -127.111, -116.404, -116.85,  -115.464, -102.823, -101.497, -98.245, -98.474)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

library(data.table)
newdat <- data.table::melt(as.data.table(dat),
                           measure = patterns("^distance","^grey"),
                           value.name = c("distance", "grey"))
newdat

ggplot(newdat, aes(distance, grey)) +
  geom_line(aes(color = variable, group = variable))

ggplot(newdat, aes(distance, grey, color = variable)) +
  geom_line() +
  geom_point()

newdist <- seq(0, min(max(dat$distance), max(dat$distance1)), by = 0.1)
newdat2 <- newdat[, setNames(approx(distance, grey, xout = newdist), c("distance", "grey")), by = variable
                  ][, .(variable = "Avg", grey = mean(grey)), by = distance]
newdat2 <- rbindlist(list(newdat, newdat2), use.names = TRUE)

ggplot(newdat2, aes(distance, grey, color = variable)) +
  geom_line() +
  geom_point() 
  • You could try `geom_smooth` which adds a 95% confidence line by default. It would be easier to give you specific advice if you shared your data. `dput(dat)` would give an output that you could copy snd paste into your question to make it reproducible. – Allan Cameron Jun 04 '22 at 18:05
  • Does this answer your question? [How to plot line with standard deviation of each row with ggplot](https://stackoverflow.com/questions/50495020/how-to-plot-line-with-standard-deviation-of-each-row-with-ggplot) – Jan Jun 04 '22 at 18:21
  • I would like for the Standard deviation to be shown not per point on the line, but similarly to geom_smooth. – Ruthilia Vera Jun 04 '22 at 18:27

1 Answers1

2

Your x values are slightly different for the two different groups, so the standard deviation isn't really defined at each unique value of distance. One compromise is to use a loess regression, which will include a confidence interval around the conditional mean of the two groups. This is straightforward using geom_smooth:

ggplot(newdat, aes(distance, grey)) +
  geom_line(aes(color = variable)) +
  geom_smooth(aes(color = "Avg"), method = loess, size = 0.7, alpha = 0.15) +
  geom_point() +
  theme_light(base_size = 16)

enter image description here

If you are satisfied to use the row-wise values for means and sds, you could do:

library(data.table)
library(ggplot2)

newdat <- data.table::melt(as.data.table(dat),
                           measure = patterns("^distance","^grey"),
                           value.name = c("distance", "grey"))

dat$mean <- (dat$grey1 + dat$grey2) / 2
dat$d <- (dat$distance1 + dat$distance) / 2
dat$sd <- unlist(Map(function(a, b) sd(c(a, b)), dat$grey1, dat$grey2))

ggplot(newdat) +
  geom_line(aes(distance, grey, color = variable)) +
  geom_ribbon(data = dat, aes(d, ymin = mean - sd, ymax = mean + sd),
              alpha = 0.15) +
  geom_line(data = dat, aes(d, mean, color = "Avg")) +
  geom_point(data = dat, aes(d, mean)) +
  geom_point(aes(distance, grey)) +
  theme_light(base_size = 16)

enter image description here

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • My x values are different yes for the 2 variables, but the average is calculated from the same x values for the 2 variables. As such, the standard deviation of the average line could be shown? – Ruthilia Vera Jun 04 '22 at 18:30
  • @RuthiliaVera see my update. The sd is very wide because you are calculating it for only two points at a time, but if this is what you want, that is how it can be achieved – Allan Cameron Jun 04 '22 at 18:39
  • Thank you very much for your answer! This has worked well. I tried applying your code to more values, for which the row length of the x values is not the same for 2 variables and it shows me: ''' Error in `$<-.data.frame`(`*tmp*`, d, value = numeric(0)) : replacement has 0 rows, data has 12 ''' Do you know how this problem could be solved? Thank you again. – Ruthilia Vera Jun 04 '22 at 23:05