
I want to plot the median of each row (not each column!), leaving out the values in the first column, with the standard deviation as error bars. The result should look similar to this:

enter image description here

I have a dataframe like this:

myTable <- "
        1     -50     -52
        2     -44     -51
        3     -48     -50
        4     -50     -49
        5     -44     -49
        6     -48     -49
        7     -48     -49
        8     -44     -48
        9     -49     -48
       10     -48     -45
       11     -60     -48
       10     -50     -48
       11     -80     -47"
df <- read.table(text=myTable, header = TRUE)
df <- c("ID","Value1","Value2");

My data is stored in a .csv file, which I load with the following line:

df <- read.csv(file="~/path/to/myFile.csv", header=FALSE, sep=",")
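
Because the file is read with header=FALSE, the columns come back named V1, V2, and so on. A minimal sketch of naming them for any number of value columns follows; the inline `csv_text` is a made-up stand-in for the real file, and the `Value1`, `Value2`, ... names simply extend the naming used in the example above.

```r
# Stand-in for the real file: three header-less rows, ID plus two value columns.
# (Assumption: the real file has the same layout, just with more columns.)
csv_text <- "1,-50,-52
2,-44,-51
3,-48,-50"

# header = FALSE keeps the first row as data; columns come back as V1, V2, ...
df <- read.csv(text = csv_text, header = FALSE)

# Name the first column ID and number the remaining columns, however many there are.
names(df) <- c("ID", paste0("Value", seq_len(ncol(df) - 1)))

str(df)
```

With the real file, `text = csv_text` would be replaced by the `file=` argument from the line above.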
schande
  • You have two columns and you want to take the pairwise median? And then calculate the standard deviation for pairs of numbers? That seems kind of odd from a statistical perspective to me. Does your real data have more columns? – MrFlick May 23 '18 at 18:25
  • Yes, my real data has 20 columns with values – schande May 23 '18 at 18:39
  • `df <- c("ID","Value1","Value2")` should be `names(df) <- c("ID","Value1","Value2")` – eipi10 May 23 '18 at 18:55
  • What does this change? – schande May 23 '18 at 18:55
  • Type `df` in the console after running `df <- read.table(text=myTable, header = TRUE)` and after running `df <- c("ID","Value1","Value2")`. – eipi10 May 23 '18 at 18:57

1 Answer


The code below creates a helper function to provide the median and sd values for plotting. We also transform the data to "long" format before plotting.

library(tidyverse)
theme_set(theme_bw())

# The example text has no header row, so header = FALSE keeps the first row as data
df <- read.table(text=myTable, header = FALSE)
names(df) <- c("ID","Value1","Value2")

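# Returns the median (y) and median +/- n standard deviations (ymin, ymax)
# tibble() replaces the deprecated data_frame()
median_sd = function(x, n=1) {
  tibble(y = median(x),
         sd = sd(x),
         ymin = y - n*sd,
         ymax = y + n*sd)
}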

# fun replaces the deprecated fun.y in ggplot2 >= 3.3.0; use fun.y on older versions
ggplot(df %>% gather(key, value, -ID), aes(ID, value)) +
  stat_summary(fun.data=median_sd, geom="errorbar", width=0.1) +
  stat_summary(fun=median, geom="line") +
  stat_summary(fun=median, geom="point") +
  scale_x_continuous(breaks=unique(df$ID))

enter image description here
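
To make the "long" format step concrete, here is a small sketch of what the reshape produces and of the per-ID numbers the summary layers work from. It uses only the first three rows of the example data; the name `per_id` is made up for illustration, and the explicit `group_by()`/`summarise()` is just one way to check the values by hand, not what `stat_summary()` runs internally.

```r
library(tidyr)
library(dplyr)

# First three rows of the example data, enough to see the reshape.
df <- data.frame(ID = 1:3,
                 Value1 = c(-50, -44, -48),
                 Value2 = c(-52, -51, -50))

# Long format: one row per (ID, column) pair instead of one row per ID.
long <- gather(df, key, value, -ID)

# Median and standard deviation of all values that share an ID,
# plus the lower and upper ends of the error bar.
per_id <- long %>%
  group_by(ID) %>%
  summarise(median = median(value),
            sd     = sd(value)) %>%
  mutate(ymin = median - sd,
         ymax = median + sd)

print(long)
print(per_id)
```

Since the grouping is by ID, any ID that appears in more than one row of the data has all of its values pooled into a single median and error bar.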

You can avoid the helper function with the following code, but the function is handy to have around if you're going to do this a lot.

# fun, fun.min and fun.max replace fun.y, fun.ymin and fun.ymax in ggplot2 >= 3.3.0
ggplot(df %>% gather(key, value, -ID), aes(ID, value)) +
  stat_summary(fun=median, fun.min=function(x) median(x) - sd(x),
               fun.max=function(x) median(x) + sd(x), geom="errorbar", width=0.1) +
  stat_summary(fun=median, geom="line") +
  stat_summary(fun=median, geom="point") +
  scale_x_continuous(breaks=unique(df$ID))
eipi10
  • It plots the graph with the median but without errorbar and I get the following error: `Error in UseMethod("gather_") : no applicable method for 'gather_' applied to an object of class "function"` – schande May 23 '18 at 18:50
  • Did you install and load the `tidyverse` package? – eipi10 May 23 '18 at 18:53
  • Yes, I even put `library(tidyverse)` above the code – schande May 23 '18 at 18:55
  • Are you running this on the example data in your question or your actual data? If the latter, it might be difficult to tell what's going wrong without seeing a sample of your actual data and the actual code you're running. – eipi10 May 23 '18 at 18:56
  • Had a misplaced cell in my data. Your solution works like a charm now! Thank you. – schande May 23 '18 at 19:06
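
For anyone hitting similar problems, a rough sketch of two checks that may help before plotting. The "class function" part of the error above is consistent with `df` not having been assigned at that point, since `df` is also the name of a function in the stats package. The data here is invented (one stray text cell), and the name `dat` is used only for illustration; the thread does not say what the misplaced cell actually looked like.

```r
# Hypothetical data with one misplaced text cell in the third column.
csv_text <- "1,-50,-52
2,-44,oops
3,-48,-50"
dat <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)
names(dat) <- c("ID", "Value1", "Value2")

# Check 1: is the object really a data frame and not a function?
print(is.data.frame(dat))

# Check 2: which columns were not read as numbers?
not_numeric <- names(dat)[!sapply(dat, is.numeric)]
print(not_numeric)

# For those columns, which rows hold cells that cannot be converted to a number?
bad_rows <- lapply(dat[not_numeric], function(col) {
  which(is.na(suppressWarnings(as.numeric(col))))
})
print(bad_rows)
```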