I want to calculate the standard deviation in R, but the standard function `sd(x)` on its own is not what I need. I'm looking for a way to compute the standard deviation of one column grouped by another variable in my data frame, so that I can add a new column holding the sd for each group (here, grouped by image). Like this:

image   answer    sd
a       1         0,70
a       2         0,70
b       2         2,12
b       5         2,12
Tim Wendt

2 Answers

The function `ave` is perfect for this: it applies a function within each group and returns one value per row.

dat <- read.table(text = "
image   answer    sd
a       1         0,70
a       2         0,70
b       2         2,12
b       5         2,12
", header = TRUE, dec = ",")

ave(dat$answer, dat$image, FUN = sd)
#[1] 0.7071068 0.7071068 2.1213203 2.1213203
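
To see why this works: `ave` applies `FUN` within each level of the grouping variable and returns a vector as long as its input, with each group's result repeated for every row of that group. Below is a minimal sketch of the same idea done in two steps with `tapply`. The data is rebuilt by hand from the question, and the names `via_ave`, `per_group` and `via_tapply` are only for illustration.

```r
# Data from the question, rebuilt by hand so the example is self-contained
image  <- c("a", "a", "b", "b")
answer <- c(1, 2, 2, 5)

# ave(): one value per row, each group's sd repeated across that group's rows
via_ave <- ave(answer, image, FUN = sd)

# Same idea in two steps: one sd per group, then look it up for every row
per_group  <- tapply(answer, image, sd)
via_tapply <- as.numeric(per_group[image])  # as.numeric() drops names and dim

# TRUE if both approaches agree
all.equal(via_ave, via_tapply)
```

The `tapply` version makes the per-group table explicit, which can help if you also want to inspect one value per group; `ave` is shorter when you only need the new column.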

EDIT.
Following the discussion with Henry in the comments, I have decided to edit the answer. That turned out to be fortunate, since in the meantime I realized that the original dataset uses the comma as the decimal mark.
So, the first change is to include the argument dec = "," in the read.table call above.
The second change is to show a complete solution, with the column sd created by the ave call.

dat2 <- dat[-3]  # start with the OP's data without the 3rd column
dat2$sd <- ave(dat2$answer, dat2$image, FUN = sd)
dat2
#  image answer        sd
#1     a      1 0.7071068
#2     a      2 0.7071068
#3     b      2 2.1213203
#4     b      5 2.1213203
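
As a sanity check on these numbers, the sample standard deviation can be written out by hand. This is only a sketch: `manual_sd` is a hypothetical helper, not part of base R, and it assumes the usual n - 1 denominator that `sd` uses. For group a the squared deviations from the mean 1.5 sum to 0.5, so the result is sqrt(0.5); for group b they sum to 4.5, giving sqrt(4.5).

```r
# Sample standard deviation written out by hand (n - 1 denominator, as in sd())
# manual_sd is a hypothetical helper, used here only for illustration
manual_sd <- function(x) {
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}

sd_a <- manual_sd(c(1, 2))  # group a: sqrt(0.5)
sd_b <- manual_sd(c(2, 5))  # group b: sqrt(4.5)

# Both comparisons should be TRUE: the manual values agree with sd()
all.equal(sd_a, sd(c(1, 2)))
all.equal(sd_b, sd(c(2, 5)))
```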
Rui Barradas
  • +1 though perhaps you could leave the `sd` column out of the input and then insert the calculations into the table with something like `dat$sd <- ave(dat$answer, dat$image, FUN = sd)` and then print the result – Henry Oct 10 '17 at 13:36
  • @Henry You're right, but I used the data example as the OP posted it. – Rui Barradas Oct 10 '17 at 15:01
  • I had thought that that was the desired output rather than the input. It doesn't matter either way – Henry Oct 10 '17 at 15:11
  • Thanks for your comments. Your solutions are perfect for me – Tim Wendt Oct 11 '17 at 07:13

As I understand it, you want the standard deviation of answer for each image. With dplyr, you can group your data frame by image and then call sd inside mutate, which computes it separately for each group.

df <- data.frame(image = c('a', 'a', 'b', 'b'),
                 answer = c(1, 2, 2, 5))

library(dplyr)
df %>%
  group_by(image) %>%
  mutate(sd = sd(answer))
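
For comparison, here is a short sketch of the difference between `mutate` and `summarise` on grouped data, assuming dplyr is installed. `mutate` keeps one row per observation, which is what the question asks for, while `summarise` collapses to one row per image. The names `with_sd` and `per_image` are only for illustration, and the `ungroup()` step is an optional addition so that later operations are not silently done per group.

```r
library(dplyr)

df <- data.frame(image = c("a", "a", "b", "b"),
                 answer = c(1, 2, 2, 5))

# mutate(): keeps all four rows; ungroup() removes the grouping afterwards
with_sd <- df %>%
  group_by(image) %>%
  mutate(sd = sd(answer)) %>%
  ungroup()

# summarise(): collapses the data to one row per image
per_image <- df %>%
  group_by(image) %>%
  summarise(sd = sd(answer))
```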
csgroen