I have several data frames that look similar to the following one (but with many more columns):

id col1 col2 col3 col4 col5
1   4    3    5    4    A
1   3    5    4    9    Z
1   5    8    3    4    H
2   6    9    2    1    B
2   4    9    5    4    K
3   2    1    7    5    J
3   5    8    4    3    B
3   6    4    3    9    C

I want to calculate the standard deviation across specific columns (say, col2 to col4), grouped by id. I do not know the column indices in every data frame; I only know the names of the columns I want to calculate the standard deviation for.

Is there a way I could do that easily? My original data frames contain around 20 columns and I only want the standard deviation for 10 columns with specific column names grouped by the id.

On top of that, it would be nice if I could add the calculated standard deviation directly to my data frame as a new column, repeated for every row of the same id, like this:

id col1 col2 col3 col4 col5 SD
1   4    3    5    4    A   SD1
1   3    5    4    9    Z   SD1
1   5    8    3    4    H   SD1
2   6    9    2    1    B   SD2
2   4    9    5    4    K   SD2
3   2    1    7    5    J   SD3
3   5    8    4    3    B   SD3
3   6    4    3    9    C   SD3
ZayzayR
    `df %>% group_by(id) %>% summarise(across(col2:col4, sd))` If you want to add a new column use `mutate` : `df %>% group_by(id) %>% mutate(across(col2:col4, list(sd = sd)))` – Ronak Shah Jan 25 '21 at 13:08
  • Your answer only gives me the standard deviation for each column by id. But I want one value across all columns for each id. – ZayzayR Jan 25 '21 at 13:17
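
To make the distinction in this comment concrete, here is a small sketch of what "one value across all columns for each id" means for id 1: the values of col2, col3 and col4 are pooled into a single vector before taking the standard deviation. The name `id1_values` is only illustrative and does not appear in the question.

```r
# All col2, col3 and col4 values for the rows with id == 1, pooled together
id1_values <- c(3, 5, 8,   # col2
                5, 4, 3,   # col3
                4, 9, 4)   # col4

# One standard deviation over the pooled values (sample SD, n - 1 denominator)
sd(id1_values)
# about 2.12
```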

2 Answers

You can try:

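library(dplyr)
df %>%
  group_by(id) %>%
  # Pool col2:col4 of the current group into one vector and take its SD.
  # Note: cur_data() is deprecated as of dplyr 1.1.0; there, pick(col2:col4)
  # can replace select(cur_data(), col2:col4).
  mutate(SD = sd(unlist(select(cur_data(), col2:col4))))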

#    id  col1  col2  col3  col4 col5     SD
#  <int> <int> <int> <int> <int> <chr> <dbl>
#1     1     4     3     5     4 A      2.12
#2     1     3     5     4     9 Z      2.12
#3     1     5     8     3     4 H      2.12
#4     2     6     9     2     1 B      3.41
#5     2     4     9     5     4 K      3.41
#6     3     2     1     7     5 J      2.62
#7     3     5     8     4     3 B      2.62
#8     3     6     4     3     9 C      2.62
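
Since the question says only the column names are known and the columns may not be adjacent, a possible variation selects them from a character vector. This is a minimal sketch, not part of the original answer: `sd_cols` and `result` are illustrative names, the data frame is rebuilt from the question's example, and `pick()` assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  col1 = c(4, 3, 5, 6, 4, 2, 5, 6),
  col2 = c(3, 5, 8, 9, 9, 1, 8, 4),
  col3 = c(5, 4, 3, 2, 5, 7, 4, 3),
  col4 = c(4, 9, 4, 1, 4, 5, 3, 9),
  col5 = c("A", "Z", "H", "B", "K", "J", "B", "C")
)

# Hypothetical vector holding the names of the columns of interest
sd_cols <- c("col2", "col3", "col4")

result <- df %>%
  group_by(id) %>%
  # pick() (dplyr >= 1.1.0) returns the selected columns of the current group;
  # all_of() selects by the names stored in sd_cols
  mutate(SD = sd(unlist(pick(all_of(sd_cols))))) %>%
  ungroup()

result
```

Keeping the names in a vector means the same pipeline can be reused on each data frame regardless of where those columns sit.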
Ronak Shah

Using data.table:

library(data.table)
setDT(df)[, SD := sd(unlist(.SD)), by = id, .SDcols = col2:col4]
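
If the columns are known only by name, `.SDcols` also accepts a character vector. The following is a sketch under that assumption, with `dt` and `sd_cols` as illustrative names and the data taken from the question's example.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  col1 = c(4, 3, 5, 6, 4, 2, 5, 6),
  col2 = c(3, 5, 8, 9, 9, 1, 8, 4),
  col3 = c(5, 4, 3, 2, 5, 7, 4, 3),
  col4 = c(4, 9, 4, 1, 4, 5, 3, 9),
  col5 = c("A", "Z", "H", "B", "K", "J", "B", "C")
)

# Hypothetical vector holding the names of the columns of interest
sd_cols <- c("col2", "col3", "col4")

# For each id, .SD contains only the columns named in sd_cols;
# unlist() pools them so sd() returns a single value per group,
# which := assigns by reference to every row of that group
dt[, SD := sd(unlist(.SD)), by = id, .SDcols = sd_cols]

dt
```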
akrun