Using tidyr, how can I create a new column through a group-by and calculation?

For example, if I have this dataframe:

name <- c("a", "a", "a", "a", "b", "b", "b", "b")
x1 <- c(0, 0, 0, 0, 1, 1, 1, 1)
x2 <- c(15, 15, 15, 15, 15, 15, 15, 15)
y <- c(1, 2, 1, 2, 1, 2, 1, 2)
z <- c(50, 100, 40, 90, 65, 95, 40, 95)

df <- data.frame(name, x1, x2, y, z)

Let's say I want to (1) group by x1 and x2; (2) find the maximum z value in each group; and (3) create a new column z2 that normalizes z by that maximum.

So in this case, the expected output for z2 is c(0.5, 1, 0.4, 0.9, 0.684, 1, 0.421, 1).

a11
  • Try `df %>% group_by(x1, x2) %>% mutate(z2 = (z- max(z))/z)` – akrun Nov 01 '22 at 18:09
  • @akrun wow, thank you, other than minor z2 calc change (`df %>% group_by(x1, x2) %>% mutate(z2 = (z/max(z)))`) that is it. I'll accept it as an answer if you write it up – a11 Nov 01 '22 at 18:12

1 Answer

We can group by 'x1' and 'x2' and create the column with mutate. Note that group_by and mutate come from dplyr rather than tidyr.

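library(dplyr)
df <- df %>%
    # one group per (x1, x2) combination
    group_by(x1, x2) %>%
    # divide each z by its group's maximum, ignoring any NA values
    mutate(z2 = z / max(z, na.rm = TRUE)) %>%
    # drop the grouping so later operations act on the whole table
    ungroup()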

-output

df
# A tibble: 8 × 6
  name     x1    x2     y     z    z2
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a         0    15     1    50 0.5  
2 a         0    15     2   100 1    
3 a         0    15     1    40 0.4  
4 a         0    15     2    90 0.9  
5 b         1    15     1    65 0.684
6 b         1    15     2    95 1    
7 b         1    15     1    40 0.421
8 b         1    15     2    95 1    
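
If you would rather not load dplyr, the same per-group normalization can be sketched in base R with ave(). This is an alternative to the approach above, not part of it; it rebuilds the question's data frame so that it runs on its own.

```r
# Same example data as in the question
df <- data.frame(
  name = c("a", "a", "a", "a", "b", "b", "b", "b"),
  x1   = c(0, 0, 0, 0, 1, 1, 1, 1),
  x2   = c(15, 15, 15, 15, 15, 15, 15, 15),
  y    = c(1, 2, 1, 2, 1, 2, 1, 2),
  z    = c(50, 100, 40, 90, 65, 95, 40, 95)
)

# ave() applies FUN within each (x1, x2) group and returns a vector
# aligned with the original row order, so it can be assigned directly
df$z2 <- ave(df$z, df$x1, df$x2,
             FUN = function(v) v / max(v, na.rm = TRUE))

# Rounded to three decimals for comparison with the expected z2 values
round(df$z2, 3)
```

Unlike the dplyr version, this leaves df as a plain data.frame rather than a tibble, and there is no grouping to remove afterwards.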
akrun