
I have the following data:

data <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
                   Birthyear=c(2018, 2009, 2008, 2000, 1998, 2005),
                   family=c("Elliot", "Elliot", "Elliot", "Gerrard", "Gerrard", "Gerrard"))

I want to find the maximal difference in birth year within each family and store it in a new column, repeated for every member of that family.

It should look like:

datanew <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
                      Birthyear=c(2018, 2009, 2008, 2000, 1998, 2005),
                      family=c("Elliot", "Elliot", "Elliot", "Gerrard", "Gerrard", "Gerrard"),
                      maxdifference=c(10, 10, 10, 7, 7, 7))
Max Herre

4 Answers


Using the tidyverse, you can first group by family, then compute all pairwise distances with dist() and take the largest one with max().

library(tidyverse)
data <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
                   Birthyear=c(2018, 2009, 2008, 2000, 1998,2005),
                   family=c(1, 1, 1, 2, 2,2))

data %>% dplyr::group_by(family) %>%
  dplyr::mutate(maxdifference = max(dist(Birthyear)))
# A tibble: 6 × 4
# Groups:   family [2]
  id_pers Birthyear family maxdifference
    <dbl>     <dbl>  <dbl>         <dbl>
1       1      2018      1            10
2       2      2009      1            10
3       3      2008      1            10
4       4      2000      2             7
5       5      1998      2             7
6       6      2005      2             7
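To see why this works, the base R sketch below (using the Elliot family's birth years from the question) shows what dist() returns for a single numeric vector: every pairwise absolute difference. The largest of these is always max minus min, so this approach agrees with the range-based answers; it does build n(n-1)/2 distances per family, which only matters for large groups.

```r
# Birth years of the Elliot family from the question
years <- c(2018, 2009, 2008)

# dist() on a plain numeric vector gives all pairwise absolute differences:
# |2009 - 2018| = 9, |2008 - 2018| = 10, |2008 - 2009| = 1
pairwise <- dist(years)
print(as.vector(pairwise))

# The largest pairwise gap equals max(years) - min(years)
max_gap <- max(pairwise)
stopifnot(max_gap == max(years) - min(years))
print(max_gap)
```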
tacoman

Another way is to take the difference of the range:

data %>% 
  group_by(family) %>% 
  mutate(maxdifference = diff(range(Birthyear)))
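range() returns a length-2 vector c(min, max), and diff() subtracts consecutive elements, so diff(range(x)) is just max(x) - min(x). A quick base R check on the Gerrard family's years:

```r
# Birth years of the Gerrard family from the question
years <- c(2000, 1998, 2005)

r <- range(years)   # c(min, max), here c(1998, 2005)
gap <- diff(r)      # 2005 - 1998

print(r)
print(gap)
```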
Maël
data %>% group_by(family) %>% mutate(maxdifference = max(Birthyear)-min(Birthyear))
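All of these answers give NA for the whole family if any of its birth years is missing. A minimal sketch, assuming you would rather skip missing years: a hypothetical helper max_gap() that ignores NA and returns NA only when a family has no known year at all. It could be used in the same pipeline as mutate(maxdifference = max_gap(Birthyear)).

```r
# Hypothetical helper: spread of birth years, ignoring missing values
max_gap <- function(years) {
  # With no known years, max/min with na.rm = TRUE would give -Inf/Inf,
  # so return NA explicitly instead
  if (all(is.na(years))) return(NA_real_)
  max(years, na.rm = TRUE) - min(years, na.rm = TRUE)
}

print(max_gap(c(2018, NA, 2008)))                     # missing year skipped
print(max(c(2018, NA, 2008)) - min(c(2018, NA, 2008))) # plain version gives NA
print(max_gap(c(NA_real_, NA_real_)))                  # nothing known: NA
```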
KacZdr

Obligatory base R one-liner (the \(years) lambda shorthand needs R 4.1 or later):

data$maxdifference = ave(data$Birthyear, data$family, FUN = \(years) max(years) - min(years))
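For comparison, here is a self-contained base R sketch on the question's data (family names as strings). It uses ave() as in the one-liner, written with function() so it also runs on R versions before 4.1, plus an alternative that computes one value per family with tapply() and then looks it up for each row by family name. The column maxdifference2 is a hypothetical name used only to compare the two results.

```r
# Data from the question, with family names as strings
data <- data.frame(
  id_pers   = 1:6,
  Birthyear = c(2018, 2009, 2008, 2000, 1998, 2005),
  family    = c("Elliot", "Elliot", "Elliot", "Gerrard", "Gerrard", "Gerrard")
)

# ave(): applies FUN within each family and returns a vector aligned with the rows
data$maxdifference <- ave(data$Birthyear, data$family,
                          FUN = function(years) max(years) - min(years))

# Alternative: one value per family via tapply(), then index it by each row's family name
spread <- tapply(data$Birthyear, data$family, function(years) diff(range(years)))
data$maxdifference2 <- as.vector(spread[data$family])

print(data)
```

ave() is convenient when you want the result recycled back onto every row; the tapply() route is handy if you also want the per-family summary on its own.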
Ottie