2

I have a data frame like this:

df <- data.frame(id = c(1,1,1,2,2,3,3,3,3,4,4,4),
                 torre = c("a","a","b","d","a","q","t","q","g","a","b","c"))

and I would like my code to select for each id the torre that repeats more, or the last torre for the id if there isnt one that repeats more than the other, so ill get a new data frame like this:

df2 <- data.frame(id = c(1,2,3,4), torre = c("a","a","q","c"))
Jaap
  • 81,064
  • 34
  • 182
  • 193

4 Answers4

1

You can use aggregate:

aggregate(torre ~ id, data=df,
  FUN=function(x) names(tail(sort(table(factor(x, levels=unique(x)))),1))
)

The full explanation for this function is a bit involved, but most of the job is done by the FUN= parameter. In this case we are making a function that get's the frequency counts for each torre, sorts them in increasing order, then get's the last one with tail(, 1) and takes the name of it. aggregate() function then applies this function separately for each id.

Karolis Koncevičius
  • 9,417
  • 9
  • 56
  • 89
1

You could do this using the dplyr package: group by id and torre to calculate the number of occurrences of each torre/id combination, then group by id only and select the last occurrence of torre that has the highest in-group frequency.

library(dplyr)
df %>% 
group_by(id,torre) %>% 
mutate(n=n()) %>% 
group_by(id) %>% 
filter(n==max(n)) %>%
slice(n()) %>% 
select(-n)
     id torre
  <dbl> <chr>
1     1     a
2     2     a
3     3     q
4     4     c
Lamia
  • 3,845
  • 1
  • 12
  • 19
1

An approach with the package:

library(data.table)
setDT(df)[, .N, by = .(id, torre)][order(N), .(torre = torre[.N]), by = id]

which gives:

   id torre
1:  1     a
2:  2     a
3:  3     q
4:  4     c

And two possible alternatives:

library(dplyr)

# option 1
df %>% 
  group_by(id, torre) %>% 
  mutate(n = n()) %>% 
  group_by(id) %>% 
  mutate(f = rank(n, ties.method = "first")) %>% 
  filter(f == max(f)) %>% 
  select(-n, -f)

# option 2
df %>% 
  group_by(id, torre) %>% 
  mutate(n = n()) %>% 
  distinct() %>% 
  arrange(n) %>% 
  group_by(id) %>% 
  slice(n()) %>% 
  select(-n)
Jaap
  • 81,064
  • 34
  • 182
  • 193
1

Yet another dplyr solution, this time using add_count() instead of mutate():

df %>%
  add_count(id, torre) %>% 
  group_by(id) %>% 
  filter(n == max(n)) %>% 
  slice(n()) %>% 
  select(-n)

# A tibble: 4 x 2
# Groups:   id [4]
     id torre
  <dbl> <fct>
1    1. a    
2    2. a    
3    3. q    
4    4. c   
tmfmnk
  • 38,881
  • 4
  • 47
  • 67