find value closest to x by group in dplyr

Question

library(dplyr)
a <- tibble(id = c("A","A","A","B","B","B"),
            b = c(1.2, 1.5, 1.8, 1.1, 1.6, 1.4))

Now, I'd like to retrieve the value closest to 1.43 for each of the categories in id. I thought I could use:

a %>% group_by(id) %>% nth(which.min(abs(.$b-1.43)))

but dplyr states

Error: Don't know how to generate default for object of class grouped_df/tbl_df/tbl/data.frame

Psidom · Accepted Answer · 2016-11-28T01:07:00.313

which.min() returns the index of the first minimum of a numeric (or logical) vector, so it yields at most one row per group. If several values are tied for closest to 1.43 and you want to keep all of them, use filter():

a %>% group_by(id) %>% filter(abs(b - 1.43) == min(abs(b - 1.43)))

#Source: local data frame [2 x 2]
#Groups: id [2]

#     id     b
#  <chr> <dbl>
#1     A   1.5
#2     B   1.4
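One caveat with the == comparison: ties are only detected when the absolute differences are exactly equal as floating-point numbers, which is not guaranteed for values that merely look equidistant on paper. A minimal sketch of a tolerance-based variant follows. The data set `ties` and the tolerance `tol` are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical data: 1.33 and 1.53 are both 0.10 away from the target on paper.
ties <- tibble(id = c("A", "A", "A"),
               b  = c(1.33, 1.53, 1.90))

target <- 1.43
tol    <- 1e-8  # assumed tolerance; adjust to the scale of your data

# Keep every row whose distance is within `tol` of the group's minimum distance.
closest_with_ties <- ties %>%
  group_by(id) %>%
  filter(abs(b - target) <= min(abs(b - target)) + tol) %>%
  ungroup()

print(closest_with_ties)
```

With the tolerance in place, both 1.33 and 1.53 are kept regardless of rounding in the subtraction.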

If you prefer to stick with nth() and one value per group is enough, wrap it in summarise() so that it is applied to each group. According to ?nth, you also need to pass the vector itself as the first argument:

a %>% group_by(id) %>% summarise(b = nth(b, which.min(abs(b-1.43))))

# A tibble: 2 × 2
#     id     b
#  <chr> <dbl>
#1     A   1.5
#2     B   1.4

score 10 · Answer 2 · edited May 23 '17 at 12:13

There are a few ways to do this.

Here's a dplyr solution (found using this answer):

a %>%
    group_by(id) %>%
    slice(which.min(abs(b - 1.43)))

     id     b
  <chr> <dbl>
1     A   1.5
2     B   1.4
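For readers on a newer dplyr, the same idea can probably be written with slice_min(). This is a sketch that assumes dplyr 1.0.0 or later, which postdates this answer. Setting with_ties = FALSE keeps a single row per group, mirroring which.min().

```r
library(dplyr)

a <- tibble(id = c("A", "A", "A", "B", "B", "B"),
            b  = c(1.2, 1.5, 1.8, 1.1, 1.6, 1.4))

# Order rows within each group by distance to 1.43 and keep the smallest one.
closest <- a %>%
  group_by(id) %>%
  slice_min(abs(b - 1.43), n = 1, with_ties = FALSE) %>%
  ungroup()

print(closest)
```

Leaving with_ties at its default of TRUE would instead keep all rows that share the minimum distance.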

Here's a base solution:

do.call('rbind', by(a, a$id, function(x) x[which.min(abs(x$b - 1.43)), ]))

     id     b
  <chr> <dbl>
1     A   1.5
2     B   1.4
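Another way to write the base approach is to compute one row index per group with tapply() and subset once. This is only a sketch of that alternative; the variable names `idx` and `closest` are illustrative.

```r
a <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                b  = c(1.2, 1.5, 1.8, 1.1, 1.6, 1.4))

# For each id, pick the row number whose b is nearest to 1.43.
idx <- tapply(seq_len(nrow(a)), a$id,
              function(i) i[which.min(abs(a$b[i] - 1.43))])

# Subset the original data frame with those row numbers.
closest <- a[as.vector(idx), ]
print(closest)
```

Because the subset happens on the original data frame, every column of the matching rows is retained.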

Here's a hacky dplyr solution:

a %>%
    mutate(AbsDiff = abs(b - 1.43)) %>%
    group_by(id) %>%
    mutate(AbsDiff_r = rank(AbsDiff, ties.method = 'first')) %>%
    filter(AbsDiff_r == 1)

     id     b AbsDiff AbsDiff_r
  <chr> <dbl>   <dbl>     <int>
1     A   1.5    0.07         1
2     B   1.4    0.03         1

answered Nov 28 '16 at 00:35 by bouncyball

If you want to avoid the `do.call(rbind...` for the base version, you can also do: `a[by(a, a$id, FUN=function(SD) rownames(SD)[which.min(abs(SD$b - 1.43))] ),]` – thelatemail Nov 28 '16 at 01:56

Love the dplyr `slice`! – TheSciGuy Dec 10 '19 at 23:32

dww · Answer 3 · 2016-11-28T00:43:23.930

This is not too far from what you had; summarise() evaluates the expression once per group:

a %>% group_by(id) %>% summarise(which.min(abs(b-1.43)))
# A tibble: 2 × 2
#      id `which.min(abs(b - 1.43))`
#   <chr>                      <int>
# 1     A                          2
# 2     B                          3

Or if you need the values, rather than the indices:

a %>% group_by(id) %>% summarise(b[which.min(abs(b-1.43))])
# A tibble: 2 × 2
#      id `b[which.min(abs(b - 1.43))]`
#   <chr>                         <dbl>
# 1     A                           1.5
# 2     B                           1.4
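If the lookup is needed in several places, it may be tidier to wrap the indexing in a small helper and give the summary column a name. The helper name `closest_to` and the column name `closest` are hypothetical, chosen only for this sketch.

```r
library(dplyr)

# Hypothetical helper: return the element of x nearest to target
# (the first one in case of ties, because which.min() returns the first minimum).
closest_to <- function(x, target) {
  x[which.min(abs(x - target))]
}

a <- tibble(id = c("A", "A", "A", "B", "B", "B"),
            b  = c(1.2, 1.5, 1.8, 1.1, 1.6, 1.4))

result <- a %>%
  group_by(id) %>%
  summarise(closest = closest_to(b, 1.43))

print(result)
```

Naming the column avoids the backtick-quoted expression that summarise() otherwise uses as the column name.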

score 3 · Answer 4 · answered Nov 28 '16 at 01:43

Here is a version with data.table:

library(data.table)
setDT(a)[, .(b= b[which.min(abs(b-1.43))]) , id]
#  id   b
#1:  A 1.5
#2:  B 1.4
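When the table has more columns than id and b, indexing .SD instead of b should return the whole matching row. The sketch below assumes an extra column `note` that is not in the question, and uses data.table() directly rather than setDT() so that no existing object is modified by reference.

```r
library(data.table)

# `note` is a made-up extra column to show that whole rows are returned.
dt <- data.table(id   = c("A", "A", "A", "B", "B", "B"),
                 b    = c(1.2, 1.5, 1.8, 1.1, 1.6, 1.4),
                 note = c("a1", "a2", "a3", "b1", "b2", "b3"))

# Within each id, subset .SD (the group's rows) to the one nearest 1.43.
res <- dt[, .SD[which.min(abs(b - 1.43))], by = id]
print(res)
```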

answered Nov 28 '16 at 01:43 by akrun
