4

My question is almost identical to this one except instead of finding the closest value between a column value and a fixed number, e.g. "2", I want to find the closest value to the value in another column.. Here's an example of data:

    df <- data.frame(site_no=c("01010500", "01010500", "01010500","02010500", "02010500", "02010500", "03010500", "03010500", "03010500"), 
                     OBS=c(423.9969, 423.9969, 423.9969, 123, 123, 123, 150,150,150),
                     MOD=c(380,400,360,150,155,135,170,180,140),
                     HT=c(14,12,15,3,8,19,12,23,10))

Which looks like this:

   site_no      OBS MOD HT
1 01010500 423.9969 380 14
2 01010500 423.9969 400 12
3 01010500 423.9969 360 15
4 02010500 123.0000 150  3
5 02010500 123.0000 155  8
6 02010500 123.0000 135 19
7 03010500 150.0000 170 12
8 03010500 150.0000 180 23
9 03010500 150.0000 140 10

The goal is, for every "site_no", find the closest MOD value that matches the OBS value, then return the corresponding HT. For example, for site_no 01010500, 423.9969 - 400 yields the minimum difference, and thus the function would return 12. I have tried most of the solutions from the other post, but get an error due to $ with atomic vector (the df is recursive, but I think the function is not). I tried:

ddply(df, .(site_no), function(z) {
  z[abs(z$OBS - z$MOD) == min(abs(z$OBS - z$MOD)), ]
}) 
Error in z$River_Width..m. - z$chan_width :
  non-numeric argument to binary operator
LauraR
  • 181
  • 1
  • 2
  • 12
  • Perhaps `library(dplyr); df %>% group_by(site_no) %>% slice(which.min(HT))` or may be `df %>% group_by(site_no) %>% slice(which.min(abs(OBS-MOD)))` – akrun Sep 22 '17 at 15:13
  • @akrun, the 2nd option works great. I can accept it as an answer if you submit as one. Only question: the str that is returned is a tibble, and some combo tbl of data frames.. could i get this result in a single data frame? – LauraR Sep 22 '17 at 15:44
  • Thank you. I added my comment as a solution with some explanation. You can convert to data.frame if you want it – akrun Sep 22 '17 at 15:50

1 Answers1

3

After grouping by 'site_no', we slice the rows which has the minimum absolute difference between the 'OBS' and 'MOD'

library(dplyr)
res <- df %>%
         group_by(site_no) %>% 
         slice(which.min(abs(OBS-MOD)))

NOTE: By using dplyr, some additional classes like tbl_df tibble etc. are added which should work with most other functions. If there is any problem, we can convert it to data.frame with as.data.frame

str(res %>%
        as.data.frame)
#'data.frame':   3 obs. of  4 variables:
#$ site_no: Factor w/ 3 levels "01010500","02010500",..: 1 2 3
#$ OBS    : num  424 123 150
#$ MOD    : num  400 135 140
#$ HT     : num  12 19 10
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    I have one slight issue with this.. if there is a tie, the operator returns the first value in the index. Ideally, I'd like it to return the max(HT). Can't get this to work with using slice.. tried moving to filter and rank, but no such luck. – LauraR Sep 22 '17 at 18:05
  • 2
    @LauraR In that case, `df %>% group_by(site_no) %>% arrange(site_no, desc(HT)) %>% slice(which.min(abs(OBS-MOD)))` You can check that after changing `df$MOD[5] <- 135` – akrun Sep 22 '17 at 18:07