Find nearest highest value or equivalent based on multiple variables

Question

I have the following three dataframes:

df1
   grade PIR  rate
1      7 min 10.80
2      8 min 11.26
3      9 min 12.10
4      7  X1 11.17
5      8  X1 11.65
6      9  X1 12.56
7      7  X2 11.55
8      8  X2 12.06
9      9  X2 13.03
10     7  X3 11.95
...

and

df2
   grade PIR new_rate
1      7 min    13.00
2      8 min    13.00
3      9 min    13.00
4      7  X1    13.48
5      8  X1    13.48
6      9  X1    13.48
7      7  X2    13.98
8      8  X2    13.98
9      9  X2    13.98
10     7  X3    14.50
...

and

df3
ID      assoc_rate   assoc_grade
124575  10.80        7
123413  11.42        7
111539  11.65        8
112284  12.04        8
125245  12.10        9
132588  12.44        9
....

For each case of df3$assoc_rate and df3$assoc_grade I need to identify the nearest highest value (or equivalent) of df1$rate within the same df1$grade, in order to join the df1$grade and df1$PIR, which would then allow me to attach the df2$new_rate based on grade and PIR.

My end goal is to attach the new_rate and PIR from df2 to df3 based on their assoc_rate and assoc_grade, but it has to be in relation to df1.

So my desired dataframe would look like:

df_desired
ID      assoc_rate   assoc_grade   PIR   new_rate 
124575  10.80        7             min     13.00      
123413  11.42        7             X2      13.98     
111539  11.65        8             X1      13.48     
112284  12.04        8             X2      13.98     
125245  12.10        9             min     13.00     
132588  12.44        9             X1      13.48 
....

Here are the dataframes:
df1 <- data.frame(grade = c(7L, 8L, 9L, 7L, 8L, 9L, 7L, 8L, 9L,7L, 8L, 9L, 7L, 8L, 9L), PIR = c("min", "min", "min", "X1", "X1","X1", "X2", "X2", "X2", "X3", "X3", "X3", "X4", "X4", "X4"),rate = c(10.8, 11.26, 12.1, 11.17, 11.65, 12.56, 11.55, 12.06, 13.03, 11.95, 12.49, 13.53, 12.35, 12.93, 14.04))
df2 <- data.frame(grade = c(7L, 8L, 9L, 7L, 8L, 9L, 7L, 8L, 9L,7L, 8L, 9L, 7L, 8L, 9L), PIR = c("min", "min", "min", "X1", "X1","X1", "X2", "X2", "X2", "X3", "X3", "X3", "X4", "X4", "X4"),new_rate = c(13, 13, 13, 13.48, 13.48, 13.48, 13.98, 13.98,13.98, 14.5, 14.5, 14.5, 15.04, 15.04, 15.04))
df3 <- data.frame(ID = c(124575, 123413, 111539, 112284, 125245, 132588), assoc_rate = c(10.80,11.42,11.65,12.04,12.10,12.44), assoc_grade = c(7,7,8,8,9,9))

Thank you for any insight or instruction. (I tried something like this but didn't know how to make it work.)

score 1 · Accepted Answer · answered Feb 12 '20 at 17:44

Using tidyverse

df3 %>% 
left_join(df1, by = c("assoc_grade" = "grade")) %>% 
mutate(diff     = rate - assoc_rate, 
       new_diff = ifelse(diff < 0, 1000, diff)) %>% 
group_by(ID) %>% 
filter(new_diff == min(new_diff)) %>% 
ungroup() %>% 
left_join(df2, by = c("assoc_grade" = "grade", "PIR" = "PIR"))

join and filter are the steps that are necessary. Using a group_by before filtering lets you filter for each ID.

score 1 · Answer 2 · answered Feb 13 '20 at 00:54

Since you referenced a question with a data.table answer, here is an option using data.table:

library(data.table)
setDT(df1); setDT(df2); setDT(df3)

#rolling join
df3[, PIR :=
    df1[.SD, on=.(grade=assoc_grade, rate=assoc_rate), roll="nearest", PIR]
]

#update join
df3[df2, on=.(assoc_grade=grade, PIR), new_rate := new_rate]

output:

       ID assoc_rate assoc_grade PIR new_rate
1: 124575      10.80           7 min    13.00
2: 123413      11.42           7  X2    13.98
3: 111539      11.65           8  X1    13.48
4: 112284      12.04           8  X2    13.98
5: 125245      12.10           9 min    13.00
6: 132588      12.44           9  X1    13.48

Aimé Okoko · Answer 3 · 2020-02-12T19:05:30.587

Create a function to locate nearest highest or equivalent

Inspired from link How to find first element of a group that fulfill a condition

first_equal_sup <- function(x, value){
(x >= value) & (cumsum(x >= value) == 1)}

Get all matching rows from df1 in a list

j<- 1

x <- list()

for (i in df3$assoc_rate){

#Extract the group in df3

df3_group <- df3[df3$assoc_rate == i, "grade"]

#Extract all lines with corresponding group

df1_group <- df1[df1$grade == df3_group,]

#Put each corresponding line in a list and join with df2

x[[j]] <- df1_group[first_equal_sup(df1_group$rate, i),]

j <- j+1
}

df <- Reduce(rbind, x) %>% left_join(df2, by=c("grade", "PIR")) %>%
right_join(df3[,c("ID", "grade", "PIR")])

score 0 · Answer 4 · answered Feb 12 '20 at 19:09

Thanks Akash87 for his answer. But I think it has few syntax errors Rewriting it like that will match exactly with your example

df3 %>% 
  left_join(df1, by = c("grade" = "grade")) %>% 
  mutate(diff     = rate - assoc_rate, 
         new_diff = ifelse(diff < 0, 1000, diff)) %>% 
  group_by(ID) %>% 
  filter(new_diff == min(new_diff)) %>% 
  ungroup() %>% 
  left_join(df2, by = c("grade" = "grade", "PIR.x" = "PIR"))

Find nearest highest value or equivalent based on multiple variables

4 Answers4

Create a function to locate nearest highest or equivalent

Get all matching rows from df1 in a list