I have a column in a data.table (DT1)
that is a decreasing count (let's say pol_count
), and another column for the average age in a population (say AverageAge
). I am trying to take my value of pol_count
(say 400) and my value of AverageAge
(say 85) and match them to an interval within a table called FactorFile
.
Here is a glimpse of FactorFile
:
Count_Greater_Than Age_Less_Than Months Factor
100 80 12 1
85 82 16 0.85
65 84 20 0.65
45 86 24 0.45
In this instance, pol_count
is > 100 so it would return the third column's (Months
) value of 12, and AverageAge
is < 86 but > 84 so it would return 20. I then need another column to take the max, so final answer is 20. Finally, a column for the Factor
associated with 20 which is 65%.
I'm not sure how to join on a table without using exact matches. Basically I want a VLOOKUP with the last argument as TRUE, to pull the closest value to the supplied lookup value.
I've tried using the package fuzzyjoin
, here is my statement. It runs, but nothing seems to happen to my data.table:
fuzzy_left_join(DT1, FactorFile, by = c("AverageAge" = "Average_Age_Less_Than"), match_fun = `<=`)
Any tips on how to use fuzzyjoin correctly or an easier way to make this join happen? To summarize, I'm trying to take two columns in DT1, and return both a Months column and Factor column onto DT1, by joining on FactorFile.
Thanks!