I'm working with the RecordLinkage package, version 0.4-12, and R version 4.0.0.
My data linkage code, which I have used in the past, is no longer running the same way it did under R version 3.6.3.
library(tidyverse)
library(RecordLinkage)
data("RLdata500")
data("RLdata10000")
# Create the datasets to link from the package data: dat1 and dat2
dat1 <- RLdata500
dat2 <- bind_rows(RLdata500, RLdata10000)
The code for the two linkages below is identical except for the strcmpfun argument, which is set to either "jarowinkler" or "levenshtein".
The "levenshtein" code runs fine, but the "jarowinkler" linkage fails to produce any results for allpairs_jw.
# Jaro-Winkler with Package data
rpairs <- RLBigDataLinkage(dat1, dat2,
                           strcmp = TRUE,
                           strcmpfun = "jarowinkler",
                           exclude = c("fname_c2", "lname_c2"))
epi <- epiWeights(rpairs)
allpairs_jw <- getPairs(epi, min.weight = 0.80)
# Levenshtein with Package data
rpairs <- RLBigDataLinkage(dat1, dat2,
                           strcmp = TRUE,
                           strcmpfun = "levenshtein",
                           exclude = c("fname_c2", "lname_c2"))
epi <- epiWeights(rpairs)
allpairs_lv <- getPairs(epi, min.weight = 0.80)
> head(allpairs_jw)
[1] id fname_c1 fname_c2 lname_c1 lname_c2 by bm bd is_match
<0 rows> (or 0-length row.names)
> head(allpairs_lv)
id fname_c1 fname_c2 lname_c1 lname_c2 by bm bd is_match Weight
1 1 CARSTEN <NA> MEIER <NA> 1949 7 22
2 1 CARSTEN <NA> MEIER <NA> 1949 7 22 <NA> 1.0000000
3
4 2 GERD <NA> BAUER <NA> 1968 7 27
5 2 GERD <NA> BAUER <NA> 1968 7 27 <NA> 1.0000000
6
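As a sanity check, the two string comparators can also be called directly, outside the linkage pipeline. This is only a minimal sketch: it assumes the exported functions jarowinkler() and levenshteinSim() are the R-level counterparts of the "jarowinkler" and "levenshtein" options, which may not be how RLBigDataLinkage computes the comparisons internally. The name pairs are made up for illustration.

```r
library(RecordLinkage)

# Report the versions in use, since the behaviour changed between R releases
print(R.version.string)
print(packageVersion("RecordLinkage"))

# Illustrative name pairs (hypothetical, not taken from the linkage result)
a <- c("CARSTEN", "GERD", "MEIER")
b <- c("CARSTEN", "GERT", "MEYER")

# Call the exported string comparators directly on the pairs
sim_jw <- jarowinkler(a, b)
sim_lv <- levenshteinSim(a, b)

print(data.frame(a, b, sim_jw, sim_lv))
```

If both similarity columns look sensible here, the problem is more likely in how the Jaro-Winkler comparison is applied inside RLBigDataLinkage than in the comparator itself.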
Any guidance would be greatly appreciated.