I need to merge two datasets:
df1
df1=structure(list(id = structure(c(1L, 4L, 5L, 6L, 2L, 3L), .Label = c("195/75 R16C-Tire CORDIANT Business CA",
"215/75 R17,5-Tires KAMA NR-201 driving axle", "235/70 R16-Tire KAMA-221",
"275/70 R22,5-Tire TYREX ALL STEEL VC-1 (Я-646)", "315/80 R22,5-Tire TYREX ALL STEEL DR-1 driving axle",
"315/80 R22,5-Tire TYREX ALL STEEL FR-401 steering axle"), class = "factor")), .Names = "id", class = "data.frame", row.names = c(NA,
-6L))
df2
df2= structure(list(id = structure(c(2L, 4L, 5L, 6L, 3L, 1L), .Label = c("Auto-cutting 245 / 70R16 K-214",
"Auto-rubber 195/75 R16C Cordiant Business CA 107 / 105R all-season",
"Auto-rubber 215 / 75R17,5 K-166", "Auto-rubber 275 / 70R22,5 (11 / 70R22,5) I-646 (Tyrex all steel VC-1)",
"Auto-rubber 315 / 80R22,5 DR-1Tyrex All Steel (Я-636)", "Auto-rubber 315 / 80R22,5 FR-401 Tyrex All Steel (Я-626)"
), class = "factor")), .Names = "id", class = "data.frame", row.names = c(NA,
-6L))
I use fuzzy matching with the RecordLinkage package:
library("RecordLinkage")
# compare every id in df1 with every id in df2 using Jaro-Winkler similarity
rpairs_jar <- compare.linkage(df1, df2,
strcmp = c("id"),
strcmpfun = jarowinkler)
rpairs_epiwt <- epiWeights(rpairs_jar)
# extract the compared pairs and their values into a data frame
b=rpairs_epiwt$pairs
View(b)
In the output I see weights between all pairs of ids. For example, id1 of df1 is compared with all 6 item names in df2. The greatest weight for the first item of df1 (id1) is with the first item of df2 (id1): 0.61.
For the second item of df1 (id2), the greatest weight is with the third item of df2 (id3): 0.58.
How can I keep only the comparison with the greatest weight for each id of df1?
That is, the output should be a table with six entries instead of 36:
id1 id2        id
  1   1 0.6106743
  2   3 0.5994314
  3   3 0.5874915
  4   4 0.6288133
  5   4 0.5552018
  6   6 0.5642857
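A minimal base R sketch of keeping, for each id1, only the row with the largest value in column id. It assumes b has the columns id1, id2 and id, as in the output above; the small data frame below is made-up toy data standing in for rpairs_epiwt$pairs so the snippet runs without RecordLinkage.

```r
# Toy stand-in for b (hypothetical values), shaped like rpairs_epiwt$pairs:
# id1 = row in df1, id2 = row in df2, id = comparison value.
b <- data.frame(
  id1 = rep(1:2, each = 3),
  id2 = rep(1:3, times = 2),
  id  = c(0.61, 0.40, 0.35, 0.20, 0.45, 0.60)
)

# Group row indices by id1, then pick the row with the largest id in each group.
best_rows <- sapply(split(seq_len(nrow(b)), b$id1),
                    function(rows) rows[which.max(b$id[rows])])

# Keep only those rows and the three columns of interest.
best <- b[best_rows, c("id1", "id2", "id")]
rownames(best) <- NULL
print(best)
```

which.max returns the first maximum, so a tie keeps only one row per id1; `b[ave(b$id, b$id1, FUN = max) == b$id, ]` would keep all tied rows instead. Note that the same df2 item can be chosen for several df1 items, which matches the desired table (id2 3 and 4 each appear twice).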