I have two data frames with text data about users:
x <- data.frame("Address_line1" = c("123 Street", "21 Hill drive"),
                "City" = c("Chicago", "London"), "Phone" = c("123", "219"))
y <- data.frame("Address_line1" = c("461 road", "PO Box 123", "543 Highway"),
                "City" = c("Dallas", "Paris", "New York"), "Phone" = c("235", "542", "842"))
> x
  Address_line1    City Phone
1    123 Street Chicago   123
2 21 Hill drive  London   219
> y
  Address_line1     City Phone
1      461 road   Dallas   235
2    PO Box 123    Paris   542
3   543 Highway New York   842
For each row of x, I want to iterate over all the rows of y, compare the corresponding columns (address to address, city to city, phone to phone) and add up the string distances across the columns.
So for the first row of x, I want an output like:
[16 20 20]
Where 16 is
stringdist("123 Street","461 road", method = "lv")+
stringdist("Chicago","Dallas", method = "lv")+
stringdist("123","235", method = "lv")
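As a minimal sketch of the single-row case (assuming the `stringdist` package is installed), the three calls above do not need to be repeated for every row of y: `stringdist()` recycles its shorter argument, so comparing one value from x against a whole column of y returns one distance per row of y. `first_row` is just an illustrative name.

```r
library(stringdist)

x <- data.frame(Address_line1 = c("123 Street", "21 Hill drive"),
                City = c("Chicago", "London"), Phone = c("123", "219"),
                stringsAsFactors = FALSE)
y <- data.frame(Address_line1 = c("461 road", "PO Box 123", "543 Highway"),
                City = c("Dallas", "Paris", "New York"),
                Phone = c("235", "542", "842"),
                stringsAsFactors = FALSE)

# The single value from row 1 of x is recycled against every element of the
# corresponding y column, giving one Levenshtein distance per row of y.
first_row <- stringdist(x$Address_line1[1], y$Address_line1, method = "lv") +
  stringdist(x$City[1], y$City, method = "lv") +
  stringdist(x$Phone[1], y$Phone, method = "lv")

print(first_row)
```

For row 1 this should print `16 20 20`, matching the expected output above.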
The two 20s are the corresponding sums for the second and third rows of y.
Similarly, for every row of x I want a vector of nrow(y) such sums, collected into a list with one element per row of x.
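One possible sketch, assuming x and y share the same column names and that every column should be compared with `method = "lv"`: instead of an explicit loop over rows, `stringdistmatrix()` from the `stringdist` package computes all nrow(x) × nrow(y) distances for one column at once, and the per-column matrices can be added elementwise. Row i of the summed matrix is then the vector requested for row i of x. The names `dist_mat` and `dist_list` are illustrative only.

```r
library(stringdist)

x <- data.frame(Address_line1 = c("123 Street", "21 Hill drive"),
                City = c("Chicago", "London"), Phone = c("123", "219"),
                stringsAsFactors = FALSE)
y <- data.frame(Address_line1 = c("461 road", "PO Box 123", "543 Highway"),
                City = c("Dallas", "Paris", "New York"),
                Phone = c("235", "542", "842"),
                stringsAsFactors = FALSE)

# Assumption: both data frames have the same columns, matched by name.
# For each column, stringdistmatrix() returns an nrow(x) x nrow(y) matrix of
# Levenshtein distances; Reduce(`+`, ...) sums these matrices elementwise.
dist_mat <- Reduce(`+`, lapply(names(x), function(col) {
  stringdistmatrix(x[[col]], y[[col]], method = "lv")
}))

# Split the matrix into a list: one element per row of x, each a vector of
# nrow(y) summed distances.
dist_list <- lapply(seq_len(nrow(dist_mat)), function(i) unname(dist_mat[i, ]))

print(dist_list)
```

The first list element should be `16 20 20`. Keeping `dist_mat` itself may be more convenient than the list if you later want to compare rows of x against each other, since each row of the matrix already lines up with the rows of y.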