I've the following 2 dataset: The original dataset and the matching dataset. The matching dataset is used to match against the original dataset.
Original Dataset |
---|
MelbourneAir |
MelbourneCityAir |
AirMelbourne |
LondonAir |
londonterminal |
Airportlondon |
NewyorkAirport |
Airnewyork |
newyorkterminal |
airnorway |
terminalnorway |
swedenair |
airsweden |
swedenbus |
shopbanana |
bananashop |
bananaashop |
appleshop |
shopapple |
Original_df = structure(list(df.Original.Data = structure(c(11L, 12L, 1L, 9L,
10L, 4L, 13L, 2L, 14L, 15L, 3L, 20L, 18L, 5L, 19L, 17L, 8L, 7L,
6L, 16L), .Label = c("AirMelbourne", "airnewyork", "airnorway",
"AirportLondon", "airsweden", "appleshop", "bananaashop", "bananashop",
"LondonAir", "LondonTerminal", "MelbourneAir", "MelbourneCityAir",
"newyorkairport", "newyorkterminal", "norwayterminal", "shopapple",
"shopbanana", "swedenair", "swedenbus", "terminalnorway"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
Matching Dataset |
---|
MelbourneAirport |
LondonTerminal |
NewYorkAirport |
NorwayTerminal |
SwedenAirport |
BananaShop |
AppleShop |
Matching_df = structure(list(df.Matching.Data = structure(c(5L, 4L, 6L, 7L,
8L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("", "AppleShop", "BananaShop", "LondonTerminal",
"MelbourneAirport", "NewYorkAirport", "NorwayTerminal", "SwedenAirport"
), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
head(matching_df,7)
Does anyone know how I can use R and it's packages (stringdist, stringr, stringi, tidyr, dplyr or base R) to do the matching based on consecutive letters (i.e. as long as the text in original dataset matches the text in the matching dataset by 5(or 6) consecutive letters, R will match it accordingly to the following table? I've been using Excel so far to do the matching but given the limit to how much data excel can handle, using R would be more efficient.
Original Dataset | Matched Dataset |
---|---|
MelbourneAir | MelbourneAirport |
MelbourneCityAir | MelbourneAirport |
AirMelbourne | MelbourneAirport |
LondonAir | LondonTerminal |
londonterminal | LondonTerminal |
Airportlondon | LondonTerminal |
NewyorkAirport | NewYorkAirport |
Airnewyork | NewYorkAirport |
newyorkterminal | NewYorkAirport |
airnorway | NorwayTerminal |
terminalnorway | NorwayTerminal |
swedenair | SwedenAirport |
airsweden | SwedenAirport |
swedenbus | SwedenAirport |
shopbanana | BananaShop |
bananashop | BananaShop |
bananaashop | BananaShop |
appleshop | AppleShop |
shopapple | AppleShop |
Thanks in advance!