Questions tagged [fuzzyjoin]

An R package for joining tables together on inexact matching.

Join tables together based not on whether columns match exactly, but on whether they are similar by some comparison. Implementations include string distance, regular expression, and custom matching functions. The syntax is similar to that of dplyr's joins.

161 questions
1
vote
2 answers

"fuzzy" inner_join in dplyr to keep both rows that do AND not exactly match

I am working with two datasets that I would like to join based not on exact matches between them, but rather on approximate matches. My question is similar to this OP. Here are examples of what my two dataframes look like. df1 is this one: x 4.8 12 …
Blundering Ecologist
1
vote
2 answers

Match two tables based on a time difference criterion

I have a data table (lv_timest) with time stamps every 3 hours for each date: # A tibble: 6 × 5 LV0_mean LV1_mean LV2_mean Date_time Date 1 0.778 -4.12 …
Lisa
1
vote
0 answers

need to compare addresses across several columns and return the most complete - fuzzy join, etc

I have two files of customer records I need to merge. The addresses don't always match, either because they are not the same address or because part of the address is missing. I have tried fuzzy join without success. I basically need to compare 4 columns and assemble…
1
vote
0 answers

fuzzyjoin based on relative difference

I have understood that fuzzyjoin::difference will join two tables based on absolute difference between columns. Is there an R function that will join tables based on relative/percentage differences? I could do so using a full_join() + filter() but I…
Yonghao
1
vote
1 answer

R function to join two tables if date in table 1 is earlier than date in table 2

This question is about tibbles in the tidyverse package of R. I have created the example below to represent my data. Tibble 'ab' is a list of people (column a) and a date that something specific happened (column b) e.g. received a vaccine. Tibble 'cd'…
Luke
1
vote
1 answer

Merging two data frames based on maximum number of words in common in R

I have two data.frames, one containing partial names and the other containing full names, as follows: partial <- data.frame( "partial.name" = c("Apple", "Apple", "WWF", "wizz air", "WeMove.eu", "ILU")) full <- data.frame("full.name" = c("Apple Inc",…
JMCrocs
1
vote
2 answers

How to fuzzy match by words (not letters) in R?

I need to merge two datasets based on columns that contain names that don't exactly match, sometimes because one of the columns has a missing name with respect to the other. For example, in one column I have "Martín Gallardo" and in the other I have…
Martin
1
vote
1 answer

fuzzy_left_join with match_fun %in%

Some data example_df <- data.frame( url = c('blog/blah', 'blog/?utm_medium=foo', 'blah', 'subscription/apples', 'UK/something'), numbs = 1:5 ) lookup_df <- data.frame( string = c('blog', 'subscription', 'UK'), group = c('blog', 'subs',…
Doug Fir
1
vote
1 answer

How to fuzzyjoin several dataframes in one go using IRanges

I need to join several dataframes based on inexact matching, which can be achieved using the fuzzyjoin and the IRanges packages: Data: df1 <- data.frame( line = 1:4, start = c(75,100,170,240), end = c(100,150,190,300) ) df2 <- data.frame( v2…
Chris Ruehlemann
1
vote
0 answers

Is there a way to set "rules" for string matching in R?

I've been scratching my head trying to find a way to solve this problem without having to get into NLP and start training models. I have 2 rather large data sets that should be able to be matched by name, but the spellings and syntax of them are…
HFguitar
1
vote
1 answer

Fuzzy Join Error: All columns in a tibble must be vectors

test <- structure(list(trip_count = 1:10, dropoff_longitude = c(-73.959862, -73.882202, -73.934113, -73.992203, -74.00563, -73.975189, -73.97448, -73.974838, -73.981377, -73.955093), dropoff_latitude = c(40.773617, 40.744175, 40.715923,…
Maximilian
1
vote
1 answer

Limiting fuzzy join calculations

I'm trying to execute an event study that evaluates whether or not a specific individual participates in a specific event (event P) after experiencing a specific treatment (treatment E). I'm doing this by taking the observations of event E, and…
EconMatt
1
vote
1 answer

Fuzzy matching (and overwriting) vector entries

I have 5 vectors with column names, which are similar, but not identical. I am trying to find a way to correct the entries in vector2, vector3, vector4, vector5, based on the names in vector1. I have been getting some ideas here and here, leading to…
Tom
1
vote
1 answer

R - Splitting a large dataframe into several smaller dataframes, performing fuzzyjoin on each one and outputting to a single dataframe

I have 2 dataframes that I need to join using the fuzzyjoin function. I've tried running it on the whole dataframes but do not have enough memory to do so. One of the dataframes [UPRN] acts as source data holding a unique identifier…
daenwaels
1
vote
2 answers

fuzzy and exact match of two databases

I have two databases. The first one has about 70k rows with 3 columns. The second one has 790k rows with 2 columns. Both databases have a common variable grantee_name. I want to match each row of the first database to one or more rows of the…