Questions tagged [fuzzyjoin]

An R package for joining tables together on inexact matching.

Join tables together based not on whether columns match exactly, but on whether they are similar by some comparison. Implementations include string distance, regular expression, and custom matching functions. Uses syntax similar to dplyr's joins.
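As a rough illustration of the three matching styles named above, here is a minimal sketch. The tables `cities`, `records`, and `patterns` and all of their values are hypothetical, invented for this example; the sketch assumes the dplyr and fuzzyjoin packages are installed.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical example tables (not taken from any question on this page)
cities <- tibble(name = c("Aberdeen", "Abercrombie", "Springfield"))
records <- tibble(
  name  = c("Aberdeeen", "Abercombie", "Shelbyville"),
  value = c(1, 2, 3)
)
patterns <- tibble(regex = c("^Aber", "field$"), group = c("A", "F"))

# String distance: keep pairs of names at most 2 edits apart
by_distance <- stringdist_inner_join(cities, records, by = "name", max_dist = 2)

# Regular expression: match the name column of x against patterns stored in y
by_regex <- regex_left_join(cities, patterns, by = c(name = "regex"))

# Custom matching function: here, names match when their first four letters agree
same_prefix <- function(x, y) substr(x, 1, 4) == substr(y, 1, 4)
by_custom <- fuzzy_inner_join(cities, records, by = "name", match_fun = same_prefix)
```

Because both inputs have a column called `name`, the distance and custom joins return it as `name.x` and `name.y`, which mirrors how dplyr suffixes clashing columns.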

161 questions
2 votes, 1 answer

fuzzy outer join/merge in R

I have two datasets and want to do a fuzzy join. Here are the two datasets. library(data.table) # data1 dt1 <- fread("NAME State type ABERCOMBIE TOWNSHIP ND TS ABERDEEN TOWNSHIP NJ TS …
Peter Chen
2 votes, 0 answers

Inner join with two reactive dataframes shiny

I'm developing an RStudio Shiny app. The logic consists of loading two Excel files into data frames and using the fuzzyjoin package to make an inner join between them. Below is the code of my shiny.r and server.r; loading of the Excel files is…
2 votes, 2 answers

Partial string matching in R and trim the characters

Here is a data frame and a vector. df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "mnopqr", "qrst")) vec <- c("ab", "mnop", "ijk") Now, for all the values in var1 that most closely match (I would like to match the first n characters) the values…
Geet
2 votes, 2 answers

Fuzzy join without proc SQL

Good day, I wish to merge two datasets, matching each date to the next closest one. The datasets are huge (500 MB to 1 GB), so PROC SQL is out of the question. I have two datasets. The first (Fleet) has observations; the second has a date and which generation number to use for further processing.…
pinegulf
2 votes, 1 answer

fuzzy join with permutations in strings

I'm using fuzzyjoin to cross politicians and their respective regions: library(dplyr) library(fuzzyjoin) x <- tibble(name = c("Fulvio Rossi Ciocca", "Rigoberto Del Carmen Rojas Sarapura", "Lorena Vergara Bravo", "Lily Perez San Martin"), …
pachadotdev
1 vote, 2 answers

R: How to left join two tables based on fuzzy matching strings that are not exactly the same

I am trying to left join table 1 'Person Name' to table 2 'Name' and get the values from the Work Group column in table 2. df1 <- read.table(text=" Person_Name PEREZ, MINDY PEREZ, ABA CLARKE, LINDA THOMAS, NICOLE", header=T, sep="|") df2 <-…
Pxanalyst
1 vote, 1 answer

Return anti-join of two data frames with values outside a certain percentage difference

I would like to compare two mixed-type data frames and return the rows that are different between them--but I would like numeric values to only be returned within a certain percentage. tbl1 <- tibble(var1 = c('r1', 'r2', 'r3', 'r4', 'r5'), …
JemJem
1 vote, 0 answers

Fixing fuzzyjoin error message: vector memory exhausted

I'm trying to join two data sets using fuzzy matching through the stringdist_left_join function from the fuzzyjoin library, but I keep getting the error message "Error: vector memory exhausted (limit reached?)." Does anybody know why this may be…
1 vote, 0 answers

Data consolidation and cleaning using fuzzy string comparisons with -matchit- command

I have two databases, one designated data and another data1 (reference), where I want to compare the codes of each data designation and data2, I have to do it by writing the designations, if they are written the same or similar, I have to have the…
1 vote, 2 answers

Join tables with inexact match in R. Only match if a whole word matches

I have a problem that can be reproduced in the following way: library(tidyverse) a <- tibble(navn=c("Oslo kommune", "Oslo kommune", "Kommunen i Os", "Kommunen i Eidsberg", "Eid fylkeskommune"), person=c("a", "c", "b", "a", "b")) b <-…
Ajern
1 vote, 0 answers

confused about multi_by and multi_match_fun in R fuzzy_join

Can someone help me understand what "multi_by" and "multi_match_fun" actually do in comparison to "by" and "match_fun" in the R package fuzzyjoin? This is from the package docs (v0.1.6) by Columns of each to join match_fun …
tospo
1 vote, 1 answer

interval join with extra key

I would like to do an interval join with an additional key. The simplest way in dplyr is quite slow. intervalDf <- tibble(id = rep(seq(1, 100000, 1), 10), k1 = rep(seq(1, 1000, 1), 1000), startTime =…
blahblah4252
1 vote, 1 answer

regex_left_join (fuzzyjoin) not working as expected

I am trying to perform a join in R based on a regex pattern from one table. From what I understand, the fuzzyjoin package should be exactly what I need, but I can't get it to work. Here is an example of what I'm trying to…
Nick Brown
1 vote, 1 answer

Partial matching in R

Is there a way I can partially match the two data frames in R? df1<-data.frame("FIDELITY FREEDOM 2015 FUND", "ID") df2<-data.frame("FIDELITY ABERDEEN STREET TRUST: FIDELITY FREEDOM 2015 FUND", 2020) I want to merge df1 and df2 as…
Jane
1 vote, 1 answer

fuzzy joining a column with a list

The data is as follows: library(fuzzyjoin) nr <- c(1,2) col2 <- c("b","a") dat <- cbind.data.frame( nr, col2 ) thelist <- list( aa=c(1,2,3), bb=c(1,2,3) ) I would like to do the following: stringdist_left_join(dat, thelist, by = "col2", method =…
Tom