Questions tagged [fuzzyjoin]

An R package for joining tables based on inexact matching.

Join tables together based not on whether columns match exactly, but on whether they are similar by some comparison. Implementations include string distance, regular expression, and custom matching functions. The syntax is similar to that of dplyr's joins.
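As a rough illustration of the string-distance and regular-expression variants, here is a minimal sketch. The data frames `players`, `roster`, `codes`, and `patterns` are made up for this example, and the sketch assumes the fuzzyjoin package is installed.

```r
library(fuzzyjoin)

# Hypothetical data: the same people spelled slightly differently.
players <- data.frame(name = c("Jon Smith", "Maria Garcia", "Li Wei"),
                      stringsAsFactors = FALSE)
roster <- data.frame(name = c("John Smith", "Maria Garcia", "Ana Lopez"),
                     flag = c(TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

# String-distance join: keep pairs whose names differ by at most one edit.
# distance_col adds a column holding the computed distance for each pair.
matched <- stringdist_inner_join(players, roster,
                                 by = "name",
                                 max_dist = 1,
                                 distance_col = "dist")

# Regular-expression join: values in the left table are tested against
# patterns stored in the right table.
codes <- data.frame(id = c("132", "145", "299"),
                    stringsAsFactors = FALSE)
patterns <- data.frame(pattern = c("^14.$", "^2..$"),
                       group = c("A", "B"),
                       stringsAsFactors = FALSE)

by_regex <- regex_left_join(codes, patterns, by = c(id = "pattern"))

print(matched)
print(by_regex)
```

Because both inputs to the first join share the column name `name`, the result carries `name.x` and `name.y`, much like a dplyr join on columns that are kept from both sides. Note that in the regex join the patterns belong in the second table.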

161 questions
0 votes
0 answers

Fuzzy Matching player names in R

In R, I have two data frames, one with full names and one with abbreviated names. I want to join them with dplyr to see which one has a flag. However, it is very hard to get the names to match: even when I match on last names, some last names are shared. I'm…
0 votes
2 answers

Joining dataframes on text strings using fuzzy string matching (stringdist_join())

I'm trying to join two datasets based on the values of two variables. Both datasets have the same variable names/number of columns but may have a different number of rows. I want to join them based on a grouping variable ("SampleID") and a…
JRock
0 votes
0 answers

stringdist_join not merging data

I have three data frames that need to be merged. There are a few small differences between the competitor names in each data frame. For instance, one name might not have a space between their middle and last name, while the other data frame…
bandcar
0 votes
1 answer

How to merge 2 dataframes with partial character strings?

I have a dataset that lists several possible genera of plants, and another dataset that lists all the species with their functional forms. I would like to merge these datasets in such a way that IF the genus listed in df2 is found within the SPP…
salix7
0 votes
1 answer

Fuzzy join on substring dask

I have two data frames with columns of interest 'ParseCom', which is the left index of this fuzzy join, and 'REF' which should be a substring of 'ParseCom' during a join. This is iterating over the Dataframe, which is not recommended. How can I…
Isaacnfairplay
0 votes
0 answers

Join tables based on one column exact match and other columns fuzzy matches excel

I have two tables where I want to match age and height to the percentile they fall within (according to WHO guidelines). So if the ages in table_percentile and table_height match, find the percentile column in table_percentile that the height falls…
0 votes
3 answers

R - Fuzzy Inner Join on two fields, matching to a date range

I'm fairly new to R, and have been sifting through other questions all morning trying to figure this out, but can't find anything related enough or my knowledge of R is not good enough to understand some of the suggested solutions to my problem. I…
P Meddyyy
0 votes
2 answers

Merge two data frames in R by variable that is regular expression in one and string in other

I have two data frames I would like to merge: a <- data.frame(x=c(1,4,6,8,1,6,7,2),ID=c("132","14.","732","2..","132","14.","732","2.."),year=c(1,1,1,1,2,2,2,2)) b <-…
mclofa
0 votes
0 answers

Using fuzzy join to insert one column from dataframe to another dataframe and match by a column in both dataframes

I have been trying to use the fuzzyjoin package to join the "conservation status" column from the con_filtered_report_groups data frame to the report_groups_order data frame, which has the rest of the information I want to use. I want to join…
rhelp
0 votes
1 answer

Complex join between two dataframes

I am working on a very advanced join of dataframes that is complex for me. I would like to ask you for some help if possible. I have two dataframes, df1 and df2 which I include at the end as dput(). My first dataframe df1 looks like this: df1 …
Duck
0 votes
0 answers

Have R warn me when a match from Fuzzy Join is too far off

I previously asked a question here about how to use R to automatically "spellcheck" a big list of department names before I export a file and send it off. (Same data can be used as reproducible example) The solution of using Fuzzy Join worked…
Joe Crozier
0 votes
0 answers

Fuzzyjoin regex being very slow & running out of memory

I want to join two data frames with fuzzyjoin::regex_left_join(df1, df2, by = c(name = "name")), where df1 has 45k rows and df2 has 2.5 million. This results in a memory error. If I split df1 up into chunks of 1000 rows, each chunk takes 15 minutes to run. It turned…
Spine Feast
0 votes
0 answers

Fuzzy matching Countries in R

For an assignment I have to use fuzzy matching in R to merge two different datasets that both have a "Country" column. The first dataset is from Kaggle (Countries dataset), while the other is from the ISO 3166 standard. I already use fuzzy matching it…
0 votes
0 answers

Is there a way to detect in R if two strings have some characters in common?

Reproducible example: library(fuzzyjoin) library(stringr) df1 <- data.frame(x = c("Victoria Park Ave N & Pachino Blvd, Toronto, ON", "The West Mall S & The Queensway, Toronto, ON", "Willowdale Ave NS…
Dinesh
0 votes
0 answers

Split data by number of rows, fuzzy match with another dataset and then join all fuzzy matches together in R

I'm trying to fuzzy match rows across different data frames/data tables, based on the company name variable. I've matched a big chunk of these through standard joins and the use of some regex to remove words (such as Limited, Ltd, etc.), but I'm…