Questions tagged [fuzzyjoin]

An R package for joining tables on inexact matches.

Joins tables based not on whether columns match exactly but on whether they are similar by some comparison. Implementations include string distance, regular expression, and custom matching functions. The syntax is similar to that of dplyr's joins.
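A minimal sketch of the three matching styles named above, assuming the fuzzyjoin and dplyr packages are installed. The tables, column names and thresholds are invented for illustration. `stringdist_inner_join()`, `regex_inner_join()` and `fuzzy_inner_join()` are fuzzyjoin functions.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical example tables (names and amounts are invented)
people <- tibble(
  id = 1:3,
  full_name = c("Lynnett Labadini", "Patrice Archer", "Benoit Durand")
)
payments <- tibble(
  full_name = c("Lynett Labadini", "Patrice Archr", "Andre Martin"),
  amount = c(100, 250, 75)
)

# 1. String distance: names match when they are at most one edit apart
#    (default method "osa"); distance_col stores the computed distance
by_distance <- stringdist_inner_join(
  people, payments,
  by = "full_name", max_dist = 1, distance_col = "dist"
)

# 2. Regular expression: each value in the y column is used as a pattern
#    that is searched for in the x column
patterns <- tibble(pattern = c("^Patrice", "Durand$"),
                   rule = c("first name", "last name"))
by_regex <- regex_inner_join(people, patterns, by = c(full_name = "pattern"))

# 3. Custom function: any vectorised comparison of two columns,
#    here amount >= min_amount
bands <- tibble(min_amount = c(80, 200), band = c("medium", "large"))
by_custom <- fuzzy_inner_join(payments, bands,
                              by = c(amount = "min_amount"),
                              match_fun = `>=`)
```

Any function that takes two vectors and returns a logical vector of the same length can serve as `match_fun`. It is evaluated on every combination of distinct values from the two columns, so very large tables can get slow.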

161 questions
1 vote · 1 answer

Comparing each row from one data frame with each row of another one in the tidyverse

I need to compare each row from one dataframe to each row of another one: id first_name last_name account_nr amount currency comment 1 wW3A4QgpQQd Lynnett Labadini ES46 2569 1625 6669 5490 4624…
Dmytro Fedoriuk
1 vote · 0 answers

fuzzy match and extract strings from a string vector to complete a dataframe

I have a list of French names with some small syntactic differences. names <- c("Benoit", "Arnoud (son)", "Arnoud", "Arnous", "Archer, Patrice*", "Archer", "Archer (father)", "André" ) "Arnoud (son)", "Arnoud" and "Arnous": all these names belong to…
Wilcar
1 vote · 1 answer

Join by date range and ID (panel data)

I have basic fund data, and I want to add the manager name by date range and fund ID. I tried a fuzzy right join: x = fuzzy_right_join(manager, fundret, by = c("fundName" = "fundName", "date"= "managerStartdate", "date" = "managerENDdate"),…
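A sketch of the usual fuzzyjoin pattern for this kind of range join, with invented data. It rests on two points worth checking against the full question: in a named `by`, the names refer to columns of the first table and the values to columns of the second, so the table holding `date` goes first; and `match_fun` takes one function per `by` pair, in the same order. The columns `managerName` and `ret` are hypothetical.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical fund returns (one row per fund per date)
fundret <- tibble(
  fundName = c("A", "A", "B"),
  date = as.Date(c("2018-03-01", "2019-06-01", "2018-03-01")),
  ret = c(0.01, -0.02, 0.03)
)

# Hypothetical manager tenures (one row per manager spell)
manager <- tibble(
  fundName = c("A", "A", "B"),
  managerName = c("Smith", "Jones", "Lee"),
  managerStartdate = as.Date(c("2017-01-01", "2019-01-01", "2018-01-01")),
  managerENDdate = as.Date(c("2018-12-31", "2020-12-31", "2020-12-31"))
)

# Keep every return row and attach the manager whose tenure covers the date:
# fundName == fundName, date >= managerStartdate, date <= managerENDdate.
# "date" appears twice because it is compared with two columns of manager.
joined <- fuzzy_left_join(
  fundret, manager,
  by = c("fundName" = "fundName",
         "date" = "managerStartdate",
         "date" = "managerENDdate"),
  match_fun = list(`==`, `>=`, `<=`)
)
```

Because `fundName` exists in both tables, the output should carry it twice with `.x`/`.y` suffixes. A left join on `fundret` keeps the same rows as a right join on `manager` would.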
1 vote · 1 answer

Match text strings containing quotation marks which are encoded differently

I have two data frames containing the same information. The first contains a unique identifier. I would like to use dplyr::inner_join to match by title. Unfortunately, one of the data frames contains {"} to signify a quote and the other simply…
user25494
1 vote · 0 answers

R - transform (period) long-format timeseries to wide format hourly timeseries

I would like to transform the following data frame into a wide-format hourly time series, padded with zeros where there is no value. Essentially, I want to transform a data frame with start/end periods into an hourly time series: …
1 vote · 1 answer

Find close words in many articles in R

I have a tibble table (mydf) (100 rows by 5 columns). Articles are made up of many paragraphs. ID<-c(1,2) Date<-c("31/01/2018","15/02/2018") article1<-c("This is the first article. It is not long. It is not short. It comprises of many words and…
Beginner
1 vote · 0 answers

fuzzy matching in DNA seqs

For the purposes of the reprex I've generated a tibble called random_DNA_tbl that is a random selection of 10 DNA sequences (of 100 bases). I've got a separate tibble called subseq_tbl, with 3 shorter sequences that match 100% to 3 of the sequences…
biomiha
1 vote · 1 answer

Join rows in a data frame which have similar (but not equal) values

I have a df like: SampleID Chr Start End Strand Value 1: rep1 1 11001 12000 - 10 2: rep1 1 15000 20100 - 5 3: rep2 1 11070 12050 - 1 4: rep3 1 14950 20090 + 20 ... And I want to join…
Tato14
1 vote · 1 answer

Error in rsqlite_send_query(conn@ptr, statement) : duplicate column name: Ret

I have a bunch of SQL queries that worked fine but now, for some reason, no longer work. The data has not changed. The code has not changed. I keep getting this error message: Error in rsqlite_send_query(conn@ptr, statement) : duplicate…
FG74
0 votes · 0 answers

Assign ID to fuzzy-matched names in a new table - R

I have two tables. Table one has an id column and a full_name column. Table two has only a full name column, and its names are near matches rather than exact matches. I would like to apply the id column to the second table so that the ids apply to the…
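A sketch for attaching ids by approximate name, using invented names. `stringdist_left_join()` keeps every row of the second table, and `max_dist = 2` (an arbitrary choice here) sets how many edits still count as a match. Sorting by the distance column before `distinct()` keeps only the closest candidate when a name matches more than one. The column `raw_name` is hypothetical, chosen so it does not clash with `full_name`.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical tables: table_one has ids, table_two has near-match spellings
table_one <- tibble(id = c(101, 102),
                    full_name = c("Maria Gonzalez", "John Smith"))
table_two <- tibble(raw_name = c("Maria Gonzales", "Jon Smith", "Jane Doe"))

matched <- table_two %>%
  # Left join: every raw_name is kept; unmatched names get NA for id and dist
  stringdist_left_join(table_one, by = c(raw_name = "full_name"),
                       max_dist = 2, distance_col = "dist") %>%
  # Closest candidate first (NA distances sort last), then one row per raw_name
  arrange(dist) %>%
  distinct(raw_name, .keep_all = TRUE)
```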
0 votes · 1 answer

Is there an R function that joins a key that is contained within another key

I am trying to join two tables based on a code, created within each table, that identifies a prescribed drug. The problem is that in one table the drug code sometimes has additional digits at the end. For example: ncdnum table 1: 4988472401 ncdnum table…
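One way to express "contained within" is a custom `match_fun`. This sketch assumes the extra digits only ever appear at the end of the code in one table, as the excerpt describes, so a prefix test with base R's `startsWith()` is enough. The codes, drug names, prices and the helper `prefix_match` are hypothetical.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical codes: in table1 some codes carry extra trailing digits.
# Codes are stored as character so prefixes (and any leading zeros) survive.
table1 <- tibble(ncdnum = c("1234567801", "5550001"),
                 drug = c("drug A", "drug B"))
table2 <- tibble(ncdnum = c("12345678", "5550001"),
                 price = c(10, 20))

# TRUE when the code from table1 starts with the code from table2
prefix_match <- function(long_code, short_code) startsWith(long_code, short_code)

joined <- fuzzy_inner_join(table1, table2,
                           by = c(ncdnum = "ncdnum"),
                           match_fun = prefix_match)
```

If the extra digits could appear in either table, a symmetric test such as `startsWith(a, b) | startsWith(b, a)` would be needed instead.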
0 votes · 1 answer

reverse table order in R fuzzy anti join match_fun

I am trying to run this code: main_df %>% fuzzy_anti_join(secondary_df, match_fun = list(`==`, `%within%`), by = c("ID","Date" = "Date_Interval")) The issue is that it returns the following error: Error in dplyr::group_by():…
marcelklib
0 votes · 0 answers

Left join data frames by group and interval

I need to interval_left_join two dataframes by groups (the grouping variable is File), but using this code I get this error: library(BiocManager) library(fuzzyjoin) df1 %>% group_by(File) %>% interval_left_join(., …
Chris Ruehlemann
0 votes · 2 answers

R: join two data.tables with an exact match on one column and a fuzzy match on the second

I am working with two data.tables: predicted yields over age, based on a variety of stand conditions; and field measurements of yields at a particular field location, with a measured age. I would like to find the yield curve that best predicts the…
David
0 votes · 1 answer

Table joins with conditional "fuzzy" string matching in R

I'm attempting to join two tables: one is a smaller table with a column of names of common food items (e.g. "Corn", "Peppers", "Squash", etc.), and the other is a larger table with specific food names (e.g. "Sweet Corn", "Red Corn", "Baby Corn",…
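A sketch of this kind of join with `regex_inner_join()`, which treats each value in the second table's column as a pattern to search for in the first table's column. Wrapping each common name in `\b` word boundaries is an added assumption so that, for example, "Corn" would not match "Cornflakes". "Bell Peppers", "Butternut Squash" and "Kale" are invented rows.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical data built around the examples in the question
common <- tibble(common_name = c("Corn", "Peppers", "Squash"))
specific <- tibble(specific_name = c("Sweet Corn", "Red Corn", "Baby Corn",
                                     "Bell Peppers", "Butternut Squash", "Kale"))

# Whole-word pattern for each common name
common <- common %>% mutate(pattern = paste0("\\b", common_name, "\\b"))

# pattern (from common) is searched for in specific_name;
# ignore_case = TRUE also catches lower-case spellings
joined <- regex_inner_join(specific, common,
                           by = c(specific_name = "pattern"),
                           ignore_case = TRUE)
```

`regex_left_join()` with the same arguments would keep unmatched items such as "Kale", with `NA` in the common-name columns.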