Questions tagged [fuzzyjoin]

An R package for joining tables together on inexact matching.

Join tables based not on whether columns match exactly, but on whether they are similar under some comparison. Implementations include string distance, regular expression, and custom matching functions. The syntax is similar to that of dplyr's joins.
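As a rough illustration of the description above, here is a minimal sketch of a string-distance join and a regular-expression join. The data frames and column names (`left`, `right`, `sentences`, `patterns`, `city`, `text`, `pattern`) are invented for this example and do not come from any question on this page; the sketch assumes the fuzzyjoin and stringdist packages are installed.

```r
library(fuzzyjoin)

# Hypothetical example data: city names with a spelling variation.
left <- data.frame(
  city  = c("Londen", "Madrid", "Barcelona"),
  value = c(2, 53, 33),
  stringsAsFactors = FALSE
)
right <- data.frame(
  city = c("London", "Madrid", "Milan"),
  code = c("LON", "MAD", "MIL"),
  stringsAsFactors = FALSE
)

# String-distance join: keep pairs whose names differ by at most one edit.
# distance_col adds a column holding the computed distance for each pair.
matched <- stringdist_inner_join(
  left, right,
  by = "city",
  max_dist = 1,
  distance_col = "dist"
)

# Regular-expression join: each value in `pattern` is treated as a regex
# and tested against each value in `text`.
sentences <- data.frame(
  text = c("This sentence has San-Francisco", "This one has london"),
  stringsAsFactors = FALSE
)
patterns <- data.frame(
  pattern = c("london", "San Francisco"),
  label   = c("London", "San Francisco"),
  stringsAsFactors = FALSE
)
tagged <- regex_inner_join(sentences, patterns, by = c(text = "pattern"))

print(matched)
print(tagged)
```

Because both inputs to the first join share the column name `city`, the result carries `city.x` and `city.y`, which makes it easy to inspect which spellings were paired. Note that the regex join is literal about punctuation: "San-Francisco" is not matched by the pattern "San Francisco" unless the pattern allows for the hyphen.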

161 questions
0
votes
1 answer

get the records before and after the nearest merge by 30 minutes in python

I have two data frames in CSV files. The first describes traffic incidents (df1) and the second holds traffic record data for every 15 minutes (df2). I want to merge them based on the closest time. I used pandas merge_asof and I got…
0
votes
1 answer

make a fuzzyjoin and keep only exact match when there is one, keep all options otherwise

I have two dataframes which I am trying to join based on a country name field, and what I would like to achieve is the following: when a perfect match is found I would like to keep only that row, otherwise I would like to show all…
Romain
  • 171
  • 11
0
votes
1 answer

Is there a way to merge by matching a column of words to a column of sentences in R

For instance: a<-c("This sentence has San-Francisco","This one has london","This one has newYork") b<-c(10,20,30) data1<-as.data.frame(cbind(a,b)) c<-c("San Francisco","London", "New York") d<-c(100,2050,100) data2<-as.data.frame(cbind(c,d)) So…
0
votes
1 answer

keeping the best string matched by fuzzy matching in R

I have two dataframes in R. One is a dataframe of the phrases I want to match along with their synonyms in another column (df.word), and the other is a dataframe of the strings I want to match along with codes (df.string). The strings are complicated but…
ayeh
  • 48
  • 10
0
votes
1 answer

stringdist_semi_join only shows columns from dataframe1

I have two dataframes: df1 <- data.frame(City=c("Munchen_Paris","Munchen_Paris","Barcelona_Milan", "Londen_Dublin","Madrid_Malaga"), value1=c(11,21,33,2,53)) df2 <-…
user2165379
  • 445
  • 4
  • 20
0
votes
1 answer

How to fuzzy join 2 dataframes on 2 variables with differing "fuzzy logic"?

# example a <- data.frame(name=c("A","B","C"), KW=c(201902,201904,201905),price=c(1.99,3.02,5.00)) b <- data.frame(KW=c(201903,201904,201904),price=c(1.98,3.00,5.00),name=c("a","b","c")) I want to match a and b with fuzzy logic, using the variables…
jestor
  • 67
  • 5
0
votes
0 answers

fuzzy join with partial string match

I have a dataframe with two columns which can contain literally any character of various formats and I would like to match them. library(stringr) library(fuzzyjoin) x <- data.frame(idX=1:3, string=c("silver", "30BEDJE202AA", "30BEDJE2027")) y <-…
Romain
  • 171
  • 11
0
votes
2 answers

How to join dataframe on multiple columns and a fuzzy match on one?

I am trying to join decoded VIN data from NHTSA with vehicle data from fueleconomy.gov using year, make, and model. Below is an example of the data I am trying to join: # This is the first dataframe make <- c("PORSCHE", "TESLA",…
OpnSrcFan
  • 113
  • 6
0
votes
1 answer

Fuzzy join - match one sided

I'm trying to gather weight values from one table (myChickWts) that were collected in the week prior to each blood sample recorded in another table (chickblood). I want to get a list of blood dates and the associated weights from the week leading…
datakritter
  • 590
  • 5
  • 19
0
votes
3 answers

R fill new column based on interval from another dataset (lookup)

Let's say I have this dataset: df1 = data.frame(groupID = c(rep("a", 6), rep("b", 6), rep("c", 6)), testid = c(111, 222, 333, 444, 555, 666, 777, 888, 999, 1010, 1111, 1212, 1313, 1414, 1515, 1616, 1717, 1818)) df1 groupID…
user63230
  • 4,095
  • 21
  • 43
0
votes
1 answer

Create breaks in one data.frame by time intervals in another: fuzzy join

I record CO2 in df2 and have a list of experiment start and end times in d: data.frame df2 that contains continuous CO2 measurements over time. df2<-data.frame(CO2.ppm.=sample(300:500,72,replace=TRUE),Dev.Date.Time=seq( …
HCAI
  • 2,213
  • 8
  • 33
  • 65
0
votes
1 answer

Remove duplicate entries after fuzzy matching between tables

I am trying to find data entry errors in the names and locations of my dataset by fuzzy matching. I have a unique key from the original data, siterow_id, and have made a new key, pi_key, where I already identified some hard matches. (no fuzzy…
dhbrand
  • 162
  • 3
  • 16
0
votes
0 answers

I want to match a list of descriptions to keywords and record which keyword each description matches

I have data on articles with descriptions and a list of keywords. I want to match each keyword to the descriptions and add a column showing which keyword matched. list of articles: id description 1 In order to investigate the role of calcium…
Vasudha Jain
  • 93
  • 2
  • 10
0
votes
0 answers

How can I fuzzy match variables on names with same dates?

I want to match two datasets. The names are spelled differently, so I would fuzzy match them, but each name has multiple entries for different months. How can I set up the matching so it only matches entries from the same month? E.g. Data set 1 : Joe…
0
votes
1 answer

R: fuzzy merge two data frame

I have 2 data frames. First, abc <- data.frame(bin1 = c("0-25K", "25K-50K", "50K+"), group1 = c(1, 1, 2), bin2 = c("0-25", "25-50", "50+"), group2 = c(1, 2, 2)) pqr <- data.frame(bin1 = c("1_0-25K", "2_25K-50K", "3_50K+"),bin2 = c("0,25", "25,50",…
Bruce Wayne
  • 471
  • 5
  • 18