Questions tagged [fuzzyjoin]

An R package for joining tables together on inexact matching.

Joins tables based not on whether columns match exactly, but on whether they are similar under some comparison. Implementations include string distance, regular expression, and custom matching functions. Uses syntax similar to dplyr's joins.
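
A minimal sketch of a string-distance join, assuming the fuzzyjoin package (and its stringdist dependency) is installed. The data frames `messy` and `official` and their contents are hypothetical toy data made up for illustration; `stringdist_left_join()`, `max_dist`, and `distance_col` are part of fuzzyjoin.

```r
library(fuzzyjoin)

# Hypothetical toy data: misspelled names to match against a clean reference list.
messy <- data.frame(
  name = c("Stanfrod University", "Harvard Univeristy"),
  stringsAsFactors = FALSE
)
official <- data.frame(
  name = c("Stanford University", "Harvard University"),
  id = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Left join on string distance: keep pairs whose distance is at most 2
# and record that distance in a column named "dist".
matched <- stringdist_left_join(
  messy, official,
  by = "name",
  max_dist = 2,
  distance_col = "dist"
)

print(matched)
```

Because both inputs share the column `name`, the result carries `name.x` and `name.y`, as in a dplyr join. A lower `max_dist` makes matching stricter; the right threshold depends on the data.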

161 questions
0
votes
0 answers

Merge/join two dataframes based on shortest distance of longitudes and latitudes

I have df1 with 800 rows and df2 with 9 million rows. Both have latitude and longitude, and df2 has some more columns that I need to add to df1 based on shortest distance, as lat and lon do not match exactly in both…
Heresh
0
votes
1 answer

Fuzzy Matching/Join Two Data Frames of University Names

I have a list of university names that were entered with spelling errors and inconsistencies. I need to match them against an official list of university names to link my data together. I know a fuzzy match/join is the way to go, but I'm a bit lost on the…
0
votes
1 answer

Dplyr join on maximum matching value, if no exact match is possible

I'm trying to join two tables in dplyr. Sometimes it's possible to match exactly on the column year, but in some cases the matching year is not available. In that case, I would like to join on the maximum year: Left <- tibble(id = c(1,2,3), …
0
votes
2 answers

join two tables in R using Names from both tables

Hi guys, I know there are a few questions related to joining tables in R. I tried most of them, but they didn't work. In my case, I have two tables: the first one (A) has two columns (Id and company_name) and 70,000 rows, and the second one (B) has…
user5064322
0
votes
1 answer

Merge data frames by time interval in R

I have two Data Frames. One is an Eye Tracking data frame with subject, condition, timestamp, xposition, and yposition. It has over 400,000 rows. Here's a toy data set for an example: subid condition time xpos ypos 1 1 1 1.40 195 …
Spencer Castro
0
votes
0 answers

Memory Issues using rnoaa package

Working with the rnoaa package to add US station IDs to a df of weather events. Below is str() for the rain df. Google Drive link to CSV file of subset. 'data.frame': 4395 obs. of 63 variables: $ YEAR : int 2009 2009 2012 2013…
Francisco
-1
votes
2 answers

How to merge based on a string in a column?

I would like to do exact joins for the columns state and name, but a fuzzy join for the "name" and "versus" columns: year <- c("2002", "2002", "1999", "1999", "1997", "2002") state <- c("TN", "TN", "AL", "AL", "CA", "TN") name <- c("George",…
hy9fesh
-1
votes
2 answers

Filter rows based on presence of a string element from another column

I am trying to filter out relevant rows based on the presence of a string, or part of a string, in R. The following is the example: colA colb flag New York Metropolitan…
marine8115
-1
votes
4 answers

Standardize the City Name in R

I am new to R and the coding world, so pardon me if I have misspelled or misused some jargon here (cmiiw). I am facing a challenge cleaning city names in a dataframe. Tried to use GetCloseMatches, strdist_inner_join (with fuzzywuzzy I believe) with dplyr style…
rgoei
-1
votes
2 answers

SQL Left Fuzzy Join with Levenshtein Distance

I have two data sets from two different systems being merged together within SQL; however, there is a slight difference in the naming conventions of the two systems. The change in convention is not consistent across the larger data sample but…
tg00222
-1
votes
3 answers

R - fuzzy join on nearest integer only

Suppose I've got this data set to start with, in this silly layout: originalDF <- data.frame( Index = 1:14, Field = c("Name", "Weight", "Age", "Name", "Weight", "Age", "Height", "Name", "Weight", "Age", "Height", …