Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately (rather than exactly). This problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately.
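
The two sub-problems can be illustrated with a minimal sketch using only Python's standard `difflib` module. The helper names `best_substring_match` and `dictionary_matches` are hypothetical, chosen for this illustration, and the sliding-window scan is a simple brute-force stand-in, not an optimized approximate-matching algorithm.

```python
from difflib import SequenceMatcher, get_close_matches


def best_substring_match(pattern: str, text: str) -> tuple[int, str, float]:
    """Sub-problem 1: approximate substring match.

    Slides a window of len(pattern) characters over text and returns
    (start index, window, similarity ratio) for the best-scoring window.
    Brute force: fine for short inputs, too slow for large ones.
    """
    width = len(pattern)
    best = (0, text[:width], 0.0)
    for start in range(max(1, len(text) - width + 1)):
        window = text[start:start + width]
        score = SequenceMatcher(None, pattern, window).ratio()
        if score > best[2]:
            best = (start, window, score)
    return best


def dictionary_matches(pattern: str, words: list[str], cutoff: float = 0.6) -> list[str]:
    """Sub-problem 2: approximate dictionary match.

    Returns up to three words whose similarity ratio to pattern
    is at least cutoff, best match first.
    """
    return get_close_matches(pattern, words, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(best_substring_match("match", "string matching"))
    print(dictionary_matches("appel", ["ape", "apple", "peach", "puppy"]))
```

`SequenceMatcher.ratio()` is a similarity score between 0 and 1, not an edit distance; a Levenshtein-style distance would be a reasonable alternative scoring function for either helper.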


361 questions
0
votes
1 answer

TSQL Fuzzy address matching grouping, 2019 Edition

I have a situation where people asked me to group on bad addresses. I need to work with the tools/env I have; I don't have the option of the Google API or 3rd-party data science tools. I also did my HW and saw posts several years old, so still want to…
Mike S
  • 296
  • 2
  • 14
0
votes
0 answers

Karate - How to compare shuffled json with response?

I am trying to compare a response with the expected JSON already stored as a file. There are a few elements in the response whose order differs. I tried the code below, but it is still failing: def expected_response =…
Ruchi
  • 1
  • 1
0
votes
1 answer

How to fix incorrect fuzzy-matches with over 90 thresholds?

I have two datasets that I need to fuzzy-match over a column which contains organization names. I used the fuzzywuzzy library in Python and set the threshold to 50 (see the code below). The code successfully matched some names. When I eyeballed the…
0
votes
1 answer

Fuzzy search over millions of strings with custom distance function

I have a large pool of short strings and a custom distance function on them (let's say Damerau–Levenshtein distance). Q: What is the state-of-the-art solution for getting top N strings from the pool according to the custom distance? I am looking for…
0
votes
0 answers

Fuzzy string match comparing two large files using python

I have two large files, File1 and File2, each containing the names of companies. I am trying to find a fuzzy match of company names ("companyname") from File2 to match to File1. Currently, I am not able to complete processing since it is timing…
mteavan
  • 33
  • 3
0
votes
1 answer

Match two datasets across multiple ‘dirty’ columns in R

I frequently need to match two datasets by multiple matching columns, for two reasons. First, each of these characteristics is ‘dirty’, meaning a single column does not consistently match even when it should (for a truly matching row). Second, the…
Kayle Sawyer
  • 549
  • 7
  • 22
0
votes
1 answer

Algorithm for comparing two names to see if they are similar/same

I'm currently implementing an automated workflow which has to compare a fixed name with another name and return whether the name is a match or not. It should consider spelling/typo mistakes and implement a suitable algorithm like…
Bearzi
  • 538
  • 5
  • 18
0
votes
1 answer

Fuzzy compare and aggregate similar records within a single column data-frame

The current requirement is to aggregate a single column and supply a count per row. There are a couple of issues that I am encountering that I need assistance with: Many lines are similar but not exact due to a parameter or other…
artofsql
  • 613
  • 4
  • 11
0
votes
0 answers

SSIS Fuzzy look up not loading tables to match

I am running into an issue with the FuzzyLookup function of SSIS. When I try to load in the data from either an Excel file or via two SQL tables I am unable to get the tables to load to complete the matching steps. Has anyone else run into this…
0
votes
1 answer

python - fuzzywuzzy error - object of type float has no len

I am trying to use the fuzzywuzzy library to get a similarity score between strings in 2 datasets using the fuzz.ratio function. However, I keep getting the following error: File "title_matching.py", line 29, in match =…
iammrmehul
  • 730
  • 1
  • 14
  • 35
0
votes
2 answers

Check format similarity between two strings

I have a string format like this: the word must be 15 letters long; the first 8 letters are a date. Example: '2009060712ab56c'. Let's say I want to compare this with another string and give a percentage of format similarity like: result =…
s900n
  • 3,115
  • 5
  • 27
  • 35
0
votes
0 answers

pandas - fuzzywuzzy - speeding loop up when doing fuzzymatching?

I am basically trying to join 2 dataframes using approximate match. How I do this in general is listed below: have the list of strings to be matched; define a function using fuzzy's process.extract; apply this function across all rows in the 1st…
addicted
  • 2,901
  • 3
  • 28
  • 49
0
votes
3 answers

Mysql: Concatenate Duplicate Data but ignore string in duplicates

Is there a way to find duplicate data while ignoring a given string? For example if I have a table of names, is there a way to concatenate rows that both have the name "Ann Smith" but ignore the string "Dr. ". For example rows containing "Ann…
imapotatoe123
  • 656
  • 1
  • 10
  • 21
0
votes
0 answers

Fuzzy match algorithm between full names from different manual inputs in TSQL?

I'm hoping to implement a fuzzy match algorithm in TSQL (without MDS) that compares full names. The names are coming from separate manual inputs with no controls over what's entered. One of the systems also tends to cut off the end of names as it…
0
votes
1 answer

SSIS Fuzzy Grouping always returns the same result with different similarity thresholds

Can anyone tell me why my similarity is always 1? My goal is for AAB and AAC to be placed in the same group, for example. Thanks
Aiden
  • 129
  • 1
  • 1
  • 11