Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately (rather than exactly). The problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that approximately match the pattern.
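
The two sub-problems can be illustrated with a minimal sketch based on Levenshtein edit distance. This is only one of several possible similarity measures, and the function names (`levenshtein`, `best_substring_distance`, `closest_word`) are hypothetical helpers chosen for this example, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_substring_distance(pattern: str, text: str) -> int:
    """Sub-problem 1: smallest edit distance between pattern and any substring of text.

    Same recurrence as above, but the first row is all zeros so that a
    match may start at any position in text, and the result is the minimum
    of the last row so that a match may end at any position.
    """
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, start=1):
        curr = [i]
        for j, ct in enumerate(text, start=1):
            cost = 0 if cp == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return min(prev)


def closest_word(pattern: str, dictionary: list[str]) -> str:
    """Sub-problem 2: dictionary entry with the smallest edit distance to pattern."""
    return min(dictionary, key=lambda word: levenshtein(pattern, word))


print(levenshtein("coca-cola", "koca-cola"))
print(best_substring_distance("coca", "I drink koca-cola"))
print(closest_word("koca-cola", ["pepsi", "coca-cola", "fanta"]))
```

Both distance functions run in time proportional to the product of the two string lengths, so for large dictionaries or long texts an index or a dedicated library is usually a better fit than this sketch.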


361 questions
0
votes
2 answers

My for loop will only provide the last result and not all of them

I am trying to compare the similarity between two texts using a score. This is my code: risk_list1_txt = [] scoreList = [] similarityDict = {} theScore = 0 for text1 in risk_list1: similarityDict['FileName'] = text1 theText1 = open(path1 +…
mehrblue
  • 35
  • 1
  • 4
0
votes
2 answers

Split speech audio at spoken word

I have an audio file of a long text with different sections, all beginning with the spoken word "Chapter" (narrated by the same speaker). Is there a way to split the audio file into smaller files at these words? I am thinking of cutting out one of…
halloleo
  • 9,216
  • 13
  • 64
  • 122
0
votes
0 answers

Robust non-phonetic non-intensive fuzzy substring match

If you are writing code to fuzzily match two strings, e.g. "coca-cola" vs. "koca-cola", there are some standard ways of doing it, e.g. comparing the Levenshtein edit distance (http://en.wikipedia.org/wiki/Levenshtein_distance) computing…
0
votes
3 answers

TSQL Query for analyzing Text

I have a table that has ordernumber, cancelled date and reason. The reason field is a varchar(255) field that was filled in by many different sales reps, so it is really hard to group by reason category. I need to generate a report to categorize cancelation…
THEn
  • 1,920
  • 3
  • 28
  • 35
0
votes
2 answers

Fuzzy Matching on Date-Type values

I don't have a real question; I'm looking for creative input on a problem. I want to compare two (most likely unequal) Date values and calculate the ratio of their similarity. So, for example, if I compare 08.01.2013 and 10.01.2013 I…
Frank Wittich
  • 139
  • 3
  • 11
-1
votes
1 answer

SQL fuzzy match query

I have a uaserData table with user information. It has Id, firstname, lastname and many more columns. If two persons in that table have the same firstname and lastname, as shown below, they are most likely duplicates (can be spelling…
-1
votes
1 answer

Partial matching with a pattern

Is there a way in Python to perform partial matching between a word and a generic pattern (a regular expression)? The aim is to understand how far a word is from a given pattern, e.g. the distance of a word from the pattern of a license plate that…
n7h_m4d
  • 19
  • 3
-1
votes
1 answer

Merge dataframes by closest coordinates

Imagine we have 2 dataframes with coordinates ['X','Y']:
df1:
X     Y     House №
2531  2016  175
2219  2196  11
2901  3426  201
6901  4431  46
7891  1126  …
-1
votes
1 answer

python - fuzzy matching, looping through a data set to find corresponding items in the reference set

I am trying to learn and implement fuzzy matching in Python. I have two data sets which I load as data frames into pandas. Set 1 is the reference set. Set 2 is the set containing data to match with the reference names. I loop through the set_1…
Chris
  • 767
  • 1
  • 8
  • 23
-1
votes
1 answer

How do I use regex in R to create a new column of canonicalized company names?

I have a dataframe with a column of company names. I want to create a new column that is a fuzzy/canonicalized version of the name (perhaps using regex to strip suffixes like "corporation", "inc", and "llc" and prefixes like "the"). name <-…
jisoo shin
  • 540
  • 6
  • 15
-1
votes
1 answer

SSIS: Fuzzy Grouping only for specific rows

I'm using SQL Server Integration Services in Visual Studio 2012 and I'm trying to find similar addresses that are referenced by different customers using the Fuzzy Grouping component. Here's some sample data (SQL Fiddle): CREATE TABLE…
Onkel Toob
  • 2,152
  • 1
  • 17
  • 25
-1
votes
5 answers

Percentage of how similar strings are in Python?

I don't know how to write a program that gives the percentage of how similar two strings of the same length are. For example, for abcd and abce it should give 75%. The order matters: I don't want it to report abcd and dcab as 100% similar. I know…
user3103718
  • 63
  • 2
  • 6
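
One way to read the requirement in the question above is a position-by-position comparison. The following is a minimal sketch under that assumption; `positional_similarity` is a hypothetical name, and treating two empty strings as 100% similar is a choice made for this example.

```python
def positional_similarity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length strings agree."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    if not a:
        return 100.0  # assumption: two empty strings count as identical
    matches = sum(ca == cb for ca, cb in zip(a, b))
    return 100.0 * matches / len(a)


print(positional_similarity("abcd", "abce"))  # 75.0
print(positional_similarity("abcd", "dcab"))  # 0.0
```

Because characters are compared by position only, reordering the same letters lowers the score, which matches the stated requirement that order matters.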
-2
votes
1 answer

Fuzzy comparison of strings in lists of huge length (taking into account performance)

I have two lists. The first list, which I get from the database, contains the names of various companies (they can be written in uppercase, lowercase or a combination): list_from_DB = ["Reebok", "MAZDA", "PATROL", "AsbEngland-bank", "Mazda INCC", "HIGHWAY lcc",…
Paul
  • 53
  • 3
  • 21
-2
votes
1 answer

Why do I get a KeyError from the output when I do a merge

Hi, please help me. I am trying to do a fuzzy merge using pandas and fuzzywuzzy on two datasets, using two columns from each, but I get a traceback at the line before the print function that says KeyError: ('name', 'lasntname'). I do not know if I am…
Lamo
  • 11
  • 3
-2
votes
1 answer

Business student totally new to Python wants a script for strings fuzzy matching

I am a business student who just began to learn Python. My professor asked me to do fuzzy matching between two files: US Patent information and Company information downloaded from a stock exchange website. My task is to compare the company names that…
Hannah
  • 11
  • 2