Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching, the technique of finding strings that match a pattern approximately (rather than exactly).

This problem is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.
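As a rough illustration of the two sub-problems named in the tag description, here is a minimal Python sketch. It is not taken from any question under this tag. The helper names `best_substring_distance` and `closest_dictionary_words` are hypothetical. The substring search assumes a plain edit-distance dynamic program in which a match may start anywhere in the text, and the dictionary lookup assumes the standard library's `difflib.get_close_matches` is good enough for small word lists.

```python
import difflib


def best_substring_distance(pattern, text):
    """Return the smallest edit distance between `pattern` and any substring of `text`.

    Hypothetical helper: a Levenshtein-style dynamic program whose first row
    is all zeros, so a match is allowed to start at any position in `text`.
    """
    # prev[j]: best distance between the pattern prefix processed so far
    # and some substring of text that ends at position j.
    prev = [0] * (len(text) + 1)  # the empty pattern matches anywhere at cost 0
    for i, pc in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)  # i pattern characters against empty text
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # pattern character missing from the text
                curr[j - 1] + 1,     # extra character in the text
            )
        prev = curr
    return min(prev)  # the match may also end anywhere in the text


def closest_dictionary_words(query, words, n=3, cutoff=0.6):
    """Return up to `n` entries of `words` that resemble `query` (hypothetical wrapper)."""
    return difflib.get_close_matches(query, words, n=n, cutoff=cutoff)


if __name__ == "__main__":
    print(best_substring_distance("fuzzy", "a fuzy match"))
    print(closest_dictionary_words("appel", ["apple", "banana", "cherry"], n=1))
```

The dynamic program costs time proportional to the pattern length times the text length, which is fine for short inputs. Large dictionaries or long texts usually call for an index or a dedicated library instead.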



361 questions
5
votes
0 answers

Install Python ssdeep wrapper on Windows

I am running Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)] pip install ssdeep ... _ssdeep_cffi_8a9054b9x627c7d55.c ssdeep\__pycache__\_ssdeep_cffi_8a9054b9x627c7d55.c(209) : fatal error C1083: Cannot open…
arsenik
  • 987
  • 2
  • 8
  • 22
5
votes
3 answers

R: Using plyr to perform fuzzy string matching between matching subsets of two data sources

Say I have a list of counties with varying amounts of spelling errors or other issues that differentiate them from the 2010 FIPS dataset (code to create fips dataframe below), but the states in which the misspelled counties reside are entered…
mcjudd
  • 1,520
  • 2
  • 18
  • 33
5
votes
2 answers

Example of fuzzy logic in classification

I need to classify objects using fuzzy logic. Each object is characterized by 4 features - {size, shape, color, texture}. Each feature is fuzzified by linguistic terms and some membership function. The problem is I am unable to understand how to…
SKM
  • 959
  • 2
  • 19
  • 45
5
votes
3 answers

How to calculate a matching score between two strings in Java?

I want to classify two strings as similar or not similar. For example s1 = "Token is invalid. DeviceId = deviceId: "345" " s2 = "Token is invalid. DeviceId = deviceId: "123" " s3 = "Could not send Message." I am looking for a Java library that can…
Sean Nguyen
  • 12,528
  • 22
  • 74
  • 113
5
votes
3 answers

How do I determine whether a number is within a percentage of another number

I'm writing iPhone code that fuzzily recognizes whether a swiped line is straight-ish. I get the bearing of the two end points and compare it to 0, 90, 180 and 270 degrees with a tolerance of 10 degrees plus or minus. Right now I do it with a bunch…
willc2
  • 38,991
  • 25
  • 88
  • 99
4
votes
2 answers

Fuzzy Comparison in Ruby/Rails

I was looking for some good options for fuzzy comparison in Rails. Essentially, I have a set of strings that I'd like to compare against some strings in my database and I'd like to get the closest one if applicable. In this particular case, I'm not…
Thariq Shihipar
  • 1,072
  • 1
  • 12
  • 27
4
votes
1 answer

Is there a python function to get "unique" strings in the sense of some similarity measure?

I have a set of strings (in my case it is a column of a pandas dataframe, but it would be ok to consider alternative data structures as list/arrays/...) and I would like to get all "unique" values from that set, where unique is not exact matching…
Luca Clissa
  • 810
  • 2
  • 7
  • 27
4
votes
2 answers

Fuzzy matching a string within a large body of text in Python (URL)

I have a list of company names, and I have a list of URLs mentioning company names. The end goal is to look into the URL, and find out how many of the companies on the URL are in my list. Example URL: http://www.dmx.com/about/our-clients Each URL…
Kyle
  • 63
  • 5
4
votes
3 answers

Fuzzy string search in Java, including word swaps

I am a Java beginner, trying to write a program that will match an input to a list of predefined strings. I have looked at Levenshtein distance, but I have run into problems such as this: If I have an input such as "fillet of beef" I want it to be…
abroekhof
  • 796
  • 1
  • 7
  • 20
4
votes
1 answer

Abbreviation Detection for Python

I am trying to measure the similarity of company names, however I am having difficulties while I'm trying to match the abbreviations for those names. For example: IBM The International Business Machines Corporation I have tried using fuzzywuzzy to…
4
votes
2 answers

Inner join exactly on one column and fuzzy on another

I have two dataframes I want to join. They share two fields: group_id and person_name. I want to join exactly on group_id and fuzzy on person_name. How can I do this? Constraints: It should be an inner join. So group_id exactly and person_name…
Hatshepsut
  • 5,962
  • 8
  • 44
  • 80
4
votes
0 answers

Fuzzy string matching for common multi-character OCR errors in Python

I'm trying to do some fuzzy matching on some OCR results, and I want to be able to factor in common OCR errors. In particular, I'm matching streets to a database of streets. I figured out how to down-weight common single-character substitution…
4
votes
2 answers

How to get an accurate JOIN using Fuzzy matching in Oracle

I'm trying to join a set of county names from one table with county names in another table. The issue is that the county names in the two tables are not normalized. They are not the same in count; also, they may not appear in a similar pattern…
Dav KR
  • 51
  • 1
  • 4
4
votes
1 answer

RecordLinkage: how to pair only best matches and export a merged table?

I am trying to use the R package RecordLinkage to match items in the purchase orders list with entries in the master catalogue. Below is the R code and a reproducible example using two dummy datasets (DOrders and DCatalogue): DOrders <-…
Mihail
  • 761
  • 5
  • 22
4
votes
1 answer

How to check if two unstructured street address strings are the same?

I need to compare two unstructured addresses and be able to identify if they are the same (or similar enough). Scenario: The address is supplied by the end user in plain text. There is nothing to help the user write it in a more identifiable manner (no…
Minduca
  • 1,121
  • 9
  • 19