Questions tagged [fuzzy-search]

A search mechanism where the objective is to find all approximate, relevant or possibly relevant results for the search-key rather than finding an exact match.

Fuzzy search is a search mechanism whose objective is to find all approximate, relevant, or possibly relevant results for the given keywords rather than only exact matches. This allows for matches even when the keywords are misspelled or only hint at a concept.
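As a rough illustration of matching misspelled keywords, here is a minimal sketch using Python's standard-library difflib. The helper name fuzzy_search, the sample catalog, and the cutoff and limit defaults are assumptions chosen for this example only; real systems typically rely on dedicated indexes (trigram or edit-distance based) rather than a linear scan.

```python
from difflib import get_close_matches


def fuzzy_search(query, candidates, cutoff=0.6, limit=3):
    """Return up to `limit` candidates that approximately match `query`.

    Hypothetical helper for illustration: similarity is difflib's
    SequenceMatcher ratio, compared case-insensitively.
    """
    # Map lowercased forms back to the original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[h] for h in hits]


# Made-up sample data; "basball" is a misspelling of "baseball".
catalog = ["baseball", "basketball", "football", "volleyball"]
results = fuzzy_search("basball", catalog)
print(results)
```

Raising cutoff makes the search stricter, while lowering it admits more distant candidates at the cost of more false positives.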

954 questions
33
votes
1 answer

Efficient string matching in Apache Spark

Using an OCR tool I extracted texts from screenshots (about 1-5 sentences each). However, when manually verifying the extracted text, I noticed several errors that occur from time to time. Given the text "Hello there ! I really like Spark ❤️!", I…
mrtnsd
31
votes
2 answers

Fuzzy Text Matching C#

I'm writing a desktop UI (.NET WinForms) to help a photographer clean up his image metadata. There is a list of 66k+ phrases. Can anyone suggest a good open source/free .NET component I can use that employs some sort of algorithm to identify…
Myles McDonnell
25
votes
11 answers

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, TVs, etc.) that come from different sources to a canonical name in the database. For example "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS"…
Ash
25
votes
6 answers

Algorithms for "fuzzy matching" strings

By fuzzy matching I don't mean similar strings by Levenshtein distance or something similar, but the way it's used in TextMate/Ido/Icicles: given a list of strings, find those which include all characters in the search string, but possibly with…
Alexey Romanov
25
votes
7 answers

Similarity function in Postgres with pg_trgm

I'm trying to use the similarity function in Postgres to do some fuzzy text matching. However, whenever I try to use it I get the error: "function similarity(character varying, unknown) does not exist". If I add explicit casts to text I get the…
Alex Gaynor
23
votes
1 answer

ElasticSearch's Fuzzy Query

I am brand new to ElasticSearch and am currently exploring its features. One that interests me is the Fuzzy Query, which I am testing and having trouble using. It is probably a dumb question, so I guess someone who already used this…
A_dit_rien
22
votes
5 answers

Fuzzy text (sentences/titles) matching in C#

Hey, I'm using Levenshtein's algorithm to get the distance between a source and a target string. I also have a method which returns a value from 0 to 1: /// /// Gets the similarity between two strings. /// All relation scores are in the [0, 1] range,…
Lukas Šalkauskas
22
votes
7 answers

How to find best fuzzy match for a string in a large string database

I have a database of strings (arbitrary length) which holds more than one million items (potentially more). I need to compare a user-provided string against the whole database and retrieve an identical string if it exists or otherwise return the…
guillermooo
22
votes
3 answers

how to do fuzzy search in big data

I'm new to this area and I'm wondering mostly what the state of the art is and where I can read about it. Let's assume that I just have a key/value store and I have some distance(key1,key2) defined somehow (not sure if it must be a metric, i.e. if the…
Albert
21
votes
2 answers

ElasticSearch - cross_fields multi match with fuzzy search

I have documents that represent users. They have fields name and surname. Let's say I have two users indexed - Michael Jackson and Michael Starr. I want these sample searches to work: Michael => { Michael Jackson, Michael Starr } Jack Mich => {…
Michal Artazov
21
votes
4 answers

Merging two Data Frames using Fuzzy/Approximate String Matching in R

DESCRIPTION I have two datasets with information that I need to merge. The only common fields that I have are strings that do not perfectly match and a numerical field that can be substantially different. The only way to explain the problem is to…
Brandon Bertelsen
20
votes
1 answer

ElasticSearch multi_match query over multiple fields with Fuzziness

How can I add fuzziness to a multi_match query? So if someone is to search for 'basball' it would still find 'baseball' articles. Currently my query looks like this: POST /newspaper/articles/_search { "query": { "function_score": { …
Funtriaco Prado
18
votes
1 answer

elasticsearch fuzzy matching max_expansions & min_similarity

I'm using fuzzy matching in my project mainly to find misspellings and different spellings of the same names. I need to understand exactly how the fuzzy matching of ElasticSearch works and how it uses the 2 parameters mentioned in the title. As I…
18
votes
5 answers

How can I do fuzzy substring matching in Ruby?

I found lots of links about fuzzy matching, comparing one string to another and seeing which gets the highest similarity score. I have one very long string, which is a document, and a substring. The substring came from the original document, but…
Stian Håklev
18
votes
3 answers

Fuzzy string matching in Python

I have 2 lists of over a million names with slightly different naming conventions. The goal here is to match those records that are similar, with the logic of 95% confidence. I am aware there are libraries which I can leverage, such as the…
BernardL