Questions tagged [fuzzy-comparison]

Fuzzy comparison is the colloquial name for approximate string matching: the technique of finding strings that match a pattern approximately rather than exactly.

This problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately.
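Both sub-problems can be illustrated with a short Python sketch. It assumes Levenshtein (edit) distance as the similarity measure, which is one common choice among several; the helpers `levenshtein`, `dictionary_matches`, and `best_substring_match` are hypothetical names for illustration, not a library API. The substring search uses the standard dynamic-programming variant (often attributed to Sellers) in which a match may start at any position in the text.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def dictionary_matches(pattern: str, words: List[str], max_distance: int) -> List[str]:
    """Sub-problem 2: dictionary words within max_distance edits of pattern."""
    return [w for w in words if levenshtein(pattern, w) <= max_distance]


def best_substring_match(pattern: str, text: str) -> Tuple[int, int]:
    """Sub-problem 1: smallest edit distance between pattern and any
    substring of text. Returns (distance, end), where end is the exclusive
    end index of the first best-matching substring found."""
    col = list(range(len(pattern) + 1))  # column for the empty text prefix
    best = (col[-1], 0)
    for j, tc in enumerate(text, start=1):
        new = [0]  # a match may start at any text position at no cost
        for i, pc in enumerate(pattern, start=1):
            new.append(min(
                col[i] + 1,               # extra text character
                new[i - 1] + 1,           # pattern character missing from text
                col[i - 1] + (pc != tc),  # match or substitution
            ))
        col = new
        if col[-1] < best[0]:
            best = (col[-1], j)
    return best


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(dictionary_matches("aple", ["apple", "maple", "ample", "orange"], 1))
    # ['apple', 'maple', 'ample']
    print(best_substring_match("apple", "an apppple phone"))  # (1, 10)
```

The substring variant differs from plain Levenshtein distance only in its boundary condition: the row for the empty pattern prefix is all zeros, so skipping any leading part of the text costs nothing. Scanning every dictionary word with a full table costs O(len(pattern) × len(word)) per word, which may be too slow for large dictionaries without some candidate filtering.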



361 questions
0 votes, 1 answer

dictionary-based fuzzy matching

I want to match the entity occurrences in SeqString. For example: dict_data = ['johnson', 'apple platform'] SeqString = 'Johnson buys a new phone which is based on Apppple Platform. Johnson very likes the Apple Platform.' Expected results: Match…
futurelj
  • 273
  • 5
  • 14
0 votes, 1 answer

match similar strings with R

I need to match names in two different datasets. These firm names can be partially different and are not unique in either dataset: they may be repeated many times. Although some of these names may coincide in the two datasets, I want to compare all…
Macrina
  • 25
  • 8
0 votes, 1 answer

fuzzy string compare (check for shorthand matching) C#

I have two lists of strings, and I want to extract from each list the index of every string that also appears in the other list (and vice versa). A string may not match exactly, or it may be a shorthand of a string in the other list. For example, consider these two…
styx
  • 1,852
  • 1
  • 11
  • 22
0 votes, 1 answer

How can I use fuzzy grouping in a MySQL query

Is this even possible? I can't seem to find any proper guide to set it up. Everything I find gives instructions for SSIS, which I am not familiar with at all. Other options I find involve SOUNDEX(), which is not relevant for what I want to…
CodeAt30
  • 874
  • 6
  • 17
0 votes, 1 answer

fuzzy join with sqldf

I was expecting this code to return a data.frame with name = helicase in row 1. How can I make this type of comparison with sqldf? require(data.table) df <- fread('EC name 2.1.1.233 helicase 4.1.3.3 phosphatase …
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
0 votes, 0 answers

Microsoft Excel - Fuzzy Lookup Plugin

For anyone who has used the Fuzzy Lookup plugin: in the case of a multi-column fuzzy match, does it make a difference whether I choose each column separately in "Match Columns" or together? What is the difference?
tempidope
  • 823
  • 1
  • 12
  • 29
0 votes, 2 answers

scala merge tuples using fuzzy string match

Input: val input = List((a, 10 Inches), (a, 10.00 inches), (a, 15 in), (b, 2 cm), (b, 2.00 CM)) I would like to have an output val output = List((a, 10 Inches, 0.66), (b, 2 cm, 1)) I also have a utility function that returns true for fuzzy matching ("10…
yalkris
  • 2,596
  • 5
  • 31
  • 51
0 votes, 1 answer

agrep output approximate matching

Given agrep('timothy', c('timo','tim','timoth', 'timothys'), max.distance = 0.01, value=TRUE), I want to output the original string and all possible results together in a data frame as below. Original Replace1 Replace2 timothy timoth …
Rtab
  • 123
  • 10
0 votes, 1 answer

choosing the largest weights when connecting with fuzzy logic in R

I need to merge two datasets. df1: df1=structure(list(id = structure(c(1L, 4L, 5L, 6L, 2L, 3L), .Label = c("195/75 R16C-Tire CORDIANT Business CA", "215/75 R17,5-Tires KAMA NR-201 driving axle", "235/70 R16-Tire KAMA-221", "275/70 R22,5-Tire TYREX ALL…
psysky
  • 3,037
  • 5
  • 28
  • 64
0 votes, 1 answer

Fuzzy graph comparison

Are there any known algorithms or solutions for comparing graphs (of functions)? Let's say we have two graphs that share some regions but may differ in the number of points or in point values. For example, in the picture we see almost identical graphs…
artberry
  • 791
  • 2
  • 7
  • 18
0 votes, 1 answer

Processing large Pandas Dataframes (fuzzy matching)

I would like to do fuzzy matching where I match strings from a column of a large dataframe (130,000 rows) to a list (400 rows). The code I wrote was tested on a small sample (matching 3,000 rows to 400 rows) and works fine. It is too large to copy…
Michiel V.
  • 121
  • 1
  • 12
0 votes, 1 answer

PostgreSQL: Address matching using fuzzymatch from two tables

What I want to do: I have two tables with two address columns, both stored as text, and I want to create a view returning the matching rows. What I've tried: I've created an index on both columns and tables as below: CREATE INDEX idx_table1_fulladdress…
mapping dom
  • 1,737
  • 4
  • 27
  • 50
0 votes, 2 answers

Find similar permutation of a word in another column

I want to look for permutations that match a given word, and arrange my data based on column position. That is, I created a CSV with data I scraped from several websites. Say it looks something like this: Name1 OtherVars Name2 More…
oba2311
  • 373
  • 4
  • 12
0 votes, 0 answers

Parsing addresses from varchar in PostgreSQL

Could you please advise me on the best way to parse an address from a string? I have a table of addresses exported as OSM points (city, street, house number, country code, post code, geometry column, ...), and a text parameter…
Denis Stephanov
  • 4,563
  • 24
  • 78
  • 174
0 votes, 0 answers

fsWeights in RLBigDataLinkage in R

We are using RLBigDataLinkage in R to link two sets of records: 1. master data (~1.6 million records), 2. target (~100k records). Columns are first name, last name, address, zip, unique id1, unique id 2. Unique IDs are not available for all records in…