Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
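
The definition can be made concrete with the simplest exact matcher: a naive (brute-force) scan that tries every alignment of the pattern against the text. The Python below is a sketch; `find_all` is a hypothetical helper name. Its worst case is O(nm) for a text of length n and a pattern of length m, unlike faster algorithms such as Knuth-Morris-Pratt or Boyer-Moore.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return every 0-based index where pattern occurs in text (naive O(n*m) scan)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):          # try every alignment of the pattern
        j = 0
        # Compare the pattern with the window text[i:i+m] character by character.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                      # the whole pattern matched at position i
            hits.append(i)
    return hits


text = "CTAGATCCTGCGATCGATTAAGCCTGA"
pattern = "CGATCGATTA"
print(find_all(text, pattern))  # [10]
print(text.find(pattern))       # 10
```

Python's built-in `str.find` is shown for comparison; it returns only the first index, or -1 when the pattern is absent.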

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, finds strings that match a pattern approximately rather than exactly, typically measuring closeness by the edit distance between the pattern and the candidate match in the text.
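
As an illustration of edit distance, here is a sketch of the classic dynamic-programming computation of Levenshtein distance in Python, where each insertion, deletion or substitution costs 1. It assumes Levenshtein distance is the edit distance of interest; other variants, such as Damerau-Levenshtein, also count transpositions. `levenshtein` is a hypothetical helper name, and it compares two whole strings; finding approximate occurrences inside a longer text needs a variant in which a match may start anywhere.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn a into b (O(len(a) * len(b)) time, one row of memory)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca by cb (or keep a match)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```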

2278 questions
6 votes, 1 answer

Excel MATCH range without specific CELL

After a deep search on the internet I gave up. My "simple" question is: how can I add two ranges in a formula, preferably in MATCH? I want to search a range like A1:A7 + A9:A20 and thus not include A8 in my range. Is this possible? Please help…
NayNay
6 votes, 1 answer

How to limit fuzzy join only returning one match

I am trying to create a program in R to replace city names or airport names with the three-letter airport code. I want to do fuzzy matching to allow more flexibility, since the data with the city/airport names I am trying to replace is coming in from…
sarahbarnes
6 votes, 1 answer

Python Fuzzy matching strings in list performance

I'm checking whether there are similar results (fuzzy matches) across 4 columns of the same dataframe, and I have the following code as an example. When I apply it to the real dataset of 40,000 rows x 4 columns, it keeps running forever. The issue is that the code is…
ecp
6 votes, 3 answers

Aho-Corasick and Proper Substrings

I'm trying to understand the Aho-Corasick string-matching algorithm. Suppose that our patterns are abcd and bc. We end up with a tree like this: [] /\ [a]..[b] / : | [b].: [c] | : [c]..... | [d] The dotted line…
Winston Ewert
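
One way to see how Aho-Corasick reports a pattern that is a proper substring of another is to let each node inherit the outputs of its failure-link target. The Python below is a minimal sketch of the usual textbook construction (trie, failure links computed breadth-first, merged output sets); the names `build_automaton` and `search` are hypothetical and are not taken from the question or its answers.

```python
from collections import deque


def build_automaton(patterns):
    """Trie plus failure links and output sets (textbook Aho-Corasick construction)."""
    goto = [{}]    # goto[state][char] -> child state (the trie edges)
    fail = [0]     # fail[state] -> state for the longest proper suffix present in the trie
    out = [set()]  # out[state] -> patterns recognised when this state is reached
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # Breadth-first order guarantees shallower states are finished first.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit the failure target's outputs, so "bc" is reported at the
            # state for "abc" even though "bc" is a proper substring of "abcd".
            out[t] |= out[fail[t]]
    return goto, fail, out


def search(text, patterns):
    """Return sorted (start index, pattern) pairs for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        # Follow failure links until ch can be consumed (or we are back at the root).
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


print(search("abcd", ["abcd", "bc"]))  # [(0, 'abcd'), (1, 'bc')]
```

With the patterns from the question, the state for `abc` gets a failure link to the state for `bc`, so reaching it while scanning `abcd` reports `bc` before `abcd` is completed.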
6 votes, 2 answers

Speeding up a "closest" string match algorithm

I am currently processing a very large database of locations and trying to match them with their real-world coordinates. To achieve this, I have downloaded the GeoNames dataset, which contains a lot of entries. It gives possible names and lat/long…
LBes
6 votes, 1 answer

Python group similar records (strings) in dataset

I have an input table like this:
    In [182]: data_set
    Out[182]:
                  name   ID
    0    stackoverflow  123
    1     stikoverflow  322
    2  stack, overflow  411
    3     internet.com  531
    4         internet  112
    …
Dio
6 votes, 4 answers

Matching strings in a column of a data frame with the strings in a column of another data frame using R or Python

I am trying to match strings in a column of a data frame with the strings in a column of another data frame and map the corresponding values. The number of rows differs between the two data frames. df1 = data.frame(name = c("(CKMB)Creatinine Kinase…
ajax
6 votes, 1 answer

Check if one string includes a substring with Levenshtein distance of 1 from other string

My problem is that we want our users to enter the code like this: 639195-EM-66-XA-53-WX somewhere in the input, so the result may look like this: The code is 639195-EM-66-XA-53-WX, let me in. We still want to match the string if they make a small…
tkowal
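
A sketch of approximate substring search, assuming plain Levenshtein distance is the intended measure. It is Sellers' dynamic-programming variant, in which the empty pattern prefix costs nothing at every text position, so a match may begin anywhere in the input. `best_approx_match` is a hypothetical name; it returns the smallest edit distance between the pattern and any substring of the text, and the caller applies the threshold of 1. It runs in O(mn) time for a pattern of length m and a text of length n.

```python
def best_approx_match(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text
    (Sellers' algorithm: like Levenshtein, but a match may start anywhere)."""
    m = len(pattern)
    # col[i]: best distance between pattern[:i] and a substring of text ending
    # at the current position. Before reading any text, pattern[:i] costs i deletions.
    col = list(range(m + 1))
    best = col[m]
    for ch in text:
        new = [0]  # the empty prefix matches for free at every position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new.append(min(col[i] + 1,          # ch is an extra character in the text
                           new[i - 1] + 1,      # pattern[i-1] is missing from the text
                           col[i - 1] + cost))  # match or substitution
        col = new
        best = min(best, col[m])
    return best


code = "639195-EM-66-XA-53-WX"
print(best_approx_match(code, "The code is 639195-EM-66-XA-53-WX, let me in.") <= 1)  # True
print(best_approx_match(code, "The code is 639195-EM-66-XA-53-WY, let me in.") <= 1)  # True
print(best_approx_match(code, "The code is 639195-EM-66, let me in.") <= 1)           # False
```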
6 votes, 4 answers

Python - Iterate through a list of strings and group partial matching strings

So I have a list of strings as below: list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"] How do I iterate through the list and group partially matching strings without given keywords? The result…
Tuan Dinh
6 votes, 2 answers

Partial cell(or string) match in excel macro

I am new to VBA and I would like to do a partial string (or cell) match between two sheets. An example of Name1 would be "IT executive Sally Lim". An example of Name2 would be "Sally Lim". Name1 = Sheets("Work").Cells(RowName1, ColName1) Name2 =…
stupidgal
6 votes, 2 answers

How to use regex for jasmine matchers

I need to verify the text label, but it contains a dynamic part, so I tried to use a regex, but it doesn't work. expect(aboutPage.userInterfaceText.getText()).toMatch('/- User Interface: v \d+\.\d+\.\d+/'); I always get the following error: - Expected '- User…
iPogosov
6 votes, 2 answers

Would there be any advantage in comparing pattern and text characters right-to-left instead of left-to-right?

This is an exercise in "Introduction to the Design and Analysis of Algorithms". It's a string-matching problem. Say I have the string ABCD and the pattern XY, and I want to see whether the string contains the pattern. Assume we use brute force here,…
Tianzhou
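
A quick experiment helps here. The Python sketch below (hypothetical `brute_force` helper) counts character comparisons for brute-force matching in either direction. On the first input right-to-left makes fewer comparisons, on the second left-to-right does, which suggests that for plain brute force neither direction is better in general, and the worst case stays O(nm) either way. Right-to-left comparison becomes useful in Boyer-Moore-style algorithms, which use the mismatched text character to shift the pattern further.

```python
def brute_force(text: str, pattern: str, right_to_left: bool = False) -> tuple[int, int]:
    """Naive matcher. Returns (index of the first match or -1, character comparisons made)."""
    n, m = len(text), len(pattern)
    # Order in which pattern positions are compared within each alignment.
    order = range(m - 1, -1, -1) if right_to_left else range(m)
    comparisons = 0
    for i in range(n - m + 1):           # try every alignment of the pattern
        for j in order:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break                    # mismatch: shift the pattern by one
        else:
            return i, comparisons        # every character matched
    return -1, comparisons


print(brute_force("A" * 9 + "B", "AAAB"))                      # (6, 28)
print(brute_force("A" * 9 + "B", "AAAB", right_to_left=True))  # (6, 10)
print(brute_force("A" * 9, "BAAA"))                            # (-1, 6)
print(brute_force("A" * 9, "BAAA", right_to_left=True))        # (-1, 24)
```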
6 votes, 6 answers

JavaScript equivalent to C strncmp (compare string for length)

Is there an equivalent in JavaScript to the C function strncmp? strncmp takes two string arguments and an integer length argument. It compares the two strings for up to length characters and determines whether they are equal that far. …
Daniel Bingham
6 votes, 3 answers

String parsing using Python?

Given a string such as 'helloyellowellow', parse all the valid strings from the given string (e.g. [[hell,hello,yellow],[low, low]........]). I am looking for the most optimized way to write the code. Here is mine, but I am not sure if this is the best…
6 votes, 1 answer

How to search for a part of a dictionary key?

Could someone please tell me how I can search for only part of a key in a dictionary (in VB.NET)? I use the following sample code: Dim PriceList As New Dictionary(Of String, Double)(System.StringComparer.OrdinalIgnoreCase) …
PeterCo