Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
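As a concrete illustration of exact matching, here is a minimal Python sketch of Knuth-Morris-Pratt (KMP), one of the classic exact-matching algorithms. It is one possible formulation, not the only one, and the function name kmp_search is hypothetical. It reports every start index of the pattern, including overlapping occurrences, in O(n + m) time, because a precomputed failure table lets the scan continue after a mismatch without re-reading text characters.

```python
from __future__ import annotations


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text (KMP)."""
    if not pattern:
        # Convention: the empty pattern occurs at every position.
        return list(range(len(text) + 1))

    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of pattern[:i + 1].
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back instead of re-scanning the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_search("CTAGATCCTGCGATCGATTAAGCCTGA", "CGATCGATTA"))  # prints [10]
```

On the DNA example above, the single occurrence of CGATCGATTA starts at index 10.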

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

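Approximate string matching, also called fuzzy string matching, finds places where the pattern matches the text approximately rather than exactly, typically scoring candidates by the edit distance (the minimum number of insertions, deletions, and substitutions) between the pattern and a substring of the text.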
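A minimal sketch of the edit-distance idea, assuming Levenshtein distance (unit-cost insertions, deletions, and substitutions). It uses Sellers' dynamic-programming variant, in which the first row of the table is all zeros so that a match may begin anywhere in the text. The function name approximate_matches and the max_edits threshold are illustrative, not part of any library. It runs in O(nm) time and keeps only one column of the table.

```python
from __future__ import annotations


def approximate_matches(text: str, pattern: str, max_edits: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for every text position where some substring
    ending there is within max_edits Levenshtein edits of pattern."""
    m = len(pattern)
    # column[i] = smallest edit distance between pattern[:i] and any
    # substring of text ending at the current text position.
    column = list(range(m + 1))
    results = []
    for j, ch in enumerate(text):
        prev_diag = column[0]
        column[0] = 0  # empty pattern prefix matches anywhere at cost 0
        for i in range(1, m + 1):
            old = column[i]
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(
                column[i] + 1,      # extra character in the text
                column[i - 1] + 1,  # pattern character missing from the text
                prev_diag + cost,   # match or substitution
            )
            prev_diag = old
        if column[m] <= max_edits:
            results.append((j, column[m]))
    return results


print(approximate_matches("the quick brwn fox", "brown", 1))  # prints [(13, 1)]
```

Here "brwn", which ends at index 13, is one deletion away from "brown". The sketch reports end positions only; recovering where each match starts would need extra bookkeeping.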

2278 questions
5 votes, 1 answer

Algorithm for matching data

I have a project where I am testing a device that is very sensitive to noise (electromagnetic, radio, etc.). The device generates 5-6 bytes per second of binary data (looks like gibberish to an untrained eye) based on a given input (audio).…
AngryHacker
5 votes, 2 answers

difflib on Ruby

Is there a library similar to Python's difflib on Ruby? Particularly, I need one that has a method similar to difflib.get_close_matches. Any recommendations?
fjsj
5 votes, 4 answers

How to match a string in a sentence

I want to check whether a particular string is present in a sentence. I am using simple code for this purpose subStr = 'joker' Sent = 'Hello World I am Joker' if subStr.lower() in Sent.lower(): print('found') This is an easy straightforward…
Olivia Brown
5 votes, 2 answers

Search keywords efficiently when keywords are multi-word

I need to match a really large list of keywords (>1,000,000) in a string efficiently using Python. I found some really good libraries which try to do this fast: 1) FlashText (https://github.com/vi3k6i5/flashtext) 2) Aho-Corasick Algorithm…
5 votes, 4 answers

Detect that two strings are the same but in a different order

My goal is to detect that two strings contain the same words in a different order. For example, "hello world my name is foobar" is the same as "my name is foobar world hello". What I have already tried is splitting both strings into lists and comparing them within a loop. text =…
nfl-x
5 votes, 5 answers

Closest match for Full Text Search

I am trying to implement an internal search for my website that can point users in the right direction in case they mistype a word, something like the "Did you mean:" suggestion in Google Search. Does anybody have an idea how such a search can be done? How can…
5 votes, 2 answers

Rabin-Karp String Matching is not matching

I've been working on a Rabin-Karp string matching function in C++ and I'm not getting any results out of it. I have a feeling that I'm not computing some of the values correctly, but I don't know which one(s). Prototype: void rabinKarp(string…
Madison S
5 votes, 3 answers

Searching one Python dataframe / dictionary for fuzzy matches in another dataframe

I have the following pandas dataframe with 50,000 unique rows and 20 columns (included is a snippet of the relevant columns): df1: PRODUCT_ID PRODUCT_DESCRIPTION 0 165985858958 "Fish Burger with Lettuce" 1 …
gincard
5 votes, 2 answers

Why use regex finditer() rather than findall()

What is the advantage of using finditer() if findall() is good enough? findall() returns all of the matches, while finditer() returns an iterator of match objects, which can't be processed as directly as a static list. For example: import re CARRIS_REGEX =…
Aminah Nuraini
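A short sketch of the practical difference between the two, using a made-up text and pattern rather than the question's (truncated) CARRIS_REGEX. findall() returns plain strings, or tuples of group strings when the pattern has groups, so match positions are lost. finditer() yields Match objects one at a time, keeping positions and groups available without building the full list in memory.

```python
import re

# Made-up sample data for illustration only.
text = "Order 12 shipped, order 7 pending, order 305 shipped"
pattern = re.compile(r"order (\d+) (\w+)", re.IGNORECASE)

# findall() builds the whole result list up front; because the pattern has
# two groups, each item is a tuple of group strings with no position info.
print(pattern.findall(text))

# finditer() returns an iterator that lazily yields one Match object per match.
for m in pattern.finditer(text):
    print(m.start(), m.end(), m.group(1), m.group(2))

# Positions (spans) can only be collected from Match objects.
spans = [(m.start(), m.end()) for m in pattern.finditer(text)]
```

For small inputs either works; finditer() is the usual choice when you need match positions, or when scanning large text where you may stop early.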
5 votes, 2 answers

Can someone explain to me the Rabin-Karp algorithm's complexity?

I'm trying to understand why the worst case running time of the Rabin-Karp algorithm is O(nm) and the average case is O(n+m). Can someone help me with that?
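The O(nm) worst case comes from the verification step: equal hashes only suggest a match, so each hash hit costs an O(m) character comparison. If nearly every window triggers one (many genuine matches, as with pattern "aa" in "aaaaa", or a poor hash producing many collisions), the total becomes O(nm). With a good modulus, spurious hits are rare and the expected time is O(n + m): O(m) to hash the pattern and first window, then O(1) per rolling update. A minimal Python sketch of a common textbook formulation follows; the base 256 and modulus 1_000_000_007 are assumptions, and it is not the C++ code from the question above.

```python
from __future__ import annotations


def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    if m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):  # O(m) preprocessing
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; the O(m) slice comparison confirms it.
        # If this comparison runs at nearly every shift, the total cost is O(nm).
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the hash in O(1): drop text[s], append text[s + m].
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches
```

For example, rabin_karp("aaaaa", "aa") verifies a real match at every shift and returns [0, 1, 2, 3], which is exactly the input shape that drives the running time toward O(nm).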
5 votes, 1 answer

Using Python's jellyfish module to get best match (partial string matching)

I am trying to create a dictionary of some kind to append my results and get the best match using the Jaro distance function. This is part of my attempt to match two lists and get the best matched name in both. Example: import…
BernardL
5 votes, 5 answers

Fuzzy logic matching

So, I'm looking at implementing fuzzy logic matching in my company and having trouble getting good results. For starters, I'm trying to match up company names with those on a list supplied by other companies. My first attempt was to use Soundex,…
yoelbenyossef
5 votes, 4 answers

Implementing Knuth-Morris-Pratt (KMP) algorithm for string matching with Python

I am following the Cormen, Leiserson, Rivest, Stein (CLRS) book and came across the "KMP algorithm" for string matching. I implemented it in Python (as-is). However, it doesn't seem to work for some reason. Where is my fault? The code is given below: def…
serious_luffy
5 votes, 1 answer

How to remove hidden characters from text string in PHP?

I am having difficulty matching two text strings. One of them contains some hidden characters. I have a text string: "PR & Communications" stored in an SQL database. When pulled from there, into $database_version,…
5 votes, 2 answers

PHP array search within array

$keywords = array('red', 'blue', 'yellow', 'green', 'orange', 'white'); $strings = array( 'She had a pink dress', 'I have a white chocolate', 'I have a green balloon', 'I have a chocolate shirt', 'He had a new yellow book', 'We have many blue…
Bengali