Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
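
As a minimal sketch of exact matching, the Python snippet below reports every position at which a pattern occurs in a text, using the DNA example from the definition. The helper name `find_all` is hypothetical and chosen only for illustration; it relies on the built-in `str.find` rather than on any particular algorithm from the literature.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return positions


text = "CTAGATCCTGCGATCGATTAAGCCTGA"
pattern = "CGATCGATTA"
print(find_all(text, pattern))  # [10]
```

Advancing by one character after each hit, instead of by the pattern length, is what lets the sketch report overlapping occurrences such as "aa" in "aaaa".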

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, searches for parts of the text that match the pattern within a small edit distance rather than exactly.
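
To make the notion of edit distance concrete, here is a minimal sketch of the Levenshtein distance, one common choice of edit distance, computed by dynamic programming. The function name `levenshtein` is an assumption for illustration, and unit costs for insertion, deletion, and substitution are assumed; other edit distances weight these operations differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep on a match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A fuzzy matcher could then accept a candidate when this distance falls below a chosen threshold; the threshold is application-specific and not fixed by the definition.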

2278 questions
11
votes
4 answers

String Matching: Computing the longest prefix suffix array in kmp algorithm

KMP algorithm for string matching. Following is the code I found online for computing the longest prefix-suffix array: Definition: lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i]. Code: void…
Sahil Sareen
  • 1,813
  • 3
  • 25
  • 40
11
votes
2 answers

How to subset data with advanced string matching

I have the following data frame from which I would like to extract rows based on matching strings. > GEMA_EO5 gene_symbol fold_EO p_value RefSeq_ID BH_p_value KNG1 3.433049 8.56e-28 …
Toke Duce Krogager
  • 207
  • 2
  • 3
  • 6
10
votes
2 answers

Wildcard string matching in Ruby

I'd like to write a utility function/module that'll provide simple wildcard/glob matching to strings. The reason I'm not using regular expressions is that the user will be the one who'll end up providing the patterns to match using some sort of…
sa125
  • 28,121
  • 38
  • 111
  • 153
10
votes
6 answers

Matching words from vectors of strings in R

I'm trying to clean up a database by matching a messy list of site names with an approved list. As an example, the preferred site name might be 'Cotswold Water Park Pit 28' but the site has been entered into the database as: 'Pit 28', '28', 'CWP Pit…
James
  • 1,164
  • 2
  • 15
  • 36
10
votes
5 answers

Finding matching portions of two strings in PHP

I'm looking for a simple way to find matching portions of two strings in PHP (specifically in the context of a URI) For example, consider the two…
ubermensch
  • 902
  • 1
  • 12
  • 21
10
votes
2 answers

Extract last word in a string after comma if there are multiple words else the first word

I have data where the words are as follows: location<- c("xyz, sss, New Zealand", "USA", "Pris,France") id<- c(1,2,3) df<-data.frame(location,id) I would like to extract the country name from the data. The tricky part is if I extract just the last…
user3570187
  • 1,743
  • 3
  • 17
  • 34
10
votes
1 answer

Lua: String.match vs String.gmatch?

I have both the "5.1 Reference Manual" and the "Programming in Lua: 3rd Ed." in front of me. Reading these, as well as numerous searches on the web, still leave me a bit confused when it comes to using string.match and string.gmatch. I understand…
Pwrcdr87
  • 935
  • 3
  • 16
  • 36
10
votes
1 answer

Possible bug in VB.NET 'Like' operator?

Why is it that the following evaluates as True? Dim result = "b" Like "*a*b" Thanks. EDIT: To generalize this a bit, the following returns True: "String1" Like "*AnyText1*AnyText2*AnyText???******????*String1" VBA works correctly, returning…
mcu
  • 3,302
  • 8
  • 38
  • 64
9
votes
4 answers

preg_match for multiple words

I want to test a string to see if it contains certain words, e.g.: $string = "The rain in spain is certain as the dry on the plain is over and it is not clear"; preg_match('`\brain\b`',$string); But that method only matches one word. How do I check for…
Asim Zaidi
  • 27,016
  • 49
  • 132
  • 221
9
votes
2 answers

How to map the most "similar" strings from one list to another in python?

Given are two lists containing strings. One contains the names of organisations (mostly universities) all around the world, not only written in English but always using the Latin alphabet. The other list contains mostly full addresses in which strings…
Aufwind
  • 25,310
  • 38
  • 109
  • 154
9
votes
2 answers

String Matching using fuzzywuzzy- is it using Levenshtein distance or the Ratcliff/Obershelp pattern-matching algorithm?

fuzzywuzzy is a very popular library for string matching. As per the documentation of the library, it is mentioned that it uses Levenshtein distance for computing the differences between sequences. But upon close inspection, I find that it actually…
prashanth
  • 4,197
  • 4
  • 25
  • 42
9
votes
2 answers

Matching strings in PowerShell

The gist of this question is: So it seems that if ($C -match $b.Name) considers a partial match of a string a match? Is there a better way to force a complete [match] of a string? I've got a directory that gets populated with a ton of .7z files. I…
Ochuse
  • 219
  • 1
  • 5
  • 12
9
votes
1 answer

How to search for a string in one column in other columns of a data frame

I have a table, call it df, with 3 columns, the 1st is the title of a product, the 2nd is the description of a product, and the third is a one word string. What I need to do is run an operation on the entire table, creating 2 new columns (call them…
dnoss
  • 111
  • 1
  • 2
9
votes
2 answers

First-Occurrence Parallel String Matching Algorithm

To be up front, this is homework. That being said, it's extremely open ended and we've had almost zero guidance as to how to even begin thinking about this problem (or parallel algorithms in general). I'd like pointers in the right direction and not…
Xorlev
  • 8,561
  • 3
  • 34
  • 36
9
votes
6 answers

C#: How to Delete the matching substring between 2 strings?

If I have two strings .. say string1="Hello Dear c'Lint" and string2="Dear" .. I want to Compare the strings first and delete the matching substring .. the result of the above string pairs is: "Hello  c'Lint" (i.e, two spaces between "Hello"…
Rookie Programmer Aravind
  • 11,952
  • 23
  • 81
  • 114