Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
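As a minimal sketch of exact matching, the Python snippet below reports every position at which the pattern from the example occurs in the text. The helper name `find_all` is hypothetical and chosen only for illustration. It relies on the built-in `str.find` rather than on any particular algorithm from the reference mentioned in this description.

```python
def find_all(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping occurrences are found too.
        start = text.find(pattern, start + 1)
    return positions


text = "CTAGATCCTGCGATCGATTAAGCCTGA"
pattern = "CGATCGATTA"
print(find_all(text, pattern))  # [10]
```

Indices are zero-based, so the single occurrence in the example starts at the eleventh character of the text.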

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, searches for parts of the text whose edit distance to the pattern is small, typically within a chosen threshold.
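To make the notion of edit distance concrete, here is a small Python sketch of the Levenshtein distance, one common edit distance that counts single-character insertions, deletions, and substitutions. The function name `levenshtein` is an illustrative choice, and other distance definitions (for example, ones that also allow transpositions) would give different numbers.

```python
def levenshtein(a, b):
    """Levenshtein distance between a and b, keeping two DP rows in memory."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A fuzzy matcher built on this would accept a candidate when the distance is at or below some threshold. The right threshold depends on the data and is not specified here.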

2278 questions
9
votes
7 answers

PostgreSQL and word games

In a word game similar to Ruzzle or Letterpress, where users have to construct words out of a given set of letters: I keep my dictionary in a simple SQL table: create table good_words ( word varchar(16) primary key ); Since the game…
Alexander Farber
  • 21,519
  • 75
  • 241
  • 416
8
votes
3 answers

How Can I Implement A Standard Set of Hyperlink Detection Rules in Delphi

I currently do automatic detection of hyperlinks within text in my program. I made it very simple and only look for http:// or www. However, a user suggested to me that I extend it to other forms, e.g.: https:// or .com Then I realized it might not…
lkessler
  • 19,819
  • 36
  • 132
  • 203
8
votes
3 answers

Longest Common Prefix Array

Following is the suffix array and LCP array information for the string MISSISSIPPI. I know that LCP gives information about the length of the longest common prefix between str[i - 1] and str[i]. How do I get the longest common prefix length between any two…
Avinash
  • 12,851
  • 32
  • 116
  • 186
8
votes
4 answers

How to verify that the password contains X uppercase letters and Y numbers?

How do I verify in C# that the password contains at least X uppercase letters and at least Y numbers, and the entire string is longer than Z? Thanks.
SexyMF
  • 10,657
  • 33
  • 102
  • 206
8
votes
2 answers

Efficient substring matching in perl

I am looking for an efficient solution to find the longest possible substring in a string, tolerating n mismatches in the main string. E.g.: Main…
Abhi
  • 6,075
  • 10
  • 41
  • 55
8
votes
3 answers

String similarity in PHP: levenshtein like function for long strings

The function levenshtein in PHP works on strings with a maximum length of 255. What are good alternatives to compute a similarity score of sentences in PHP? Basically I have a database of sentences, and I want to find approximate…
anon
  • 81
  • 1
  • 1
  • 4
8
votes
5 answers

Most effective way to lookup a substring C of string B in string A in LINQ

Having 2 strings like: string a = "ATTAGACCTGCCGGAA"; string b = "GCCGGAATAC"; I would like to delete the part that is common to both strings and then concatenate the rest. I should add that I need to delete only the left matched part, so…
edgarmtze
  • 24,683
  • 80
  • 235
  • 386
8
votes
2 answers

How to capture a string between parentheses?

str = "fa, (captured)[asd] asf, 31" for word in str:gmatch("\(%a+\)") do print(word) end Hi! I want to capture a word between parentheses. My Code should print "captured" string. lua: /home/casey/Desktop/test.lua:3: invalid escape sequence…
Jin Su Lee
  • 83
  • 1
  • 4
8
votes
3 answers

Dart: RegExp by example

I'm trying to get my Dart web app to: (1) determine if a particular string matches a given regex, and (2) if it does, extract a group/segment out of the string. Specifically, I want to make sure that a given string is of the following…
IAmYourFaja
  • 55,468
  • 181
  • 466
  • 756
8
votes
2 answers

What is the best way to match substring from a big string to a huge list of keywords

Imagine you have millions of records containing text with an average of 2000 words each, and you also have another list with about 100000 items. E.g.: in the keywords list you have an item like "president Obama" and in one of the text records you have…
Reza M.A
  • 1,197
  • 1
  • 16
  • 33
8
votes
6 answers

Iterate over lines including blank lines

Given a multiline string with some blank lines, how can I iterate over lines in Lua including the blank lines? local s = "foo\nbar\n\njim" for line in magiclines(s) do print( line=="" and "(blank)" or line) end --> foo --> bar --> (blank) -->…
Phrogz
  • 296,393
  • 112
  • 651
  • 745
8
votes
6 answers

Fastest way to search for longest prefix in ORACLE

I have a list of phone number prefixes defined for a large number of zones (in the query defined by gvcode and cgi). I need to efficiently find the longest prefix that matches a given number PHONE_NR. I use an inverted LIKE clause on the field digits (which contains…
Tomislav Muic
  • 543
  • 1
  • 8
  • 24
8
votes
7 answers

How to find the longest substring containing two unique repeating characters

The task is to find the longest substring in a given string that is composed of any two unique repeating characters. E.g., in the input string "aabadefghaabbaagad", the longest such string is "aabbaa". I came up with the following solution but wanted to…
40mikemike
  • 89
  • 1
  • 1
  • 2
8
votes
2 answers

Create a unique ID by fuzzy matching of names (via agrep using R)

Using R, I am trying to match people's names in a dataset structured by year and city. Due to some spelling mistakes, exact matching is not possible, so I am trying to use agrep() to fuzzy match names. A sample chunk of the dataset is structured as…
thomasB
  • 303
  • 3
  • 11
8
votes
2 answers

SpringMongo Case insensitive search regex

I am trying a case-insensitive search in Mongo. Basically, I want a case-insensitive string match, and I am using regex. Here is my code: Query query = new Query( Criteria.where(propName).regex(value.toString(), "i")); But the above doesn't match my whole…
Droidme
  • 1,223
  • 6
  • 25
  • 45