Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
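
As a minimal sketch of the example above, the following Python snippet reports every position at which the pattern occurs in the text. The helper name `find_all` is hypothetical, and returning no matches for an empty pattern is an assumption made for illustration. It relies only on the built-in `str.find`.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return positions


text = "CTAGATCCTGCGATCGATTAAGCCTGA"
pattern = "CGATCGATTA"
print(find_all(text, pattern))  # [10]
```

The pattern occurs once, starting at zero-based index 10. The dedicated algorithms in the reference cited below (Knuth-Morris-Pratt, Boyer-Moore and others) address the same problem with better worst-case guarantees.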

A comprehensive online reference for exact string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, searches for parts of the text that approximately match the pattern, typically those within a small edit distance of it.
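
To make the notion of edit distance concrete, here is a short Python sketch of the Levenshtein distance (insertions, deletions and substitutions each cost 1), one common choice of edit distance. The function names `edit_distance` and `closest` are hypothetical and chosen for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def closest(query: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to query."""
    return min(candidates, key=lambda c: edit_distance(query, c))


print(edit_distance("kitten", "sitting"))            # 3
print(closest("aple", ["apple", "orange", "pear"]))  # apple
```

This compares whole strings. Searching for approximate occurrences inside a longer text usually needs a variant of the same dynamic program or a dedicated library.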

2278 questions
82
votes
25 answers

javascript regular expression to check for IP addresses

I have several IP addresses like: 115.42.150.37 115.42.150.38 115.42.150.50 What type of regular expression should I write if I want to search for all 3 IP addresses? E.g., if I do 115.42.150.* (I will be able to search for all 3 ip…
KennC.
  • 3,315
  • 6
  • 20
  • 18
82
votes
17 answers

Regular Expression Arabic characters and numbers only

I want a regular expression to accept only Arabic characters, spaces and numbers. Numbers are not required to be in Arabic. I found the following expression: ^[\u0621-\u064A]+$ which accepts only Arabic characters while I need Arabic characters,…
moujtahed
  • 841
  • 1
  • 7
  • 7
67
votes
4 answers

How to do whole-word search similar to "grep -w" in Vim

How do I do a whole-word search like grep -w in Vim, which returns only lines where the sought-for string is a whole word and not part of a larger word? grep -w : Select only those lines containing matches that form whole words. Can this be done in…
user1420463
65
votes
8 answers

Regex allow a string to only contain numbers 0 - 9 and limit length to 45

I am trying to create a regex to have a string only contain 0-9 as the characters and it must be at least 1 char in length and no more than 45. so example would be 00303039 would be a match, and 039330a29 would not. So far this is what I have but I…
NewToRegEx
  • 651
  • 1
  • 5
  • 3
61
votes
2 answers

How to select R data.table rows based on substring match (a la SQL like)

I have a data.table with a character column, and want to select only those rows that contain a substring in it. Equivalent to SQL WHERE x LIKE '%substring%' E.g. > Months = data.table(Name = month.name, Number = 1:12) > Months["mb" %in% Name] Empty…
Corvus
  • 7,548
  • 9
  • 42
  • 68
53
votes
3 answers

Perl - If string contains text?

I want to use curl to view the source of a page and if that source contains a word that matches the string then it will execute a print. How would I do a if $string contains? In VB it would be like. dim string1 as string = "1" If…
Hellos
  • 601
  • 2
  • 6
  • 4
42
votes
7 answers

How can I match fuzzy match strings from two datasets?

I've been working on a way to join two datasets based on an imperfect string, such as the name of a company. In the past I had to match two very dirty lists, one list had names and financial information, another list had names and addresses. Neither had…
A L
  • 613
  • 1
  • 7
  • 7
40
votes
32 answers

Are Regular Expressions a must for programming?

Are Regular Expressions a must for doing programming?
nonopolarity
  • 146,324
  • 131
  • 460
  • 740
40
votes
2 answers

Most efficient way to check if $string starts with $needle in perl

Given two string variables $string and $needle in perl, what's the most efficient way to check whether $string starts with $needle. $string =~ /^\Q$needle\E/ is the closest match I could think of that does what is required but is the least…
Stephane Chazelas
  • 5,859
  • 2
  • 34
  • 31
39
votes
6 answers

php string matching with wildcard *?

I want to give the possibility to match string with wildcard *. Example $mystring = 'dir/folder1/file'; $pattern = 'dir/*/file'; stringMatchWithWildcard($mystring,$pattern); //> Returns true Example 2: $mystring = 'string bl#abla;y'; $pattern =…
dynamic
  • 46,985
  • 55
  • 154
  • 231
39
votes
4 answers

Finding how similar two strings are

I'm looking for an algorithm that takes 2 strings and will give me back a "factor of similarity". Basically, I will have an input that may be misspelled, have letters transposed, etc, and I have to find the closest match(es) in a list of possible…
Daniel Magliola
  • 30,898
  • 61
  • 164
  • 243
39
votes
3 answers

Pandas text matching like SQL's LIKE?

Is there a way to do something similar to SQL's LIKE syntax on a pandas text DataFrame column, such that it returns a list of indices, or a list of booleans that can be used for indexing the dataframe? For example, I would like to be able to match…
naught101
  • 18,687
  • 19
  • 90
  • 138
37
votes
3 answers

Using Java Regex, how to check if a string contains any of the words in a set ?

I have a set of words say -- apple, orange, pear , banana, kiwi I want to check if a sentence contains any of the above listed words, and If it does , I want to find which word matched. How can I accomplish this in Regex ? I am currently calling…
user193116
  • 3,498
  • 6
  • 39
  • 58
33
votes
1 answer

Efficient string matching in Apache Spark

Using an OCR tool I extracted texts from screenshots (about 1-5 sentences each). However, when manually verifying the extracted text, I noticed several errors that occur from time to time. Given the text "Hello there ! I really like Spark ❤️!", I…
mrtnsd
  • 347
  • 1
  • 4
  • 3
32
votes
6 answers

Algorithm to find out whether the matches for two Glob patterns (or Regular Expressions) intersect

I'm looking at matching glob-style patterns similar to what the Redis KEYS command accepts. Quoting: h?llo matches hello, hallo and hxllo; h*llo matches hllo and heeeello; h[ae]llo matches hello and hallo, but not hillo. But I am not matching…
chakrit
  • 61,017
  • 25
  • 133
  • 162