Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
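As a minimal sketch of exact matching, the following Python snippet locates the example pattern in the example text using the built-in `str.find`. The helper name `find_all` is hypothetical and only for illustration; it is not part of any library, and dedicated algorithms (such as those in the reference mentioned in this wiki) may be preferable for large inputs.

```python
def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Overlapping occurrences are reported too. An empty pattern is
    treated as having no occurrences (an assumption of this sketch).
    """
    if not pattern:
        return []
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found.
        start = text.find(pattern, start + 1)
    return positions


text = "CTAGATCCTGCGATCGATTAAGCCTGA"
pattern = "CGATCGATTA"
print(find_all(text, pattern))  # [10]
```

Indices are zero-based, so the pattern here starts at the eleventh character of the text.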

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, searches for matches based on the edit distance between the pattern and the text.
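To illustrate the idea of edit distance, here is a small sketch of the classic dynamic-programming computation of Levenshtein distance (insertions, deletions, and substitutions each cost 1). The function names `edit_distance` and `closest_match` are hypothetical and chosen for this example; other edit-distance variants and faster algorithms exist.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_match(query, candidates):
    """Return the candidate with the smallest edit distance to query."""
    return min(candidates, key=lambda candidate: edit_distance(query, candidate))


print(edit_distance("kitten", "sitting"))  # 3
print(closest_match("delicous", ["pie", "delicious", "test"]))  # delicious
```

This version keeps only two rows of the table, so it uses memory proportional to the length of the second string while still taking time proportional to the product of the two lengths.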

2278 questions
7
votes
2 answers

bash script to check file name begins with expected string

Running on OS X with a bash script: sourceFile=`basename $1` shopt -s nocasematch if [[ "$sourceFile" =~ "adUsers.txt" ]]; then echo success ; else echo fail ; fi The above works, but what if the user sources a file called adUsers_new.txt? I…
chop
7
votes
1 answer

How to search for a word in a text file and if found print out the entire line

My program needs to search for a word in a text file, and if it finds that word, to print out/display the entire line. Example: employee name date joined position project annual salary tom jones 1/13/2011 …
Darius
7
votes
1 answer

Probabilistic String Matching in Python

I'm in the process of writing a bot that places bets on the website Betfair using their Python API. I want to place bets on football (soccer) matches when they are in-play. I've coded an XML feed to give me live data from the games, however the XML…
7
votes
3 answers

Algorithm to match one input file with given numbers of file

I had an interview last week. I got stuck on one of the questions in the algorithm round. I answered the question, but the interviewer did not seem convinced, which is why I am sharing it here. Please suggest an optimized method for this question, so…
devsda
7
votes
2 answers

How does SequenceMatcher.ratio work in difflib

I was trying out Python's difflib module and I came across SequenceMatcher. So, I tried the following examples but couldn't understand what is happening. >>> SequenceMatcher(None,"abc","a").ratio() 0.5 >>>…
RanRag
7
votes
3 answers

Java: Does anyone have method to find best match of string in array?

Basically I'm just trying to find a way to find the closest match (not necessarily exact) of a String For example, find "delicous" in {"pie", "delicious", "test"} This is pretty obvious, but the values in the array might not always be that…
Alex Coleman
6
votes
9 answers

How to search a string of key/value pairs in Java

I have a String that's formatted like this: "key1=value1;key2=value2;key3=value3" for any number of key/value pairs. I need to check that a certain key exists (let's say it's called "specialkey"). If it does, I want the value associated with it. If…
user438293456
6
votes
6 answers

Algorithm to match 2 lists with wildcards

I'm looking for an efficient way to match 2 lists, one which contains complete information, and one which contains wildcards. I've been able to do this with wildcards of fixed lengths, but am now trying to do it with wildcards of variable…
Joel Cornett
6
votes
2 answers

Shortest Repeating Sub-String

I am looking for an efficient way to extract the shortest repeating substring. For example: input1 = 'dabcdbcdbcdd' output1 = 'bcd' input2 = 'cbabababac' output2 = 'ba' I would appreciate any answer or information related to the problem. Also, in…
TimC
6
votes
6 answers

TCL string match vs regexps

Is it right that we should avoid using regexp because it is slow, and use string operations instead? Are there cases where both can be used but regexp is better?
Narek
6
votes
2 answers

How do I match string till end of text file?

In this line of code I want it to match the string from 'Review Notes \50optional\51' till the end of the text file. How can I do this? reviewNotes = contents.match(/Review Notes \50optional\51\n==================(.*?)/m)[1].strip
thisiscrazy4
6
votes
3 answers

Finding best matched word from large Vocalist

I have a pandas data frame that contains two columns named Potential Word and Fixed Word. The Potential Word column contains words in different languages, both misspelled and correctly spelled, and the Fixed Word column contains the…
6
votes
1 answer

findSubstrings and breakSubstring in Data.ByteString

In the source of Data/ByteString.hs it says that the function findSubstrings has been deprecated in favor of breakSubstring. However I think the findSubstrings which was implemented using the KMP algorithm is much more efficient than the algorithm…
sob7y
6
votes
2 answers

General method to trim non-printable characters in Clojure

I encountered a bug where I couldn't match two seemingly 'identical' strings together. For example, the following two strings fail to match: "sample" and "​sample". To replicate the issue, one can run the following in Clojure. (= "sample" "​sample")…
6
votes
5 answers

Splitting a String using Word Delimiters

I have a string as below: a > b and c < d or d > e and f > g. The outcome must be: a > b and c < d or d > e and f > g. I want to split the string at occurrences of "and" and "or" and retrieve the delimiters as well along with the tokens. [I need them in order…
jch