Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
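Here is a sketch of one classic exact-matching algorithm, Knuth-Morris-Pratt (KMP), in Python. The prefix function records, for each prefix of the pattern, the longest proper prefix that is also a suffix, so the scan never re-reads text characters. The function names are illustrative. For a one-off search, Python's built-in str.find is simpler.

```python
def prefix_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return every start index where pattern occurs in text (overlapping matches included)."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = prefix_function(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back in the pattern instead of re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue so overlapping matches are found too
    return hits


print(kmp_search("CTAGATCCTGCGATCGATTAAGCCTGA", "CGATCGATTA"))  # [10]
```

On the example above, the pattern starts at index 10 (0-based).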

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, finds strings (or substrings of the text) that match the pattern approximately rather than exactly, usually those within a given edit distance of the pattern.
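Below is a minimal Python sketch of the Levenshtein distance (unit-cost insertions, deletions and substitutions), which is the measure most often meant by "edit distance". It compares two whole strings with the usual dynamic-programming table and keeps only one row at a time. The function name is illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters are equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```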

2278 questions
8 votes, 4 answers

String pattern matching with one or zero mismatch

Given a string and a pattern to match, how can matches with zero or one mismatch be found efficiently? E.g., S = abbbaaabbbabab, P = abab. Matches are abbb (index 0), aaab (index 4), abbb (index 6), abab (index 10). I tried to modify KMP…
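One linear-time approach, sketched here in Python as an alternative to modifying KMP, works in two passes. First, compute for every alignment how many pattern characters match from the left, using the Z-function of pattern + separator + text. Second, compute how many match from the right, using the same function on the reversed strings. A window has at most one mismatch exactly when those two counts add up to at least m - 1. All function names are hypothetical.

```python
def z_array(seq):
    """z[i] = length of the longest common prefix of seq and seq[i:]."""
    n = len(seq)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # seq[left:right] is the rightmost segment known to equal a prefix
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and seq[z[i]] == seq[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def match_lengths(p, s):
    """For each start i in s, how many leading characters of p match s[i:] (at most len(p))."""
    # None is a separator equal to no character, so no value can exceed len(p)
    z = z_array(list(p) + [None] + list(s))
    return z[len(p) + 1:]


def matches_with_at_most_one_mismatch(s, p):
    n, m = len(s), len(p)
    if m == 0:
        return list(range(n + 1))
    if m > n:
        return []
    front = match_lengths(p, s)             # matches counted from the left end of each window
    back = match_lengths(p[::-1], s[::-1])  # matches counted from the right end, via reversal
    hits = []
    for i in range(n - m + 1):
        # the window s[i:i+m] ends at s[i+m-1], which is index n-i-m of the reversed text
        if front[i] + back[n - i - m] >= m - 1:
            hits.append(i)
    return hits


print(matches_with_at_most_one_mismatch("abbbaaabbbabab", "abab"))  # [0, 4, 6, 8, 10]
```

On the question's example, this returns index 8 as well. The window bbab there also differs from abab in only one position, although the question's list of matches omits it.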
7 votes, 1 answer

android < 2.3 and java.text.Normalizer

What's the best alternative to java.text.Normalizer in Android versions earlier than 2.3? http://developer.android.com/reference/java/text/Normalizer.html I need to match strings like perché, perchè and perche. Thanks, Nicola
Nicola Montecchio
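Whatever API provides it, the usual technique is canonical decomposition (NFD) followed by removing the combining marks. The Python sketch below uses the standard unicodedata module to illustrate the idea only. It does not answer which library to use on Android before 2.3, and the function name is hypothetical. Letters that have no decomposition, such as ø or ß, pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Decompose (NFD) so 'é' becomes 'e' + U+0301, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


variants = ["perché", "perchè", "perche"]
print({strip_accents(v) for v in variants})  # {'perche'}
```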
7 votes, 1 answer

Fuzzy Match Across Columns in R

How can I measure the degree to which names are similar in R? In other words, the degree to which a fuzzy match can be made. For example, I am working with a data frame that looks like this: Name.1 <- c("gonzalez", "wassermanschultz",…
Sharif Amlani
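This is a sketch of one possible similarity score, written in Python with the standard difflib module. SequenceMatcher.ratio() is based on Ratcliff/Obershelp-style matching and returns 2*M/T, where M is the number of matched characters and T is the combined length. The second column below is made up, because the question's data frame is cut off. In R itself, adist() (package utils) returns a generalized Levenshtein distance that could be normalized in a similar way.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1]; comparison is case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


name_1 = ["gonzalez", "wassermanschultz"]
name_2 = ["gonzales", "wasserman schultz"]  # hypothetical second column
for a, b in zip(name_1, name_2):
    print(a, b, round(name_similarity(a, b), 3))
```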
7 votes, 3 answers

How to split a string after every 10 words?

I am looking for a way to split my chunk of text every 10 words. I am working with the code below. My input will be a long string. Ex: this is an example file that can be used as a reference for this program, i want this line to be split (newline) by…
Samantha1154
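Here is a minimal Python sketch, assuming "word" means a whitespace-separated token. Punctuation stays attached to its word, and runs of whitespace collapse to single spaces. The sample text is the opening of the question's example without its trailing comma, and the helper name is hypothetical.

```python
def chunk_words(text, size=10):
    """Split text on whitespace and rejoin every `size` words into one line."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


text = "this is an example file that can be used as a reference for this program"
print("\n".join(chunk_words(text)))
```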
7 votes, 2 answers

Efficient string suffix detection

I am working with PySpark on a huge dataset, where I want to filter the data frame based on strings in another data frame. For example, dd =…
Sotos
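This plain-Python sketch (not PySpark) shows one way to keep suffix checks cheap when there are many candidate suffixes. It groups the suffixes by length in hash sets, so each string needs one slice-and-lookup per distinct suffix length instead of one comparison per suffix. The suffix list, sample words and function names are hypothetical.

```python
from collections import defaultdict


def build_suffix_index(suffixes):
    """Map each suffix length to the set of suffixes of that length."""
    by_length = defaultdict(set)
    for suffix in suffixes:
        by_length[len(suffix)].add(suffix)
    return dict(by_length)


def ends_with_any(s, by_length):
    # s[len(s) - n:] rather than s[-n:], because s[-0:] would be the whole string
    return any(n <= len(s) and s[len(s) - n:] in group for n, group in by_length.items())


index = build_suffix_index(["ing", "ed", "s"])
print([w for w in ["testing", "tested", "cat", "cats"] if ends_with_any(w, index)])
```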
7 votes, 2 answers

Fuzzy record matching with multiple columns of information

I have a question that is somewhat high level, so I'll try to be as specific as possible. I'm doing a lot of research that involves combining disparate data sets with header information that refers to the same entity, usually a company or a…
7 votes, 2 answers

How can I get the precise common "max.distance" value for fuzzy string matching using agrep?

I am trying to figure out the best precision for fuzzy string matching between two string names using agrep. However, I need to choose a single precision ("max.distance") to apply across all strings I am trying to match, since the amount of…
Eric
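To make the max.distance trade-off concrete, here is a Python sketch of agrep-style matching. Sellers' variant of the edit-distance table lets a match start and end anywhere in the text. A helper converts a fractional max.distance into a whole number of edits. The rounding rule (fraction times pattern length, rounded up) is an assumption based on R's documentation and is worth checking against ?agrep. The function names are hypothetical, and the comparison is case-sensitive.

```python
import math


def best_substring_distance(pattern, text):
    """Fewest edits turning pattern into some substring of text (Sellers' algorithm)."""
    prev = [0] * (len(text) + 1)  # empty pattern: zero edits, the match may start anywhere
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)  # pattern[:i] against an empty substring: i deletions
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # the match may end anywhere


def agrep_like(pattern, texts, max_distance=0.1):
    # Assumption: a value below 1 is a fraction of the pattern length, rounded up to a
    # whole number of edits; values of 1 or more are taken as an edit count.
    if max_distance < 1:
        k = math.ceil(max_distance * len(pattern))
    else:
        k = int(max_distance)
    return [t for t in texts if best_substring_distance(pattern, t) <= k]


print(agrep_like("lasy", ["1 lazy", "1", "1 LAZY"], max_distance=0.25))  # ['1 lazy']
```

Under that rounding, a fraction of 0.1 allows 1 edit for patterns of 1 to 10 characters and 2 edits for 11 to 20. A single fractional value therefore still scales the budget with name length, whereas an integer max.distance gives every name the same absolute budget.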
7 votes, 2 answers

Remove all non-digit characters from a string with jQuery?

Possible Duplicate: Javascript: strip out non-numeric characters from string. String matching is a headache for me. Example: if I have strings like these: abc123xyz456()* and ^%$111u222, then convert them to: 123456 and 111222
Awan
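The same transformation can be written in Python with a regular expression, as in the sketch below. It uses [^0-9] rather than \D because in Python 3 \d also matches non-ASCII decimal digits. In JavaScript the usual equivalent is s.replace(/\D/g, ""), since JavaScript's \d is ASCII-only. The helper name is hypothetical.

```python
import re


def digits_only(s):
    """Keep only the ASCII digits 0-9."""
    return re.sub(r"[^0-9]", "", s)


for s in ["abc123xyz456()*", "^%$111u222"]:
    print(digits_only(s))  # 123456, then 111222
```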
7 votes, 4 answers

elegant way to match two wildcarded strings

I'm OCRing some text from two different sources. They can each make mistakes in different places, where they won't recognize a letter/group of letters. If they don't recognize something, it's replaced with a ?. For example, if the word is…
Claudiu
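This Python sketch assumes each ? stands for one or more unknown characters (the question says "a letter/group of letters") and that every other character must match literally. Two readings are then compatible if some string matches both. That can be checked by exploring pairs of positions in the two patterns, which amounts to walking the product of two small automata. All names are hypothetical.

```python
from collections import deque

LIT, ANY, STAR = "lit", "any", "star"


def tokenize(pattern):
    """'?' becomes ANY (exactly one char) followed by STAR (zero or more chars)."""
    tokens = []
    for ch in pattern:
        if ch == "?":
            tokens += [(ANY, None), (STAR, None)]
        else:
            tokens.append((LIT, ch))
    return tokens


def steps(tokens, i):
    """Ways to consume one character at position i, as (next_position, required_char or None)."""
    if i == len(tokens):
        return []
    kind, ch = tokens[i]
    if kind == LIT:
        return [(i + 1, ch)]
    if kind == ANY:
        return [(i + 1, None)]
    return [(i, None)]  # STAR consumes a character and stays put


def skip_stars(tokens, i):
    """Positions reachable from i without consuming input (a STAR may match nothing)."""
    out = [i]
    while i < len(tokens) and tokens[i][0] == STAR:
        i += 1
        out.append(i)
    return out


def compatible(a, b):
    """True if some string could be read as both a and b."""
    ta, tb = tokenize(a), tokenize(b)
    start = [(i, j) for i in skip_stars(ta, 0) for j in skip_stars(tb, 0)]
    seen, queue = set(start), deque(start)
    while queue:
        i, j = queue.popleft()
        if i == len(ta) and j == len(tb):
            return True
        for ni, ca in steps(ta, i):
            for nj, cb in steps(tb, j):
                if ca is not None and cb is not None and ca != cb:
                    continue  # two different known letters cannot be the same character
                for x in skip_stars(ta, ni):
                    for y in skip_stars(tb, nj):
                        if (x, y) not in seen:
                            seen.add((x, y))
                            queue.append((x, y))
    return False


print(compatible("h?llo", "he?o"), compatible("cat", "c?g"))  # True False
```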
7 votes, 2 answers

Find matching strings between two vectors in R

I have two vectors in R. I want to find partial matches between them. My data: The first one is from a dataset named muc, which contains 6400 street names. muc$name looks like: muc$name = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen" ,…
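One reading of "partial match" is "close enough despite punctuation or typos". The Python sketch below takes that reading and uses difflib.get_close_matches. The street names are the first three from the question, and the query is a made-up entry standing in for the second vector. If "partial" instead means that one name contains the other, a plain substring test (a in b) is the simpler tool.

```python
import difflib

muc_names = ["Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen"]  # from the question
query = "Otto Klemperer Weg"  # hypothetical entry from the second vector

# Candidates scoring at least `cutoff` by SequenceMatcher.ratio(), best first
matches = difflib.get_close_matches(query, muc_names, n=3, cutoff=0.6)
print(matches)
```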
7 votes, 3 answers

How to predict correct country name for user provided country name?

I am planning to do some data tuning on my data. Situation: I have data with a country field. It contains user-input country names (it might contain spelling mistakes or different names for the same country, like US/U.S.A/United States for…
AngryLeo
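Here is a Python sketch of one common approach. It normalizes the raw input (lowercase, dropping punctuation such as the dots in U.S.A) and looks the result up in a hand-maintained alias table. For misspellings, it falls back to fuzzy matching. The alias table, the 0.8 cutoff and the function names are hypothetical choices, not taken from the question. This simple normalization also drops accented letters, so non-English spellings would need their own aliases.

```python
import difflib
import re

# Hypothetical alias table: normalized spelling -> canonical country name
ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "germany": "Germany",
}


def normalize(raw):
    """Lowercase, keep only letters and spaces (so 'U.S.A' -> 'usa'), collapse whitespace."""
    letters = re.sub(r"[^a-z ]", "", raw.lower())
    return " ".join(letters.split())


def predict_country(raw, cutoff=0.8):
    """Exact alias lookup first, then the closest alias by difflib similarity, else None."""
    key = normalize(raw)
    if key in ALIASES:
        return ALIASES[key]
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None


for raw in ["U.S.A", "United States", "Untied States", "Atlantis"]:
    print(raw, "->", predict_country(raw))
```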
7 votes, 4 answers

Find exact match in list of strings

Very new to this, so please bear with me... I have a predefined list of words, checklist = ['A','FOO'], and a list of words from line.split() that looks something like this: words = ['fAr', 'near', 'A']. I need the exact match of checklist in words, so I…
origamisven
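The exact-match test can be sketched in Python with the two lists from the question. Membership in a set (or list) compares whole strings, case-sensitively. By contrast, 'A' in 'fAr' is a substring test and returns True.

```python
checklist = ['A', 'FOO']
words = ['fAr', 'near', 'A']

wanted = set(checklist)  # set membership compares whole strings exactly (case-sensitive)
matches = [w for w in words if w in wanted]
print(matches)  # ['A']
```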
7 votes, 1 answer

How to compare and convert emoji characters in C#

I am trying to figure out how to check if a string contains a specific emoji. For example, look at the following two emoji: Bicyclist: http://unicode.org/emoji/charts/full-emoji-list.html#1f6b4 US Flag:…
tbraun
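The Python sketch below shows why such comparisons get tricky. Python strings index by code point, while C#'s string.Length counts UTF-16 code units. The bicyclist (U+1F6B4) is one code point but a surrogate pair of two C# chars. The US flag is two code points, the regional indicators U+1F1FA and U+1F1F8. The helper utf16_units is hypothetical and only mimics the C# count. Searching for the complete sequence as a substring is the straightforward check, though some emoji are followed by the variation selector U+FE0F, which an exact comparison has to account for.

```python
BICYCLIST = "\U0001F6B4"          # U+1F6B4, from the question's chart link
US_FLAG = "\U0001F1FA\U0001F1F8"  # regional indicator symbols U and S


def utf16_units(s):
    """Length the way C# string.Length counts it: UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2


message = "Ride " + BICYCLIST + " in the " + US_FLAG
print(BICYCLIST in message, US_FLAG in message)  # True True
print(len(BICYCLIST), utf16_units(BICYCLIST))    # 1 2
print(len(US_FLAG), utf16_units(US_FLAG))        # 2 4
```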
7 votes, 2 answers

how to check if a word appears as a whole word in a string in Lua

Not sure how to check whether a word appears as a whole word in a string (not as part of a word), case-sensitive. For example, Play is in the string "Info Playlist Play pause" but not in the strings "Info Playlist pause" or "Info NowPlay pause".
mile
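In Python, the same check is a regular expression with \b word boundaries, as sketched below. In Lua itself, the commonly used equivalent is the frontier pattern %f[%w]Play%f[%W]. The function name is hypothetical. Note that \b counts letters, digits and the underscore as word characters, so "Play_list" would not count as containing the word Play.

```python
import re


def contains_whole_word(text, word):
    # \b marks a boundary between a word character and a non-word character (or the
    # string edge); re.escape protects words containing regex metacharacters.
    # Matching is case-sensitive by default.
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


for s in ["Info Playlist Play pause", "Info Playlist pause", "Info NowPlay pause"]:
    print(s, contains_whole_word(s, "Play"))
```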
7 votes, 2 answers

String conditional formatting "equal to" in Excel using Python's xlsxwriter

I have relatively big Excel spreadsheets to which I am applying conditional formatting. However, the content of a cell is relatively short (max 3 letters), so I need to match a string exactly. For example: 'A' should be formatted but nothing more…