Questions tagged [string-search]

String searching algorithms (also known as string matching algorithms) are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.

Use this tag for programming questions related to string searching algorithms.

Source: Wikipedia

261 questions
4
votes
1 answer

Computing the second (mis-match) table in the Boyer-Moore String Search Algorithm

For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixes, O(m), and all positions where that suffix could go, and check for equality…
PythonPower
4
votes
5 answers

Print Different Output Values Corresponding to Duplicate Input in a Table?

For example, TableA: ID1 ID2 123 abc 123 def 123 ghi 123 jkl 123 mno 456 abc 456 jkl I want to do a string search for 123 and return all corresponding values. pp = Cases[#,…
Rose
  • 129
  • 6
4
votes
1 answer

Is time Complexity O(nm) equal to O(n^2) if m <= n

I am studying the time complexity of an algorithm to find if a string n contains a substring m where m <= n. The result of the analysis is that the time complexity is O(nm). So taking this time complexity as starting point and knowing that m <= n,…
abnerabbey
  • 125
  • 1
  • 10
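
As a short worked step for the question above (a sketch that uses only the stated assumption m <= n, with n the text length and m the pattern length):

```latex
m \le n \;\Longrightarrow\; n m \le n \cdot n = n^2 \;\Longrightarrow\; O(nm) \subseteq O(n^2)
```

So O(n^2) is a valid upper bound whenever m <= n, but it is generally looser than O(nm), which keeps the dependence on the pattern length visible.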
4
votes
0 answers

Comparison of C++17 string search algorithms

C++17 added specialized string search algorithms: std::boyer_moore_horspool_searcher, std::boyer_moore_searcher, and std::default_searcher. To quote Wikipedia on the Boyer–Moore–Horspool algorithm: It is a simplification of the Boyer–Moore string search…
Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239
4
votes
2 answers

Python Pandas: Lookup table by searching for substring

I have a dataframe with a column for app user-agents. What I need to do is to identify the particular app from this column. For example, NewWordsWithFriendsFree/2.3 CFNetwork/672.1.15 Darwin/14.0.0 will be categorized in Words With Friends.…
sfactor
  • 12,592
  • 32
  • 102
  • 152
4
votes
4 answers

Search pattern to include square brackets

I am trying to search for exact words in a file. I read the file by lines and loop through the lines to find the exact words. As the in keyword is not suitable for finding exact words, I am using a regex pattern. def findWord(w): return…
BitsNPieces
  • 91
  • 1
  • 7
4
votes
3 answers

Search string in a file in c

I am trying to write a program that can search for a string in a file (called student.txt). I want my program to print the word if it finds the same word in the file, but it's showing an error. #include #include #include int…
jimo
  • 430
  • 2
  • 7
  • 19
4
votes
3 answers

String Occurrence Counting Algorithm

I am curious what is the most efficient algorithm (or commonly used) to count the number of occurrences of a string in a chunk of text. From what I read, the Boyer–Moore string search algorithm is the standard for string searches but I am not sure…
Hellnar
  • 62,315
  • 79
  • 204
  • 279
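
A minimal Python sketch of occurrence counting, offered as one possible approach rather than the accepted answer. The function name `count_occurrences` and the `overlapping` flag are hypothetical, introduced here for illustration. The sketch relies on the built-in `str.find` and whatever search algorithm the interpreter uses internally, not on a hand-written Boyer–Moore.

```python
def count_occurrences(text, pattern, overlapping=True):
    """Count how many times pattern occurs in text.

    With overlapping=True, "aa" occurs 3 times in "aaaa";
    with overlapping=False it occurs 2 times (same as str.count).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character to allow overlaps,
        # or by the full pattern length to skip past the match.
        step = 1 if overlapping else len(pattern)
        start = text.find(pattern, start + step)
    return count
```

For non-overlapping counts, the built-in `text.count(pattern)` gives the same result without an explicit loop.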
4
votes
8 answers

Searching for a word in a text using C, and display the info after that word

Say I have a text file like this: User: John Device: 12345 Date: 12/12/12 EDIT: I have my code to successfully search for a word, and display the info after that word. However when I try to edit the code to search for 2 or 3 words and display the…
Dave Wang
  • 109
  • 4
  • 12
3
votes
3 answers

Objective-C jumbled letters solver

I am trying to create an iPhone app that, given 6 letters, outputs all the possible 3-6 letter English words. I already have a dictionary, I just want to know how to do it. I searched around and only found those scrabble solvers in…
kazuo
  • 297
  • 1
  • 6
  • 15
3
votes
4 answers

How to return a string if a re.findall finds no match

I am writing a script to take scanned pdf files and convert them into lines of text to enter into a database. I use re.findall to get matches from a list of regular expressions to get certain values from the tesseract extracted strings. I am…
Matthew Keith
  • 53
  • 1
  • 1
  • 4
3
votes
4 answers

Python: finding a common sublist of a given length present in two lists

I have to find efficient Python code to do the following: find at least one sequence, if any exists, of n consecutive elements that appears in both given lists. For example, with n=3, the result for these two lists will be ['Tom', 'Sam',…
dsv2kx
  • 31
  • 1
3
votes
6 answers

For string, find and replace

Finding some text and replacing it with new text within a C string can be a little trickier than expected. I am searching for an algorithm which is fast, and that has a small time complexity. What should I use?
user319824
3
votes
1 answer

Why is the time complexity of the brute force algorithm O(n*m)?

I am using the following brute force algorithm for searching a string inside another string. As I know, the number of comparisons is (n-m+1)*m in the worst case, but the right answer for time complexity is supposed to be O(n*m). To get this answer,…
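
The asker's own code is not shown in the excerpt, so the following is only an illustrative sketch of a brute-force search that also counts character comparisons. The name `naive_search` and its return shape are assumptions made for this example. It shows where the (n-m+1)*m worst case comes from: there are n-m+1 candidate alignments and at most m comparisons per alignment. Since (n-m+1)*m <= n*m, the bound is usually written O(n*m).

```python
def naive_search(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):      # n - m + 1 candidate alignments
        j = 0
        while j < m:                # at most m comparisons per alignment
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:                  # every pattern character matched
            return i, comparisons
    return -1, comparisons


# Worst-case style input: every alignment fails only on the last character.
# Here n = 10 and m = 4, so (n - m + 1) * m = 7 * 4 = 28 comparisons.
print(naive_search("a" * 10, "aaab"))  # (-1, 28)
```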
3
votes
1 answer

Compare two large string vectors takes too long time (remove stopwords)

I am trying to prepare a dataset for machine learning. In the process I would like to remove (stop) words that have few occurrences (often related to bad OCR readings). Currently I have a list of words containing approx. 1 million words which I want to…