Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
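As a minimal sketch of the example above, the following Python snippet reports every position at which the pattern occurs in the text. The helper name `find_all` is hypothetical, and it relies on the built-in `str.find` rather than any particular algorithm from the literature.

```python
def find_all(pattern: str, text: str) -> list:
    """Return the start index of every (possibly overlapping) occurrence."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping occurrences are found too.
        start = text.find(pattern, start + 1)
    return positions


text = "CTAGATCCTGCGATCGATTAAGCCTGA"
pattern = "CGATCGATTA"
print(find_all(pattern, text))  # [10]
```

Indices are zero-based, so the pattern starts at the eleventh character of the text.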

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, searches for substrings of the text that are close to the pattern, typically as measured by the edit distance between them.
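A rough sketch of the idea, assuming "edit distance" means Levenshtein distance (insertions, deletions and substitutions each cost 1): the dynamic-programming table below lets a match start anywhere in the text for free, so the last cell of each column gives the distance from the pattern to the best substring ending at that position. The function name `best_approximate_match` is hypothetical.

```python
def best_approximate_match(pattern: str, text: str) -> tuple:
    """Return (distance, end) for the substring of text closest to pattern.

    `end` is the exclusive end index of the first best-scoring substring.
    """
    m = len(pattern)
    # Column for the empty text prefix: pattern[:i] needs i deletions.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, ch in enumerate(text, start=1):
        # Row 0 is always 0: a match may begin at any text position.
        curr = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr.append(min(prev[i - 1] + cost,  # match or substitution
                            prev[i] + 1,         # extra character in text
                            curr[i - 1] + 1))    # character missing from text
        if curr[m] < best[0]:
            best = (curr[m], j)
        prev = curr
    return best


print(best_approximate_match("CGATCGATTA", "CTAGATCCTGCGATCGATTAAGCCTGA"))
```

This runs in time proportional to the product of the two lengths; a distance of 0 means an exact occurrence was found.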

2278 questions
0
votes
1 answer

select the same rows from a column between two spreadsheets or two columns

I have thousands of entries for the same site names taken between different days. However not all row labels are identical. I just want to select all row labels that are shared among both spreadsheets based on the names contained in column A for…
0
votes
2 answers

How do I use boolean logic to make this if-statement more concise in python 3?

I want to extract job titles that consist of either "back-end", "back end" or "backend" from json data of a website. I managed to do so with the following code: if "back-end" in jobtitle.lower(): print(jobtitle) if "back end" in…
0
votes
1 answer

python string matching using recordlinkage - possibility to write rules for specific cases

I am using python's recordlinkage toolkit to string match school name columns from two dataframes, df1 and df2, while blocking on their common column 'division'. My code is as below: import recordlinkage from recordlinkage.standardise import…
ejcho
  • 19
  • 4
0
votes
2 answers

how to find the matching pattern for an input list and then replace the found pattern with the proper pattern conversion using python

Note that the final two numbers of this pattern, for example FBXASC048, are meant to be the ASCII code for numbers (0-9). Input example list: ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human'] Result example: ['1009Car', '5002Toy', '2004Human'] what…
RedBeard
  • 3
  • 1
0
votes
1 answer

Swift: How to search for keywords in a sentence

I am trying to do a keyword search in a sentence with Swift. For example, given keywords = ["black", "bag", "love", "filled"] Sentence1 = "There is a black bag in a house filled with love" Sentence2 = "We are in a shop. There is a black bag on the…
e.iluf
  • 1,389
  • 5
  • 27
  • 69
0
votes
2 answers

How to check array of strings to match with reference string in php?

I have millions of strings in a MySQL table and have to check a reference string against them one by one, returning true if it matches and false otherwise. I tried a simple preg_match as below, but it consumes too much memory and time.
0
votes
2 answers

Find all possible matches (in any order/sequence) of an input string to a list of tuples in Python

I want to match an input string to a list of tuples and find the top N closest matches from that list. The list of tuples has around 2000 items. The problem I am facing is that I have used the fuzzywuzzy process.extract method but it returns a…
Erich
  • 87
  • 6
0
votes
0 answers

Ways to improve Fuzzy Algorithm?

I'm running a fuzzywuzzy algorithm to compare two large sets of strings against one another. The strings are company names from two different data sources and I find this to be unique in that there are a lot of matches that look intuitive but are…
IcedDante
  • 6,145
  • 12
  • 57
  • 100
0
votes
1 answer

Rowwise extract common substrings from two columns in a data frame

I want to match cities with regions in a data frame. The columns are a little bit messy, so I would like to extract the names of the cities / regions that appear in two columns as in the following example. A <- c("Berlin", "Hamburg", …
Rami Al-Fahham
  • 617
  • 1
  • 6
  • 10
0
votes
0 answers

Putting a complex string through a regex match and enforcing number of repeats

I'm struggling a little with my regex. It's matching but I would like to have between 1 and 3 repeating patterns. The variation of strings I am looking to match against: D1000_U400_Mbps_TC4_P, D2_U2_Mbps_TC1_C,…
Sash
  • 1,134
  • 11
  • 23
0
votes
1 answer

Difference string matching

With string matching, you look for exact matches. There are algorithms that account for up to k binary differences, including the omission of a character, the addition of a character, or the replacement of a character (forgot the algorithm name), in O(n) time…
Tobi Akinyemi
  • 804
  • 1
  • 8
  • 24
0
votes
2 answers

string matching with NLP

I have two dataframes, df1 and df2, with ~40,000 rows and ~70,000 rows respectively of data about polling stations in country A. The two dataframes have some common columns like 'polling_station_name', 'province', 'district' etc., however df1 has…
dmswjd
  • 43
  • 7
0
votes
1 answer

Separating out 'While' function

I currently have defined my object (is_matched) that contains a while loop and also the checking of a string. I am trying to pull out the while loop into its own object but am having trouble completing the coding. I also need the while loop to…
0
votes
4 answers

Optimized way to match a word in JavaScript?

I am trying to match a word in JavaScript. If I use the 'if' and 'split' methods, I get stuck on case sensitivity each time. If I use a RegExp, it returns true when only part of a word matches (i.e.: hii, /hi/). What do I do to match case-insensitively and…
Anurag Kumar
  • 129
  • 1
  • 7
0
votes
1 answer

Full text address matching

I'm looking for duplicate records. I have a Property table with the fields street, number, city, state, county and zip. They get geo-coded based on location, but there are some holes in the data. Problem is if they make a simple typing error or omit…
whoblitz
  • 1,065
  • 1
  • 11
  • 17