Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).

There are two types of string matching:

  • Exact
  • Approximate

Exact string matching is the problem of finding occurrence(s) of a pattern string within another string or body of text (NIST). For example, finding CGATCGATTA in CTAGATCCTGCGATCGATTAAGCCTGA.
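
As a minimal sketch of exact matching, the Python snippet below reports every position at which the pattern from the example above occurs in the text. The helper name `find_all` is hypothetical (it is not a standard-library function); it simply calls the built-in `str.find` repeatedly, which is one straightforward approach and not one of the specialised algorithms from the literature.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return the start index of every (possibly overlapping) exact match."""
    if not pattern:
        return []  # assumption: an empty pattern is treated as "no matches"
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return positions


if __name__ == "__main__":
    print(find_all("CGATCGATTA", "CTAGATCCTGCGATCGATTAAGCCTGA"))
```

Indices are zero-based, so a result of `[10]` means the pattern starts at the eleventh character of the text.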

A comprehensive online reference of string matching algorithms is Exact String Matching Algorithms by Christian Charras and Thierry Lecroq.

Approximate string matching, also called fuzzy string matching, searches for matches based on the edit distance between the pattern and the text.
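
To make the idea of edit distance concrete, here is a minimal Python sketch of the Levenshtein distance, one common edit distance that counts single-character insertions, deletions, and substitutions. The function name `levenshtein` is chosen for illustration only, and other edit distances (for example ones that also count transpositions) may suit a given problem better.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

A fuzzy matcher built on this would typically accept a candidate when its distance to the pattern falls below some threshold; the right threshold depends on the data and is not fixed by the definition.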

2278 questions
0
votes
1 answer

How to manipulate multibyte string in python?

I have a log file with multibyte data in it. I want to write a script that does some data manipulation on it. with open(fo, encoding="cp1252") as file: for line in file: print(line) if("WINDOWS" in line): …
0
votes
1 answer

string matching - best distance algorithm to use

I have two dataframes, df1 and df2, that have information about polling stations. The dataframes are of different lengths. Both dataframes have a column called ps_name, which is the name of the polling stations, and a column called district that…
dmswjd
  • 43
  • 7
0
votes
2 answers

Most appropriate function for string matching

I am trying to find the most suitable method (that I could use in VBA) in order to compare thousands of records from column A to data in column B. The example of the data can be seen below: Column1 Column2 Modra Digest…
JAYK
  • 15
  • 4
0
votes
1 answer

Excel Sum Index Match Across Multiple columns

I am having significant issues trying to resolve my problem. Essentially I need an excel formula that replicates a SUMIFS function, as it appears that sumifs doesn't work in my scenario. Effectively I need to SUM across a horizontal axis, based on…
0
votes
3 answers

I wrote a program to get the count of matching adjacent alphabets in a string

Only one of the test cases passed, while all the others failed. The failing test cases used very long strings, but I could not understand where I went wrong. The number of test cases and the string are read in the main function,…
Saranya
  • 69
  • 1
  • 4
0
votes
2 answers

PostgreSQL: Extract one word after a keyword from string

I have a column in Postgres like below and I want to split only the character that comes next to the keyword. In the below case the keyword is "Patient" Patient Mark has tested positive New update for Patient Wilson Discharged - Patient…
Vivo
  • 47
  • 1
  • 7
0
votes
1 answer

Using IF statement for PARTIAL match of input filename in snakemake to run different shell commands

I have 120 fastq files which have been sequenced in different locations, and I need to trim the reads in each file down to a specific length. The final length required for the reads in each file differs, and this depends on the location where it…
Darren
  • 277
  • 4
  • 17
0
votes
1 answer

Partial matches in 2 columns following exact match

I need to do an exact match followed by a partial match and retrieve the strings from two columns. I would ideally like to do this with awk. Input: k141_18046_1 k141_18046_1 k141_18046_1 k141_18046_2 k141_18046_2 k141_18046_1 k141_12033_1 …
Susheel Busi
  • 163
  • 8
0
votes
1 answer

Match exact string in dataframe column

How do I find an exact match of a string in a dataframe column? Currently, I am searching col_8 for the string '1' and it is returning True for row #2, but I want it to be false because it is actually 12. Thanks in advance! df['new'] =…
Shawn Schreier
  • 780
  • 2
  • 10
  • 20
0
votes
0 answers

How to transfer each line of a text file to Excel cell?

I need to transfer some pdf table content to Excel. I used the PyMuPDF module to be able to put the PDF content to a .txt file. In which it is easier to work with and I did it successfully. As you can see in the .txt file I was able to transfer…
0
votes
1 answer

keeping the best string matched by fuzzy matching in R

I have two dataframes in R. one a dataframe of the phrases I want to match along with their synonyms in another column (df.word), and the other a data frame of the strings I want to match along with codes (df.string). The strings are complicated but…
ayeh
  • 48
  • 10
0
votes
4 answers

Python - Printing a specific pattern

I have three lists and one string variable: var = "http:/domain.com" directories = ['dir_A', 'dir_B', 'dir_C'] files = ['file_A', 'file_B', 'file_C'] extensions = ['ext_A', 'ext_B'] I want to print a pattern EXACTLY like…
Muhammad
  • 37
  • 4
0
votes
1 answer

Prevent repeating special character and by specific count using RegEx For Email

How can I use RegEx to test for the following pattern: String length doesn't matter. The special character sign (-) should not be repeated consecutively. The special character sign (-) should not occur more than twice in the entire string and (.)…
Jason
  • 325
  • 2
  • 4
  • 12
0
votes
6 answers

C# String Pattern Matching

I have 2 lists of strings in C# showing the players who have joined and left a particular game. I'm trying to attempt to determine who is still in the game by matching both lists and eliminating the entries of those who have left the game. Please…
paradox
  • 1,248
  • 5
  • 20
  • 32
0
votes
2 answers

R- Column match, create new column with another column of corresponding value

I have two data frame: df1<- data.frame(place=c("KARACA ADANA","ASIL BOLU","GAZIANTEP","YUKARI/MERSIN")) df2<- data.frame(city=c("ADANA","BOLU","ANTEP","MERSIN"), neighbor=c("KARACA","ASIL","GAZI","YUKARI")) I need to match columns df1$place and…
genco
  • 35
  • 6