Questions tagged [python-re]

Python library that provides regular expression matching operations similar to those found in Perl.

re is Python's built-in module for working with regular expressions. It offers a high-level interface for matching patterns against strings.

The main functions to use from this module are:

  • re.compile - takes a pattern and optional flags and returns a Pattern object. This is mostly useful when the same pattern is used in a loop: compile the pattern once before the loop instead of on each iteration.

  • re.match - takes a pattern and a string (and optional flags) and tries to match the pattern at the beginning of the string. Returns a Match object, or None if the pattern does not match.

  • re.search - similar to match, but scans the whole string and returns the first match found anywhere in it (or None).

  • re.findall - similar to search, but returns a list of all non-overlapping matches. The list contains strings rather than Match objects. When the pattern contains exactly one group, the list holds the text of that group for each match; when it contains several groups, the list consists of tuples containing the groups of each match.
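
The four functions above can be compared side by side in a minimal sketch. The sample text, the patterns, and the variable names are assumptions chosen only for illustration:

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "order 66 shipped, order 7 pending"
number = re.compile(r"\d+")  # compile once, reuse below

# match() only succeeds if the pattern matches at the start of the string.
at_start = number.match(text)   # None here, because the text starts with "order"

# search() scans the string and returns the first match anywhere in it.
first = number.search(text)     # Match object covering "66"

# findall() returns plain strings when the pattern has no groups...
all_numbers = number.findall(text)

# ...and tuples when the pattern has more than one group.
pairs = re.findall(r"(order) (\d+)", text)

print(at_start, first.group(), all_numbers, pairs)
```

Because match and search return None when nothing matches, it is usually safer to check the result before calling methods such as group() on it.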

The re module also offers regex-based equivalents of the built-in string methods split and replace: re.split and re.sub, respectively.
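
As a rough sketch of how re.split and re.sub relate to their string-method counterparts (the sample strings and patterns below are made up for illustration):

```python
import re

# Hypothetical sample text with mixed delimiters.
text = "alpha, beta;gamma   delta"

# str.split takes one fixed separator; re.split accepts a pattern.
# Here: a comma or semicolon with optional trailing whitespace, or a run of whitespace.
parts = re.split(r"[,;]\s*|\s+", text)

# str.replace substitutes a fixed substring; re.sub substitutes pattern matches.
collapsed = re.sub(r"\s+", " ", "too   many    spaces")

# The replacement string may refer to captured groups by number.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(parts, collapsed, swapped)
```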

1981 questions
4
votes
4 answers

What is the meaning of "Empty matches are included in the result."?

I am referring to the documentation of the re.findall function: What is the meaning of "Empty matches are included in the result."?
variable
  • 8,262
  • 9
  • 95
  • 215
4
votes
2 answers

Named backreference (?P=name) issue in Python re

I am learning the 're' part of Python, and the named pattern (?P=name) confused me. When I use re.sub() to exchange a digit and a character, the pattern '(?P=name)' doesn't work, but the patterns '\N' and '\g' still make sense. Code…
KE LI
  • 43
  • 3
3
votes
2 answers

How to capture words with letters separated by a consistent symbol in Python regex?

I am trying to write a Python regex pattern that will allow me to capture words in a given text that have letters separated by the same symbol or space. For example, in the text "This is s u p e r and s.u.p.e.r and super and s!u.p!e.r", my goal is…
Billy Bonaros
  • 1,671
  • 11
  • 18
3
votes
2 answers

How to get list of possible replacements in string using regex in python?

I have the following strings: 4563_1_some_data. The general pattern is r'\d{1,5}_[1-4]_some_data'. Note that the numbers before the first underscore may be the same for different some_data. So the question is: how to get all possible variants of replacement…
3
votes
5 answers

separate the abnormal reads of DNA (A,T,C,G) templates

I have millions of DNA clone reads, and a few of them are misreads or errors. I want to separate out only the clean reads. For those without a biology background: a DNA clone consists of only four characters (A,T,C,G) in various permutations/combinations. Any character,…
shivam
  • 596
  • 2
  • 9
3
votes
3 answers

How can I match a pattern, and then everything up to that pattern again? So, match all the words and acronyms in my example below

Context I have the following paragraph: text = """ בביהכנ"ס - בבית הכנסת דו"ח - דין וחשבון הת"ד - התיקוני דיקנא בגו"ר - בגשמיות ורוחניות ה"א - ה' אלוקיכם התמי' - התמיהה בהנ"ל - בהנזכר לעיל ה"א - ה' אלקיך ואח"כ - ואחר כך בהשי״ת - בהשם יתברך ה"ה -…
MendelG
  • 14,885
  • 4
  • 25
  • 52
3
votes
3 answers

How to map and replace a pandas column with a dictionary

I am new to programming and especially to regex. I have encountered a problem mapping dictionary items to a pandas dataframe column. A minimal reproducible example would be as follows (my original dataset is a large one): my csv file looks…
shir13
  • 43
  • 3
3
votes
1 answer

python regex lookbehind to remove _sublabel1 in string like "__label__label1_sublabel1"

I have a dataset prepared for training in fastText and I want to remove the sublabels from it, for example: __label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data. Any help much…
3
votes
2 answers

Regular Expression to split text based on different patterns (within a single expression)

I have some patterns which detect questions and split on them. There are some assumptions I'm using, like: Every pattern starts with a \n Every pattern ends with \s+ And how I define a pattern is like: . Q . Q…
Deshwal
  • 3,436
  • 4
  • 35
  • 94
3
votes
1 answer

Return certain character or word followed or preceded by space - Regex Python

I am trying to select only the sizes of clothes using a regular expression. I am new to Python and am trying to select the rows that contain these sizes, but the pattern gets confused with other words. I am using a regular expression but have failed to obtain the desired result. Code: df =…
Sigmoid
  • 33
  • 5
3
votes
1 answer

Why does the regex "[a-z]" match against the non-ASCII characters "İıſK" when the case-insensitive flag is used?

The following Python code (version 3.11.0) gives an unexpected result: import re import sys s = ''.join(map(chr, range(sys.maxunicode + 1))) matches = ''.join(re.findall('[a-z]', s, re.IGNORECASE)) print(matches) It prints the extra 4 non-ASCII…
Wood
  • 271
  • 1
  • 8
3
votes
1 answer

How to get all the indexes of leading zeroes using regex in python

Using regex in Python (the re library only), I want to create a function that gives me the positions of all leading 0s in a string. For example, if the string was: My house has 01 garden and 003 rooms. I would want the function to return 13, 27 and…
Katharina Böhm
  • 125
  • 1
  • 8
3
votes
2 answers

How to use `re.findall` to extract data from string

I have the following string (file): s = ''' \newcommand{\commandName1}{This is first command} \newcommand{\commandName2}{This is second command with {} brackets inside in multiple lines {} {} } \newcommand{\commandName3}{This is third, last…
Jason
  • 313
  • 2
  • 8
3
votes
2 answers

Split a text by specific word or phrase and keep the word in Python

Is there an elegant way of splitting a text by a word while keeping the word as well? Although there are some workarounds using split with the re package and a pattern like (Python RE library String Split but keep the delimiters/separators as part of the next…
Sam S.
  • 627
  • 1
  • 7
  • 23
3
votes
2 answers

Find the string value from a list in a dataframe column and append the string value as a column

I have a list of names and a dataframe with a column of free form text. I am trying to scan through the column of text and if it contains a string from the list then append the string as an additional column on the data frame. I have only found ways…