Questions tagged [python-re]

Python library that provides regular expression matching operations similar to those found in Perl.

re is the module in Python's standard library for working with regular expressions. It offers a high-level mechanism for matching patterns in strings.

The main functions to use from this module are:

  • re.compile - takes a pattern and optional flags and returns a Pattern object. This is mostly useful when the same pattern is used in a loop: compile the pattern once before the loop instead of at each iteration.

  • re.match - takes a pattern and a string (and optional flags) and tries to match the pattern at the beginning of the string. Returns a Match object, or None if there is no match.

  • re.search - similar to match, but scans the whole string and returns the first location where the pattern matches.

  • re.findall - similar to search, but returns a list of all non-overlapping matches. The list contains strings rather than Match objects. When the pattern contains a single group, the list holds that group's text for each match; with several groups, it holds tuples containing the groups of each match.
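A minimal sketch of the four functions listed above. The pattern and the sample string are made up for illustration and are not taken from any question on this page.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = "h889p and h123p"
pattern = re.compile(r"h([0-9]{3})p")  # compile once, reuse below

# match only succeeds at the start of the string; it returns None otherwise.
m = pattern.match(text)
first = m.group(1) if m else None

# search scans the whole string and stops at the first match.
s = re.search(r"and", text)
position = s.span() if s else None

# findall returns strings; with exactly one group, each item is that group's text.
all_codes = pattern.findall(text)

# With two or more groups, each item is a tuple of the groups.
pairs = re.findall(r"(h)([0-9]{3})", text)
```

Checking the result of match and search for None before calling methods on it avoids an AttributeError when the pattern does not match.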

The re module also offers regex-based equivalents of the built-in str.split and str.replace methods: re.split and re.sub.
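As a rough illustration of re.split and re.sub, here is a short sketch. The input strings are hypothetical and only meant to show the shape of the calls.

```python
import re

# Hypothetical input, chosen only for illustration.
line = "alpha,  beta;gamma"

# re.split: split on a comma or semicolon followed by optional whitespace.
parts = re.split(r"[,;]\s*", line)

# re.sub: replace every run of whitespace with a single space.
collapsed = re.sub(r"\s+", " ", "a   b\t\nc")

# re.sub also accepts a function that receives each Match object
# and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
```

Unlike str.split and str.replace, both functions take a pattern, so one call can handle several delimiters or variable-length runs.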

1981 questions
5
votes
5 answers

Python re find start and end index of group match

Python's re match objects have .start() and .end() methods on the match object. I want to find the start and end index of a group match. How can I do this? Example: >>> import re >>> REGEX = re.compile(r'h(?P[0-9]{3})p') >>> test = "hello h889p…
Neil
5
votes
2 answers

Python regex compile and search strings with numbers and words

I have three strings which have information of the street name and apartment number. "32 Syndicate street", "Street 45 No 100" and "15, Tom and Jerry Street" Here, "32 Syndicate street" -> {"street name": "Syndicate street", "apartment number":…
Srivatsan
5
votes
1 answer

Does an if re match & group capture in the same line?

Is there a way in Python to do an if re match & group capture in the same line? In Perl I would do it like this: my $line = "abcdef"; if ($line =~ m/ab(.*)ef/) { print "$1\n"; } output: badger@pi0: scripts $ ./match.py cd but the closest way…
badger
5
votes
4 answers

Regex to extract usernames/names from a string

I have strings that include names and sometimes a username, followed by a datetime stamp: GN1RLWFH0546-2020-04-10-18-09-52-563945.txt JOHN-DOE-2020-04-10-18-09-52-563946t64.txt DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt I want to…
Zaibi
4
votes
2 answers

My open reading frame (ORF) finding code is not finding the longest ORF in the sequence

I am trying to code a function that finds the longest Open reading frame. However, in this one instance it is not locating the longest ORF and I cannot figure out why. This is the…
4
votes
2 answers

Substring Merge on a column with special characters

I want to merge two dfs over 'Name' columns. However, one of them can be substring or exactly equal to other column. For this, I have used df1 = pd.DataFrame( { 'Name': ['12,5mg/0,333ml(37,5mg/ml)', 'ad', 'aaa'], } ) df2 =…
icgncl
4
votes
4 answers

How to make sure optional parts of a pattern occur at least once?

How to make sure that part of the pattern (keyword in this case) is in the pattern you're looking for, but it can appear in different places. I want to have a match only when it occurs at least once. Regex: …
otbear
4
votes
6 answers

Is there a way to split a string on delimiters including colon(:) except when it involves time?

I am trying to split the string below on a number of delimiters including \n, comma(,), and colon(:) except when the colon is part of a time value. Below is my string: values = 'City:hell\nCountry:rome\nUpdate date: 2022-09-26 00:00:00' I have…
4
votes
2 answers

finding CDRs in NGS data

I have millions of sequences in FASTA format and want to extract CDRs (CDR1, CDR2 and CDR3). I chose only one sequence as an example and tried to extract CDR1 but was not able to extract…
shivam
4
votes
3 answers

Get the regex match and the rest (non-match) from Python's re module

Does the re module of Python 3 offer a built-in way to get the match and the rest (non-match) back? Here is a simple example: >>> import re >>> p = r'\d' >>> s = '1a' >>> re.findall(p, s) ['1'] The result I want is something like ['1', 'a'] or…
buhtz
4
votes
4 answers

Extract all numbers (int and floats) after specific word

Assuming I have the following string: str = """ HELLO 1 Stop #$**& 5.02‼️ 16.1 regex 5 ,#2.3222 """ I want to extract all numbers, whether int or float, after the word "stop", ignoring case. So the…
yair_elmaliah
4
votes
2 answers

How to extract markdown links with a regex?

I currently have the Python code for parsing markdown text in order to extract the content inside the square brackets of a markdown link along with the hyperlink. import re # Extract []() style links link_name = "[^]]+" link_url =…
4
votes
2 answers

Multiple regex substitutions using a dict with regex expressions as keys

I want to make multiple substitutions to a string using multiple regular expressions. I also want to make the substitutions in a single pass to avoid creating multiple instances of the string. Let's say for argument that I want to make the…
vonBulow
4
votes
3 answers

Difference between re.split(" ", string) and re.split("\s+", string)?

I'm currently studying regular expressions and have come across an inquiry. So the title of the question is what I'm trying to find out. I thought since \s represents a white space, re.split(" ", string) and re.split("\s+", string) would give out…
Sihwan Lee
4
votes
2 answers

Splitting string by colon in Python

I have a reminder app and I need to split the time like 3:30 a.m.. I used re module but I failed. What I'm trying to do is split the time by colon in front of the words in list. But list has multiple words. Like a.m., am The program should try the…
user14009914