Looping through Python regex matches

Question

I want to turn a string that looks like this:

ABC12DEF3G56HIJ7

into

12 * ABC
3  * DEF
56 * G
7  * HIJ

I want to construct the correct set of loops using regex matching. The crux of the issue is that the code has to be completely general because I cannot assume how long the [A-Z] fragments will be, nor how long the [0-9] fragments will be.

`''.join("%s * %s\n" % (n, w) for w, n in re.findall(r'(?i)([a-z]+)(\d+)', input_string))` — jfs, Oct 13 '12 at 05:21

score 148 · Accepted Answer · edited Sep 16 '17 at 17:14


Python's re.findall should work for you.


import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

# Each result is a (letters, numbers) tuple, one item per capture group.
for letters, numbers in pattern.findall(s):
    print(numbers, '*', letters)
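The loop above prints `3 * DEF` with a single space, while the question's sample output pads the shorter numbers (`3  * DEF`). If that alignment matters, one possible sketch is to pad each number to the width of the longest one. The names `pairs`, `width`, and `lines` are illustrative only and not part of the original answer.

```python
import re

s = "ABC12DEF3G56HIJ7"
pairs = re.findall(r'([A-Z]+)([0-9]+)', s)

# Width of the longest number, so shorter ones can be padded to match.
# default=0 keeps max() from failing when nothing matches.
width = max((len(numbers) for _, numbers in pairs), default=0)

# Left-align each number in a field of that width.
lines = [f"{numbers:<{width}} * {letters}" for letters, numbers in pairs]
print("\n".join(lines))
```

This needs the full list of matches before printing, so it fits `findall` more naturally than a lazy iterator.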

answered Oct 13 '12 at 05:20 by Ray Toal · edited Sep 16 '17 at 17:14 by Fabien Sa

score 93 · Answer 2 · edited Oct 23 '20 at 15:25


If your dataset is large, re.finditer is the better choice because it reduces memory consumption: findall() returns a list of all results at once, whereas finditer() yields match objects one at a time.

import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

for m in pattern.finditer(s):
    # group(1) holds the letters, group(2) the digits; group(0) is the whole match.
    print(m.group(2), '*', m.group(1))
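To make the difference between the two calls concrete, here is a small sketch on the same sample string. It only inspects what each call returns; the variable names are illustrative.

```python
import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

# findall builds the complete list of (letters, numbers) tuples up front.
all_pairs = pattern.findall(s)

# finditer returns an iterator; match objects are produced only as you ask for them.
match_iter = pattern.finditer(s)
first = next(match_iter)

print(type(all_pairs).__name__, len(all_pairs))
print(first.group(0), first.group(1), first.group(2), first.span())
```

Besides the lazy evaluation, each match object also exposes the full match and its position in the string, which the plain tuples from `findall` do not carry.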

answered Jul 28 '16 at 01:11 by Mithril · edited Oct 23 '20 at 15:25 by snwflk

If I'm not mistaken, the last line of this example should be `print m.group(2), '*', m.group(1)` to fit the OP's desired output. I believe that `m.group(0)` is the 'full' match--i.e., ABC12, DEF3, G56, HIJ7. – DaveL17 Jun 08 '17 at 02:19
@DaveL17 You are right, thanks. I didn't think it through when writing this answer; fixed now. – Mithril Jun 08 '17 at 03:14

This method has the benefit of letting you access named groups by name, rather than by location in the regular expression (which might change if the patterns are moved in the regular expression.) – Carl G Nov 21 '17 at 20:34
Why is that better? – Jann Poppinga Oct 23 '20 at 07:25
@Jann Poppinga It reduces memory usage: `findall` returns all results at once, while `finditer` yields them one by one. – Mithril Oct 23 '20 at 07:40
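Carl G's point about named groups can be sketched as follows. The group names `letters` and `count` are hypothetical, chosen here for illustration; the original answer uses numbered groups only.

```python
import re

s = "ABC12DEF3G56HIJ7"
# (?P<name>...) gives each capture group a name in addition to its number.
pattern = re.compile(r'(?P<letters>[A-Z]+)(?P<count>[0-9]+)')

results = []
for m in pattern.finditer(s):
    # Lookup by name keeps working even if the groups are later reordered.
    results.append(f"{m.group('count')} * {m.group('letters')}")

print("\n".join(results))
```

`findall` would still return plain tuples for this pattern, so access by name is only available through the match objects that `finditer` yields.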

score 1 · Answer 3 · answered Jul 10 '23 at 00:39


A slightly simpler one-liner would be:

print(re.sub(r"([A-Z]+)(\d+)", r'\2 * \1\n', s))
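The one-liner assumes that `re` is imported and that `s` already holds the input. A self-contained sketch, reusing the sample string from the question, might look like this:

```python
import re

s = "ABC12DEF3G56HIJ7"

# In the replacement template, \2 and \1 refer to the captured groups,
# and \n is turned into a real newline by re.sub.
result = re.sub(r"([A-Z]+)(\d+)", r"\2 * \1\n", s)

# result already ends with a newline, so suppress the one print would add.
print(result, end="")
```

Note that `re.sub` leaves any text that does not match the pattern in place, so this approach suits inputs made up entirely of letter-number pairs.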

answered Jul 10 '23 at 00:39 by Dabble

score 0 · Answer 4 · answered Apr 08 '23 at 05:59


Yet another option could be to use re.sub() to create the desired strings from the captured groups:

import re
s = 'ABC12DEF3G56HIJ7'
for x in re.sub(r"([A-Z]+)(\d+)", r'\2 * \1,', s).rstrip(',').split(','):
    print(x)

12 * ABC
3 * DEF
56 * G
7 * HIJ

answered Apr 08 '23 at 05:59 by cottontail
