98

I want to turn a string that looks like this:

ABC12DEF3G56HIJ7

into

12 * ABC
3  * DEF
56 * G
7  * HIJ

I want to construct the correct set of loops using regex matching. The crux of the issue is that the code has to be completely general because I cannot assume how long the [A-Z] fragments will be, nor how long the [0-9] fragments will be.

cottontail
da5id
  • `''.join("%s * %s\n" % (n, w) for w, n in re.findall(r'(?i)([a-z]+)(\d+)', input_string))` – jfs Oct 13 '12 at 05:21

4 Answers

148

Python's re.findall should work for you.


import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

for (letters, numbers) in re.findall(pattern, s):
    print(numbers, '*', letters)
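The desired output in the question pads the counts so that the `*` signs line up (for example `3  * DEF`). A minimal sketch of one way to get that alignment, assuming the same uppercase-letters-then-digits pattern; the helper name `format_counts` is hypothetical and not part of the original answer:

```python
import re

def format_counts(s):
    """Return lines such as '12 * ABC', left-justifying counts to a common width."""
    # Each tuple is (letters, digits), in the order the groups appear in the pattern.
    pairs = re.findall(r'([A-Z]+)([0-9]+)', s)
    # Width of the longest count; default=0 keeps an empty input from raising.
    width = max((len(n) for _, n in pairs), default=0)
    return [f"{n.ljust(width)} * {w}" for w, n in pairs]

for line in format_counts("ABC12DEF3G56HIJ7"):
    print(line)
```

Because the width is computed from the matches themselves, this should keep working however long the letter or digit runs turn out to be.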
Fabien Sa
Ray Toal
93

It is better to use re.finditer if your dataset is large because it reduces memory consumption: findall() returns a list of all results, whereas finditer() yields them one at a time.

import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

for m in re.finditer(pattern, s):
    print(m.group(2), '*', m.group(1))
snwflk
Mithril
  • If I'm not mistaken, the last line of this example should be `print m.group(2), '*', m.group(1)` to fit the OP's desired output. I believe that `m.group(0)` is the 'full' match--i.e., ABC12, DEF3, G56, HIJ7. – DaveL17 Jun 08 '17 at 02:19
  • @DaveL17 You are right, thanks. I didn't think it through while writing this answer; fixed now. – Mithril Jun 08 '17 at 03:14
  • This method has the benefit of letting you access named groups by name, rather than by location in the regular expression (which might change if the patterns are moved in the regular expression). – Carl G Nov 21 '17 at 20:34
  • Why is that better? – Jann Poppinga Oct 23 '20 at 07:25
  • @Jann Poppinga It reduces memory usage. `findall` returns all results at once, while `finditer` yields them one by one. – Mithril Oct 23 '20 at 07:40
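The point about named groups in the comments can be illustrated with a short sketch. The group names `letters` and `count` and the helper `iter_pairs` are assumptions chosen for this example; they do not appear in the original answer:

```python
import re

# Named groups let the loop refer to each part by name instead of by position.
pattern = re.compile(r'(?P<letters>[A-Z]+)(?P<count>[0-9]+)')

def iter_pairs(s):
    """Lazily yield (count, letters) tuples, one match at a time."""
    for m in pattern.finditer(s):
        yield m.group('count'), m.group('letters')

for count, letters in iter_pairs("ABC12DEF3G56HIJ7"):
    print(count, '*', letters)
```

Reordering the groups in the pattern would not require changing the loop body, and since `finditer` returns an iterator, matches are produced only as the loop consumes them.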
1

A slightly simpler one-liner would be:

print(re.sub(r"([A-Z]+)(\d+)", r'\2 * \1\n', s))
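For reference, a self-contained sketch of the same `re.sub` idea, with the import and input included. The wrapper name `expand` is hypothetical, and stripping the trailing newline is an added assumption about the desired output rather than something the one-liner does:

```python
import re

def expand(s):
    """Rewrite each letters+digits run as 'digits * letters' on its own line."""
    # \1 is the letter run and \2 the digit run; every match gets a trailing newline.
    result = re.sub(r"([A-Z]+)(\d+)", r"\2 * \1\n", s)
    # Drop the newline left after the final match so print() does not emit a blank line.
    return result.rstrip("\n")

print(expand("ABC12DEF3G56HIJ7"))
```

One difference from `findall` worth keeping in mind: `re.sub` leaves any text that does not match the pattern in place, so unexpected characters in the input would pass through to the output unchanged.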
Dabble
0

Yet another option could be to use re.sub() to create the desired strings from the captured groups:

import re
s = 'ABC12DEF3G56HIJ7'
for x in re.sub(r"([A-Z]+)(\d+)", r'\2 * \1,', s).rstrip(',').split(','):
    print(x)

12 * ABC
3 * DEF
56 * G
7 * HIJ
cottontail