
I wrote a script to standardize a bunch of values pulled from a data bank, mostly using re.sub. I am having a hard time incorporating zfill to pad the numeric values to 5 digits.

Input

FOO5864BAR654FOOBAR

Desired Output

FOO_05864-BAR-00654_FOOBAR

Using re.sub, so far I have:

FOO_5864-BAR-654_FOOBAR

One option was to run re.sub with a capturing group for each possible number length (shown below). That works, but I don't think it's the correct way to do it.

(\d)         sub   0000\1
(\d\d)       sub   000\1
(\d\d\d)     sub   00\1
(\d\d\d\d)   sub   0\1
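For comparison, re.sub also accepts a function instead of a replacement string: it is called with each match object, and whatever it returns is substituted. That lets zfill be applied to every run of digits in one pass, instead of one pattern per length. A minimal sketch, assuming the intermediate string FOO_5864-BAR-654_FOOBAR from above; the helper name pad_digit_runs is hypothetical:

```python
import re

def pad_digit_runs(text: str, width: int = 5) -> str:
    """Zero-pad every run of digits in text to at least `width` characters."""
    # re.sub calls the lambda once per match; its return value replaces the match.
    # Runs already `width` digits or longer come back unchanged from zfill.
    return re.sub(r'\d+', lambda m: m.group(0).zfill(width), text)

print(pad_digit_runs('FOO_5864-BAR-654_FOOBAR'))  # FOO_05864-BAR-00654_FOOBAR
```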
physlexic
What's the problem with using zfill? I'm not sure why you're considering using regex when zfill exists... – wjandrea Sep 25 '19 at 21:00
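For reference, here is how str.zfill behaves on a few inputs, next to the '{:0>5}' format spec used in the answer below. The sample strings are illustrative only:

```python
# str.zfill(width) left-pads a string with '0' up to `width` characters.
print('654'.zfill(5))          # 00654
print('5864'.zfill(5))         # 05864
# Strings already at or beyond the width are returned unchanged (no truncation).
print('123456'.zfill(5))       # 123456
# zfill keeps a leading sign in front; the '{:0>5}' spec just fills on the left.
print('-42'.zfill(5))          # -0042
print('{:0>5}'.format('-42'))  # 00-42
```

For unsigned digit strings like the ones in this question, the two give the same result.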

1 Answer


Assuming your inputs are all of the form letters-numbers-letters-numbers-letters (one or more of each), you just need to zero-fill the second and fourth groups from the match:

import re

s = 'FOO5864BAR654FOOBAR'
pattern = r'(\D+)(\d+)(\D+)(\d+)(\D+)'
m = re.match(pattern, s)
out = '{}_{:0>5}-{}-{:0>5}_{}'.format(*m.groups())
print(out)  # -> FOO_05864-BAR-00654_FOOBAR
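Since the question mentions a whole batch of values, it may help to wrap this in a function that fails loudly on unexpected input. re.match only anchors at the start of the string, so an input like 'FOO1BAR2BAZ3' would silently match 'FOO1BAR2BAZ' and drop the trailing '3'; re.fullmatch requires the entire string to fit the pattern. A sketch under those assumptions; the name standardize and the sample list are hypothetical:

```python
import re

# Same pattern as the answer: non-digits, digits, non-digits, digits, non-digits.
PATTERN = re.compile(r'(\D+)(\d+)(\D+)(\d+)(\D+)')

def standardize(value: str) -> str:
    # fullmatch (unlike match) rejects strings with extra trailing characters.
    m = PATTERN.fullmatch(value)
    if m is None:
        raise ValueError(f'unexpected format: {value!r}')
    return '{}_{:0>5}-{}-{:0>5}_{}'.format(*m.groups())

values = ['FOO5864BAR654FOOBAR', 'ABC1DEF22GHI']
print([standardize(v) for v in values])
# ['FOO_05864-BAR-00654_FOOBAR', 'ABC_00001-DEF-00022_GHI']
```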

You could also do this with str.zfill(5), but the str.format method is just much cleaner.
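The str.zfill(5) variant could look like this. It is a sketch using the same pattern and input as above, with the padding moved out of the format string:

```python
import re

s = 'FOO5864BAR654FOOBAR'
m = re.match(r'(\D+)(\d+)(\D+)(\d+)(\D+)', s)
a, n1, b, n2, c = m.groups()
# str.zfill(5) left-pads each digit string with zeros to width 5.
out = f'{a}_{n1.zfill(5)}-{b}-{n2.zfill(5)}_{c}'
print(out)  # FOO_05864-BAR-00654_FOOBAR
```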

wjandrea