
I have to parse numbers from file names that have no common logic. I want to use the Python way of "try and thou shall be forgiven", i.e. the try-except structure. Now I have to handle more than two cases. What is the correct way of doing this? I am considering either nested try blocks or a chain of try-except-pass blocks. Which one would be better, or is there something else? A factory method, perhaps (and if so, how)?

This has to be easily expandable in the future, as there will be many more cases.

Below is what I want (it does not work, because a try block can have only one bare except clause):

try:
    # first try
    imNo = int(imBN.split('S0001')[-1].replace('.tif',''))
except:
    # second try
    imNo = int(imBN.split('S0001')[-1].replace('.tiff',''))
except:
    # final try
    imNo = int(imBN.split('_0_')[-1].replace('.tif',''))

Edit:

Wow, thanks for the answers, but no pattern matching please. My bad, I had put "some common logic" at the beginning (now changed to "no common logic", sorry about that). In the cases above the patterns are pretty similar, so let me add something completely different to make the point.

except:
    if imBN.find('first') > 0: imNo = 1
    if imBN.find('second') > 0: imNo = 2
    if imBN.find('third') > 0: imNo = 3
    ...
Juha

2 Answers


You can extract the common structure and make a list of possible parameters:

tries = [
    ('S0001', '.tif'),
    ('S0001', '.tiff'),
    ('_0_', '.tif'),
]

for sep, subst in tries:
    num = imBN.split(sep)[-1].replace(subst, '')
    try:
        imNo = int(num)
        break
    except ValueError:
        pass
else:
    raise ValueError("String doesn't match any of the possible patterns")
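
To keep this easy to extend, the loop above could be wrapped in a function so that the list of pairs is the only thing that grows. This is just a minimal sketch: the names `PATTERNS` and `parse_image_number` and the sample file names are made up for illustration.

```python
# Hypothetical helper; names and sample file names are illustrative only.
PATTERNS = [
    ('S0001', '.tif'),
    ('S0001', '.tiff'),
    ('_0_', '.tif'),
]

def parse_image_number(name, patterns=PATTERNS):
    """Return the integer produced by the first (separator, suffix) pair that works."""
    for sep, suffix in patterns:
        candidate = name.split(sep)[-1].replace(suffix, '')
        try:
            return int(candidate)
        except ValueError:
            continue  # this pair did not yield a number, try the next one
    raise ValueError("String doesn't match any of the possible patterns: %r" % name)

print(parse_image_number('scanS00010042.tiff'))  # 42
print(parse_image_number('img_0_7.tif'))         # 7
```

Adding a new case is then a one-line change to `PATTERNS`.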

Update in reaction to question edit

This technique can easily be adapted to arbitrary expressions by making use of lambdas:

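def custom_func(imBN):
    if 'first' in imBN: return 1
    if 'second' in imBN: return 2
    # signal "no match" so that the loop below reaches its else clause
    raise ValueError("no known keyword in file name")

tries = [
    lambda: int(imBN.split('S0001')[-1].replace('.tif','')),
    lambda: int(imBN.split('S0001')[-1].replace('.tiff','')),
    lambda: int(imBN.split('_0_')[-1].replace('.tif','')),
    lambda: custom_func(imBN),
]

for expr in tries:
    try:
        result = expr()
        break
    except ValueError:
        # only swallow the expected failure, not typos or other bugs
        pass
else:
    raise ValueError("String doesn't match any of the possible patterns")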
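
If you prefer named functions over lambdas, roughly the same idea can be written with parsers that take the file name as an argument instead of closing over `imBN`. Again only a sketch: every function name below is hypothetical, and I'm assuming each parser reports failure by raising `ValueError`.

```python
# All names below are hypothetical; each parser raises ValueError when it does not apply.
def after_s0001_tif(name):
    return int(name.split('S0001')[-1].replace('.tif', ''))

def after_s0001_tiff(name):
    return int(name.split('S0001')[-1].replace('.tiff', ''))

def after_underscore(name):
    return int(name.split('_0_')[-1].replace('.tif', ''))

def by_keyword(name):
    for word, number in (('first', 1), ('second', 2), ('third', 3)):
        if word in name:
            return number
    raise ValueError("no known keyword in %r" % name)

PARSERS = [after_s0001_tif, after_s0001_tiff, after_underscore, by_keyword]

def first_successful(name, parsers=PARSERS):
    """Try each parser in order and return the first result."""
    for parser in parsers:
        try:
            return parser(name)
        except ValueError:
            continue  # expected failure, move on to the next parser
    raise ValueError("no parser could handle %r" % name)

print(first_successful('the_second_image.png'))  # 2
```

Catching only `ValueError` means a typo inside one of the parsers (a `NameError`, say) still surfaces instead of being silently skipped.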
Niklas B.
  • +1, I was just finishing up to post the same, but you beat me to it ;-) – ChristopheD Apr 25 '12 at 14:37
  • Instead of checking `imNo is None` afterwards to raise, you could also add an `else` clause to the loop (which executes if the loop finished without breaking). Then you don't even need the `imNo = None` line to start with. :) – Danica Apr 25 '12 at 14:45
  • @Dougal: Nice, I always forget about these. – Niklas B. Apr 25 '12 at 14:45
  • Thanks for a great answer (to original question). Can you modify the `tries` list to match the edit? – Juha Apr 25 '12 at 15:05
  • @Juha: BTW, you don't need the `.find() > 0` idiom in Python, we have the nicer `'substr' in s`. – Niklas B. Apr 25 '12 at 15:10

In your specific case, a regular expression will get rid of the need to do these try-except blocks. Something like this might catch your cases:

>>> import re
>>> re.match(r'.*(S0001|_0_)([0-9]+)\..*$', 'something_0_1234.tiff').groups()
('_0_', '1234')
>>> re.match(r'.*(S0001|_0_)([0-9]+)\..*$', 'somethingS00011234.tif').groups()
('S0001', '1234')
>>> re.match(r'.*(S0001|_0_)([0-9]+)\..*$', 'somethingS00011234.tiff').groups()
('S0001', '1234')
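
Wrapped into a function that returns the number directly, this might look like the sketch below. The names `NUMBER_RE` and `number_from_name` are made up, and unlike the pattern above I'm assuming the extension is always `.tif` or `.tiff`.

```python
import re

# Hypothetical pattern: a known separator, then digits, then a .tif/.tiff extension.
NUMBER_RE = re.compile(r'(?:S0001|_0_)(\d+)\.tiff?$')

def number_from_name(name):
    """Return the image number as an int, or raise ValueError if none is found."""
    match = NUMBER_RE.search(name)
    if match is None:
        raise ValueError("no image number found in %r" % name)
    return int(match.group(1))

print(number_from_name('something_0_1234.tiff'))   # 1234
print(number_from_name('somethingS00011234.tif'))  # 1234
```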

For your question about the serial try-except blocks, Niklas B.'s answer is obviously a great one.

Edit: What you are doing is called pattern matching, so why not use a pattern matching library? If the regex string is bothering you, there are cleaner ways to do it:

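import re
matchers = []
sep = ['S0001', '_0_']
# raw strings keep the backslashes in \d and \. intact
matchers.append(re.compile(r'^.*(' + '|'.join(sep) + r')(\d+)\..*$'))
# placeholder: append further compiled regexes for other cases here
matchers.append(some_other_regex_for_other_cases)

for matcher in matchers:
    match = matcher.match(yourstring)
    if match:
        print(match.groups()[-1])
        break  # stop at the first matcher that succeeds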

Another, more generic way which is compatible with custom functions:

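import re
matchers = []
simple_sep = ['S0001', '_0_']
simple_re = re.compile(r'^.*(' + '|'.join(simple_sep) + r')(\d+)\..*$')
def simple_matcher(s):
    # returns the digits as a string, or None if the regex does not match
    m = simple_re.match(s)
    if m:
        return m.groups()[-1]

def other_matcher(s):
    # custom, non-regex rule: everything after the first three characters is the number
    if s[3:].isdigit():
        return s[3:]

matchers.append(simple_matcher)
matchers.append(other_matcher)

for matcher in matchers:
    match = matcher(yourstring)
    if match:
        print(int(match))
        break  # stop at the first matcher that succeeds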
mensi
  • +1, I didn't know that you can do it like that also... unfortunately regular expressions are something that I or the people that might edit the code are unfamiliar with. Can you put something else than REs into matchers list? – Juha Apr 25 '12 at 15:25
  • @Juha I added an approach which integrates both – mensi Apr 25 '12 at 15:40
  • Yep, if the problem is only about pattern matching, this is a more specific approach and certainly cleaner than catching exceptions. – Niklas B. Apr 25 '12 at 17:53