Build a dictionary from successful regex matches in Python

Question

I'm pretty new to Python, and I'm trying to parse a file. Only certain lines in the file contain data of interest, and I want to end up with a dictionary of the stuff parsed from valid matching lines in the file.

The code below works, but it's a bit ugly and I'm trying to learn how it should be done, perhaps with a comprehension, or else with a multiline regex. I'm using Python 3.2.

file_data = open('x:\\path\\to\\file','r').readlines()
my_list = []
for line in file_data:
    # discard lines which don't match at all
    if re.search(pattern, line):
        # icky, repeating search!!
        one_tuple = re.search(pattern, line).group(3,2)
        my_list.append(one_tuple)
my_dict = dict(my_list)

Can you suggest a better implementation?

Comprehensions can be pretty, but you can't easily bind a variable to a value inside them, so you'd need the double `re.search`. Just use a loop. — Fred Foo, Jun 19 '12 at 06:28

score 6 · Accepted Answer · answered Jun 19 '12 at 08:25

Thanks for the replies. After putting them together I got

import re

with open('x:\\path\\to\\file', 'r') as f:
    file_data = f.read()
my_list = re.findall(pattern, file_data, re.MULTILINE)
my_dict = {c: b for a, b, c in my_list}

but I don't think I could have gotten there today without the help.

WiringHarness

You might want to make the first group in your regex non-capturing (`?:`) to skip the comprehension step: `my_dict = dict(re.findall...)` – georg Jun 19 '12 at 08:46
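
A minimal sketch of this suggestion, assuming the sample pattern and data from the comprehension answer on this page (the real pattern is not shown in the question). Note that `dict()` then maps the first remaining group to the second, which is the reverse of `{c: b}`, so the groups would have to appear in key, value order in the pattern to get the same mapping.

```python
import re

# Hypothetical sample data and pattern, for illustration only.
data = "1foo bar\n2bing baz\n3spam eggs\nnomatch\n"
pattern = r"(?:.)(\w+)\s(\w+)"  # first group made non-capturing

# findall now returns only the two capturing groups, as 2-tuples
pairs = re.findall(pattern, data)

by_second_group = dict(pairs)               # maps b -> c
by_third_group = {c: b for b, c in pairs}   # maps c -> b, as in the answer

print(by_second_group)
print(by_third_group)
```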

Pretty good improvement. However: reading all the data into a variable, rather than iterating over a file object (and implicitly calling `readline()` method), is not very scalable. `re.findall()` works perfectly well on an iterator rather than a variable. – smci Nov 20 '17 at 05:37
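
Note that `re.findall()` takes a string (or bytes-like object), not a file object or other iterator, so one way to avoid loading the whole file is to search line by line. A sketch, using `io.StringIO` as a stand-in for the opened file, with a hypothetical pattern and sample data:

```python
import io
import re

# Hypothetical pattern and data; io.StringIO stands in for open(path).
pattern = re.compile(r"(.)(\w+)\s(\w+)")
fake_file = io.StringIO("1foo bar\n2bing baz\nnomatch\n3spam eggs\n")

my_dict = {}
for line in fake_file:  # the file is consumed one line at a time
    match = pattern.search(line)
    if match:
        key, value = match.group(3, 2)
        my_dict[key] = value

print(my_dict)
```

Iterating this way keeps memory use roughly constant, at the cost of not being able to match across line boundaries.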

score 5 · Answer 2 · answered Jun 19 '12 at 06:26

Here are some quick'n'dirty optimisations to your code:

import re

my_dict = {}

with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        # search once and reuse the match object
        match = re.search(pattern, line)
        if match:
            key, value = match.group(3, 2)
            my_dict[key] = value

score 3 · Answer 3 · answered Jun 19 '12 at 07:48

In the spirit of EAFP I'd suggest

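my_dict = {}
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        m = re.search(pattern, line)
        try:
            # group 3 is the key and group 2 the value, as in the question
            my_dict[m.group(3)] = m.group(2)
        except AttributeError:
            # re.search returned None: no match on this line
            pass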

Another way is to keep using lists, but redesign the pattern so that it contains only two groups (key, value). Then you could simply do:

matches = [re.findall(pattern, line) for line in data]
my_dict = dict(x[0] for x in matches if x)

score 3 · Answer 4 · answered Jul 16 '15 at 10:24

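# assumes `pattern` was built with re.compile() and uses named groups
matchRes = pattern.match(line)
if matchRes:
    # maps each group name to the text it matched on this line
    my_dict = matchRes.groupdict()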
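
To spell out what this relies on: `groupdict()` returns the named groups of a single match, so the pattern has to use `(?P<name>...)` groups. The group names and sample line below are made up for illustration. This gives one dictionary per matching line rather than one dictionary for the whole file.

```python
import re

# Hypothetical pattern with named groups; the names are illustrative.
pattern = re.compile(r"(?P<name>\w+)=(?P<value>\w+)")

line = "colour=blue"
match_res = pattern.match(line)
if match_res:
    fields = match_res.groupdict()
    print(fields)
```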

Ronn Macc

Please follow your code snippet with some details or an explanation to make it clearer for readers. – VP. Jul 16 '15 at 12:14

score 1 · Answer 5 · answered Jun 19 '12 at 07:05

I'm not sure I'd recommend it, but here's a way you could try to use a comprehension instead (I substituted a string for the file for simplicity):

>>> import re
>>> data = """1foo bar
... 2bing baz
... 3spam eggs
... nomatch
... """
>>> pattern = r"(.)(\w+)\s(\w+)"
>>> {x[0]: x[1] for x in (m.group(3, 2) for m in (re.search(pattern, line) for line in data.splitlines()) if m)}
{'baz': 'bing', 'eggs': 'spam', 'bar': 'foo'}
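
For readers on Python 3.8 or later (not the 3.2 used in the question), an assignment expression can bind the match inside the comprehension, which avoids the nested generators. A sketch using the same sample data:

```python
import re

data = "1foo bar\n2bing baz\n3spam eggs\nnomatch\n"
pattern = r"(.)(\w+)\s(\w+)"

# `m := ...` binds each match, so every line is searched only once
my_dict = {
    m.group(3): m.group(2)
    for line in data.splitlines()
    if (m := re.search(pattern, line))
}
print(my_dict)
```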

Nolen Royalty

Dict comprehension; I like it! – WiringHarness Jun 19 '12 at 08:19
