pattern to dictionary of lists Python

Question

I have a file like this

module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2),.... ,wire100 (connectionwire100)) ; module 2 instance 2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire2),.... ,wire99 (newconnectionwire99))

The wires are repeated across modules. There can be many modules. I want to build a dictionary like this (not every wire in the 2nd module is a duplicate).

{wire1: [(module1, instance1, connection1), (module2, instance2, newconnection1)], wire2: [(module1, instance1, connection2), (module2, instance2, newconnection2)], ... wire99: [(module2, instance2, newconnection99)]}

I am splitting the string on ; then on , and then on ( to get the wire and connectionwire strings. I am not sure how to fill the data structure, though, so that the wire is the key and the module, instance name and connection are the elements.

Goal: get this data structure: { wire: [(module, instance, connectionwire), ...] }

filedata=file.read()
realindex=list(find_pos(filedata,';'))
tempindex=0
for l in realindex:
    module=filedata[tempindex:l]
    modulename=module.split()[0]
    openbracketindex=module.find("(")
    closebracketindex=module.strip("\n").find(");")
    instancename=module[:openbracketindex].split()[1]
    tempindex=l
    tempwires=module[openbracketindex:l+1]
    #got to split wires on commas
    for tempw in tempwires.split(","):
        wires=tempw
        listofwires.append(wires)
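
The splitting approach described above can fill the dictionary directly if each wire name is used as a key into a `collections.defaultdict(list)`. The following is only a minimal sketch in Python 3, assuming every statement has the shape `module instance(.wire (connection), ...)`, that connections contain no commas or nested brackets, and that a leading dot on the wire name should be dropped. The sample string and the variable names are illustrative, not taken from the real file.

```python
from collections import defaultdict

# Shortened sample modelled on the question's file (an assumption, not real data).
text = ("module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2)) ; "
        "module2 instance2(.wire1 (newconnectionwire1), .wire99 (newconnectionwire99))")

wires = defaultdict(list)  # wire name -> list of (module, instance, connection)

for statement in text.split(";"):
    statement = statement.strip()
    if not statement:
        continue
    # Everything before the first "(" is "<module> <instance>".
    header, _, body = statement.partition("(")
    module, instance = header.split()
    # Drop the single ")" that closes the port list.
    body = body.rstrip()
    if body.endswith(")"):
        body = body[:-1]
    for port in body.split(","):
        # ".wire1 (connectionwire1)" -> name ".wire1 ", rest "connectionwire1)"
        name, _, rest = port.partition("(")
        wire = name.strip().lstrip(".")
        connection = rest.rpartition(")")[0].strip()
        wires[wire].append((module, instance, connection))

for wire, connections in wires.items():
    print(wire, ":", connections)
```

Because a `defaultdict(list)` creates the empty list on first access, there is no need to check whether a wire has been seen before appending to it.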

Perhaps [regular expressions](https://docs.python.org/2/library/re.html) could work a bit more cleanly for you? — skrrgwasme, Feb 26 '15 at 03:39
Not very good at regex. Plus there is the matter of wires being arbitrarily long, and an unknown number of brackets ("(" ")") — Illusionist, Feb 26 '15 at 03:44
I think you might try pyparsing for this, https://pyparsing.wikispaces.com/. It is kind of like regex, but more readable. — Cui Heng, Feb 26 '15 at 04:00
In your example string, module 2 and instance 2 have spaces in them - is that intentional? Can there be more than one *instance* in a *module*? — wwii, Feb 26 '15 at 06:07
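
On the concern about an unknown number of brackets: if a connection can itself contain brackets or commas, a plain `split(",")` will cut it in the wrong places. One possible workaround, sketched here in Python 3 under the assumption that brackets are always balanced, is to split only on commas at nesting depth zero. The helper name `split_top_level` and the sample port list are hypothetical.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside (), [] or {}."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == sep and depth == 0:
            # A separator outside all brackets ends the current part.
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts]

# Hypothetical port list whose connections contain commas and nested brackets.
ports = split_top_level(".a (x), .b ({y, z}), .c (f(p, q))")
print(ports)
```

Each returned item still has the form `.wire (connection)`, so it can be taken apart with the first `(` and the last `)` as before.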

wwii · Accepted Answer · 2015-02-26T06:57:40.120

2

Using the re module.

import re
from collections import defaultdict

s = "module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2), .wire100 (connectionwire100)) ; module2 instance2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire2), wire99 (newconnectionwire99))"

d = defaultdict(list)
module_pattern = r'(\w+)\s(\w+)\(([^;]+)'
mod_rex = re.compile(module_pattern)
wire_pattern = r'\.(\w+)\s\(([^\)]+)'
wire_rex = re.compile(wire_pattern)

for match in mod_rex.finditer(s):
    #print '\n'.join(match.groups())
    module, instance, wires = match.groups()
    for match in wire_rex.finditer(wires):
        wire, connection = match.groups()
        #print '\t', wire, connection
        d[wire].append((module, instance, connection))

for k, v in d.items():
    print k, ':', v

Produces

wire1 : [('module1', 'instance1', 'connectionwire1'), ('module2', 'instance2', 'newconnectionwire1')]
wire2 : [('module1', 'instance1', 'connectionwire2'), ('module2', 'instance2', 'newconnectionwire2')]
wire100 : [('module1', 'instance1', 'connectionwire100')]

edited Feb 26 '15 at 06:57

answered Feb 26 '15 at 06:47

wwii

wire99 does not have a *dot* in front of it (in the example string) so it doesn't match. I deleted the extra space for module2 and instance2 but I missed that dot. – wwii Feb 26 '15 at 07:15
If I wanted to print any leftover wires that don't have a match, in the same format, how would I do that, @wwii? – Illusionist Feb 26 '15 at 22:08
Well, if it is possible for a wire to not be preceded by a dot, change the pattern so the dot is not required - ```wire_pattern = r'(\w+)\s\(([^\)]+)'``` – wwii Feb 26 '15 at 23:18
Thanks. I have some unique wires that appear in only one module, and I want to get them in the same format -> wire101: [(module1, instance1, connection101)] – Illusionist Feb 26 '15 at 23:22
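
The two points raised in these comments can be sketched together. This is a Python 3 variant of the answer's code, not the answerer's own: it makes the leading dot optional with `\.?` (an alternative to removing it from the pattern) and then keeps only the wires that occur in a single module. The names `mod_match`, `wire_match` and `unique` are introduced here for illustration.

```python
import re
from collections import defaultdict

s = ("module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2), "
     ".wire100 (connectionwire100)) ; "
     "module2 instance2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire2), "
     "wire99 (newconnectionwire99))")

mod_rex = re.compile(r'(\w+)\s(\w+)\(([^;]+)')
# "\.?" makes the leading dot optional, so "wire99 (...)" is matched as well.
wire_rex = re.compile(r'\.?(\w+)\s\(([^\)]+)')

d = defaultdict(list)
for mod_match in mod_rex.finditer(s):
    module, instance, wires = mod_match.groups()
    for wire_match in wire_rex.finditer(wires):
        wire, connection = wire_match.groups()
        d[wire].append((module, instance, connection))

# Wires that were seen in exactly one module, in the same key -> list format.
unique = {wire: conns for wire, conns in d.items() if len(conns) == 1}

for wire, conns in unique.items():
    print(wire, ":", conns)
```

With the sample string above, only `wire100` and `wire99` end up in `unique`, since `wire1` and `wire2` appear in both modules.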

score 1 · Answer 2 · answered Feb 26 '15 at 07:16

The answer provided by wwii using re is correct. Here is an example of how you can solve the same problem with the pyparsing module, which makes the parser more readable and easier to write.

from pyparsing import Word, alphanums, Optional, ZeroOrMore, Literal, Group, OneOrMore
from collections import defaultdict
s = 'module1 instance1(.wire1 (connectionwire1), .wire2 (connectionwire2), .wire100 (connectionwire100)) ; module2 instance2(.wire1 (newconnectionwire1), .wire2 (newconnectionwire2), .wire99 (newconnectionwire99))'
connection = Word(alphanums)
wire = Word(alphanums)
module = Word(alphanums)
instance = Word(alphanums)
dot = Literal(".").suppress()
comma = Literal(",").suppress()
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
semicolon = Literal(";").suppress()
wire_connection = Group(dot + wire("wire") + lparen + connection("connection") + rparen + Optional(comma))
wire_connections = Group(OneOrMore(wire_connection))
module_instance = Group(module("module") + instance("instance") + lparen + ZeroOrMore(wire_connections("wire_connections")) + rparen + Optional(semicolon))
module_instances = OneOrMore(module_instance)
results = module_instances.parseString(s)
# create a dict
d = defaultdict(list)
for r in results:
    m = r['module']
    i = r['instance']
    for wc in r['wire_connections']:
        w = wc['wire']
        c = wc['connection']
        d[w].append((m, i, c)) 
print d

Output:

defaultdict(<type 'list'>, {'wire1': [('module1', 'instance1', 'connectionwire1'), ('module2', 'instance2', 'newconnectionwire1')], 'wire2': [('module1', 'instance1', 'connectionwire2'), ('module2', 'instance2', 'newconnectionwire2')], 'wire100': [('module1', 'instance1', 'connectionwire100')], 'wire99': [('module2', 'instance2', 'newconnectionwire99')]})