Python - regex parsing file

Question

I have a file like this:

module modulename(wire1, wire2, \wire3[0], \wire3[1], \wire3[2], wire4, wire5,wire6, wire7, \wire8[0], wire9); nonmodule modulename(wire1, wire2, \wire3[0], \wire3[1], \wire3[2],  wire4, wire5,wire6, wire7, \wire8[0], wire9)

I want to change this string to:

    module modulename(wire1, wire2, wire3[0:2],wire4, wire5, wire6, wire7,wire8[0],wire9) ; nonmodule modulename(wire1, wire2, wire3[0], wire3[1], wire3[2], wire4, wire5,wire6, wire7, wire8[0], wire9)

So basically: when a statement starts with the keyword module, remove the backslashes, delete the individual copies of each wire, and change the size to [start:stop]. When the statement after a ";" starts with any other keyword, just remove the backslashes.

If I can parse it with a regex, I can do the rest. I am trying the code below, but it's not matching anything. The code is modified from -pattern to dictionary of lists Python

lines=f.read()

d = defaultdict(list)
module_pattern = r'(\w+)\s(\w+)\(([^;]+)'
mod_rex = re.compile(module_pattern)
wire_pattern = r'(\w+)\s[\\]?(\w+)['

wire_rex = re.compile(wire_pattern)

for match in mod_rex.finditer(lines):
    #print '\n'.join(match.groups())
    module, instance, wires = match.groups()
    for match in wire_rex.finditer(wires):
        wire, connection = match.groups()
        #print '\t', wire, connection
        d[wire].append((module, instance, connection))

for k, v in d.items():
    print k, ':', v

Help is appreciated; I haven't been able to identify the tokens.

possible duplicate of [split float numbers in half with python](http://stackoverflow.com/questions/28444525/split-float-numbers-in-half-with-python) — ivan_pozdeev, Feb 27 '15 at 22:41
The problem in the linked question is different but of the same sort. — ivan_pozdeev, Feb 27 '15 at 22:41
i can't really agree about it being same but thanks for your input ivan. — Illusionist, Feb 27 '15 at 23:10

alpha bravo · Answer 1 · 2015-03-02T03:49:42.720

1

~~It seems like you're removing \ regardless of where it is, so replace them after applying this pattern:~~

(\bmodule\b[^()]+\([^;]*?)(\\wire(\d+)\[(\d+)\][^;]*\\wire\3\[(\d+)\])

and replace with \1wire\3[\4:\5]

Per the comment, try this new pattern, which captures the wire name instead of hard-coding it:

(\bmodule\b[^\\;]+)\\([^[]+)\[(\d+)\][^;]+\2\[(\d+)\]
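
A minimal sketch of how this second pattern could be applied with `re.sub`. The replacement string `\1\2[\3:\4]` is an assumption, since the answer only gives a replacement for the first pattern, and the shortened input line is made up for illustration.

```python
import re

# Second pattern from the answer: group 1 is everything from "module" up to the
# first backslash, group 2 is the wire name, groups 3 and 4 are the first and
# last bit index of that wire.
pattern = re.compile(r'(\bmodule\b[^\\;]+)\\([^[]+)\[(\d+)\][^;]+\2\[(\d+)\]')

# Shortened, made-up input in the same shape as the file in the question
line = r'module m(wire1, \wire3[0], \wire3[1], \wire3[2], wire4); nonmodule m(wire1, \wire3[0], \wire3[1])'

# Assumed replacement (not given in the answer): keep the prefix and the
# captured name, then write the range as [first:last]
result = pattern.sub(r'\1\2[\3:\4]', line)
print(result)
# module m(wire1, wire3[0:2], wire4); nonmodule m(wire1, \wire3[0], \wire3[1])
```

As written, the pattern collapses only one multi-bit wire per module header and needs at least two bits of that wire to match, so a lone `\wire8[0]` keeps its backslash and would still need a separate replace.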


edited Mar 02 '15 at 03:49

answered Mar 02 '15 at 02:26


Thanks for the reply. I'm only removing '\' if it's in module. Trying this now. – Illusionist Mar 02 '15 at 02:34
The wire name isn't always wire; how will I replace with the regex if it's a random word? I need to keep the original wire name. – Illusionist Mar 02 '15 at 02:38

score 1 · Accepted Answer · 2015-03-02T04:14:41.433

This should get you started. I'm not sure of what assumptions you can make about your file format, but it should be straightforward enough to modify this code to suit your needs.

Also, I assumed that the ordering of the ports was strict, so they have been left unmodified. This is also the reason I didn't use dicts.

This code will strip out all backslashes and collapse adjacent bits into vectors. This will also handle vectors that do not start at 0 (for example someport[3:8]). I also chose to make single bit vectors say [0:0] rather than [0].

import re
import sys

mod_re   = re.compile(r'module\s*([^(]+)\(([^)]*)\);(.*)')
wire_re  = re.compile(r'([^[]+)\[([0-9]+)\]')

def process(line):
    # Get rid of all backslashes. You can make this more selective if you want
    clean = line.replace('\\', '')

    m = mod_re.search(clean)
    if m:
        ports = []
        mod_name, wires, remaining = m.groups()

        for wire in wires.split(','):
            wire = wire.replace(' ', '')

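            m = wire_re.search(wire)
            if m:
                # Found a vector
                n = int(m.group(2))

                # Look at the previous port, if any; without this guard an
                # IndexError is raised when the first port is a vector
                prev_wire, prev_range = None, None
                if ports:
                    prev_wire, prev_range = ports[-1]

                # If previous port was a vector with the same name, tack on next value
                if prev_wire == m.group(1) and prev_range is not None:
                    ports[-1][1][1] = n
                else:
                    ports.append((m.group(1), [n, n]))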
            else:
                # Found a scalar
                ports.append((wire, None))

        # Stringify ports
        out = []
        for port in ports:
            name, val = port
            if val is None:
                out.append(name)
            else:
                start, end = val
                out.append('%s[%s:%s]' % (name, start, end))

        # Parenthesized single-argument print works in both Python 2 and Python 3
        print('module %s(%s); %s' % (mod_name, ', '.join(out), remaining))


f = open(sys.argv[1], 'r')
if f:
    for l in f.readlines():
        process(l)
    f.close()

Output:

module modulename(wire1, wire2, wire3[0:2], wire4, wire5, wire6, wire7, wire8[0:0], wire9);  nonmodule modulename(wire1, wire2, wire3[0], wire3[1], wire3[2],  wire4, wire5,wire6, wire7, wire8[0], wire9)
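
The collapsing step can also be sketched as a standalone function that returns the port list instead of printing it, which makes the claim about vectors that do not start at 0 easy to check. This is only a sketch: `collapse_ports` and `BIT_RE` are hypothetical names, not part of the code above, and the function assumes the backslashes have already been stripped and the ports are separated by commas.

```python
import re

# Matches a single vector bit such as "someport[3]" (same idea as wire_re above)
BIT_RE = re.compile(r'([^\[\]]+)\[([0-9]+)\]$')

def collapse_ports(port_list):
    """Collapse runs of same-named bits, e.g. 'p[3], p[4]' -> 'p[3:4]'.

    Hypothetical helper: port_list is the comma-separated text between the
    parentheses of a module header, with backslashes already removed.
    """
    ports = []  # entries are [name, first, last]; first/last are None for scalars
    for raw in port_list.split(','):
        wire = raw.strip()
        m = BIT_RE.match(wire)
        if m is None:
            # Scalar port: keep it unchanged
            ports.append([wire, None, None])
            continue
        name, n = m.group(1), int(m.group(2))
        # Extend the previous entry only if it is a vector with the same name
        if ports and ports[-1][0] == name and ports[-1][1] is not None:
            ports[-1][2] = n
        else:
            ports.append([name, n, n])

    out = []
    for name, first, last in ports:
        if first is None:
            out.append(name)
        else:
            out.append('%s[%d:%d]' % (name, first, last))
    return ', '.join(out)

print(collapse_ports('clk, someport[3], someport[4], someport[5], rst, flag[0]'))
# clk, someport[3:5], rst, flag[0:0]
```

Like the code above, this sketch keeps the ports in their original order and writes a single-bit vector as `[0:0]`.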

PS: I don't know exactly what you are trying to do, but changing the module definition will also require changing the instantiation.

EDIT: Removed the `with` keyword when opening the file, for Python 2.5 support.

Thanks, keep getting an error for invalid syntax at-> if prev_wire == m.group(1) .. any ideas? — Illusionist, Mar 02 '15 at 03:47
What version of Python are you using? I am using 2.7.8 and it works fine for me. — , Mar 02 '15 at 03:52
Hmm, the `with` keyword probably won't work on that old version. Don't know why it's pointing to that `prev_wire` line though. — , Mar 02 '15 at 04:00
with does work, but i changed it to f=open(file,"r") and then left the loop there — Illusionist, Mar 02 '15 at 04:11
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/72015/discussion-between-balthamos-and-illusionist). — , Mar 02 '15 at 04:15
