Nice file parsing required

Question

I need to process a file like this:

keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false

and I need to transform it into a dict:

{'5487500j':'54875', '76x76':'76 x 76','feuille':['papier','ramette'], '7843000j':'78430'}

I haven't managed to find a fast and elegant way to do this.

...and some explanation of the underlying logic, e.g. why `"papier,ramette"` is split into a list, whether the third column affects parsing, etc. - buran, Jan 12 '21 at 10:01
Please read about `str.find()` and `str.rfind()` in the Python documentation. - alec_djinn, Jan 12 '21 at 10:29

score 0 · Answer 1 · answered Jan 12 '21 at 10:46

It is a very simple parsing exercise; you really should try to figure it out yourself.

Here is my solution. It uses find() and rfind() to locate the first and last commas, which split each line into three blocks. The first and middle blocks become a key:value pair in the dict. The middle block may need some extra parsing: if it contains a comma, its surrounding quotes are stripped and it is split into a list, as in the code below.
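To make the index logic concrete, here is a small sketch of what find() and rfind() return on the quoted line from the question, and the three blocks obtained by slicing around those indexes (the variable names are only illustrative):

```python
line = 'feuille,"papier,ramette",false'

i = line.find(',')    # index of the first comma
j = line.rfind(',')   # index of the last comma
first, mid, last = line[:i], line[i + 1:j], line[j + 1:]

print(i, j)   # 7 24
print(first)  # feuille
print(mid)    # "papier,ramette"
print(last)   # false
```

The inner comma at index 15 is never looked at: only the first and last commas matter, which is why this approach assumes that only the middle field can contain commas.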

def parse(line):
    # Index of the first and last commas in the line
    i = line.find(',')
    j = line.rfind(',')
    first = line[:i]     # keyword
    mid = line[i + 1:j]  # synonym (may be quoted and contain commas)
    if ',' in mid:
        # A quoted synonym such as "papier,ramette" becomes a list
        mid = mid.strip('"').split(',')
    return {first: mid}


txt = \
'''
keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false
'''

r = {}
for line in txt.splitlines():
    # Skip blank lines and the header row
    if not line or line.startswith('keyword'):
        continue
    r.update(parse(line))

print(r)

{'5487500j': '54875', '76x76': '76 x 76', 'feuille': ['papier', 'ramette'], '7843000j': '78430'}

score 0 · Accepted Answer · answered Jan 12 '21 at 10:48

Let me first state what I have understood from your requirements.

Your input is a CSV file with optionally quoted fields, which the csv module can parse
The first field of each record will be used as a key in a dictionary
The third field is ignored
The second field will be the value in the dictionary: if it does not contain a comma, it is used as is; otherwise the value is the list obtained by splitting it on commas
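As a quick check that the csv module handles the quoted field, here is a minimal sketch feeding the "feuille" line to csv.reader through io.StringIO (used here only to avoid needing a file on disk). With the default dialect, the quotes are removed and the inner comma stays inside the second field:

```python
import csv
import io

sample = 'feuille,"papier,ramette",false\n'
row = next(csv.reader(io.StringIO(sample)))
print(row)  # ['feuille', 'papier,ramette', 'false']
```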

You should always write down a detailed specification of what you want to do in plain English (or whatever your first language is) before trying to code anything, because the coding part should then only be a translation of the spec.

Here, the Python code (for my spec...) could be:

import csv

with open(inputfile, newline='') as fd:
    rd = csv.reader(fd)  # your file uses the default delimiter and quoting
    next(rd)             # skip the header line
    result = {}
    for row in rd:
        result[row[0]] = row[1].split(',') if ',' in row[1] else row[1]
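A minimal runnable sketch of the loop, assuming the sample data from the question. `load_synonyms` is a hypothetical helper name, and `io.StringIO` stands in for the real file so that nothing has to exist on disk:

```python
import csv
import io


def load_synonyms(fd):
    """Hypothetical helper: build the keyword -> synonym dict from an open file."""
    rd = csv.reader(fd)
    next(rd)  # skip the header line
    result = {}
    for row in rd:
        # Split the synonym into a list only when it contains a comma
        result[row[0]] = row[1].split(',') if ',' in row[1] else row[1]
    return result


data = '''keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false
'''

result = load_synonyms(io.StringIO(data))
print(result)
# {'5487500j': '54875', '76x76': '76 x 76', 'feuille': ['papier', 'ramette'], '7843000j': '78430'}
```

Taking a file object rather than a path keeps the helper easy to test; with a real file you would call it inside `with open(inputfile, newline='') as fd:`.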

In fact, a dict comprehension would be more Pythonic than the loop (inside the same `with` block, it replaces `result = {}` and the `for` loop):

    result = {row[0]: row[1].split(',') if ',' in row[1] else row[1]
              for row in rd}
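To see the comprehension in context, here is a sketch that runs end to end. The sample data and the temporary file are illustrative assumptions; in practice `inputfile` would simply be the path to your own file:

```python
import csv
import os
import tempfile

data = '''keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false
'''

# Write the sample to a temporary file so the snippet can be run as-is
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(data)
    inputfile = tmp.name

try:
    with open(inputfile, newline='') as fd:
        rd = csv.reader(fd)
        next(rd)  # skip the header line
        # The comprehension replaces both `result = {}` and the for loop
        result = {row[0]: row[1].split(',') if ',' in row[1] else row[1]
                  for row in rd}
finally:
    os.remove(inputfile)  # clean up the temporary file

print(result)
```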
