
I need to process a file like this:

keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false

and I need to transform it into a dict:

{'5487500j':'54875', '76x76':'76 x 76','feuille':['papier','ramette'], '7843000j':'78430'}

I can't find a fast and elegant way to do this.

martineau
Chauvinus

2 Answers


It is a very simple parsing exercise; you really should try to figure it out yourself.

Here is my solution. It uses find() and rfind() to locate the first and last commas, which split each line into three blocks. The first and middle blocks are used as a key:value pair in a dict. The middle block may need some extra parsing, as shown in the code below.

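def parse(line):
    # Indexes of the first and last commas delimit the three fields
    i = line.find(',')
    j = line.rfind(',')
    first = line[:i]      # keyword -> dict key
    mid = line[i + 1:j]   # synonym field -> dict value (slicing avoids str.replace hitting repeated text)
    # A quoted field such as "papier,ramette" holds several synonyms
    if ',' in mid:
        mid = mid.strip('"').split(',')
    return {first: mid}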


txt = \
'''
keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false
'''

r = {}
for line in txt.split('\n'):
    if line:
        if line.startswith('keyword'):
            continue
        r.update(parse(line))
        
print(r)

{'5487500j': '54875', '76x76': '76 x 76', 'feuille': ['papier', 'ramette'], '7843000j': '78430'}
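The same first-comma/last-comma split can also be written with str.partition() and str.rpartition(), which return the text before and after the separator and so avoid index arithmetic. This is a minimal sketch, not the answer's own code: parse_line is a hypothetical name, and it assumes every data line has at least two commas.

```python
def parse_line(line):
    # Split off the keyword at the first comma...
    key, _, rest = line.partition(',')
    # ...and the bidirectional flag at the last comma; what remains is the synonym field
    mid, _, _flag = rest.rpartition(',')
    # A quoted field such as "papier,ramette" holds several synonyms
    if ',' in mid:
        mid = mid.strip('"').split(',')
    return key, mid


sample = '''keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false'''

# Skip the header line, then build the dict from (key, value) pairs
result = dict(parse_line(line) for line in sample.splitlines()[1:] if line)
print(result)
```

Like the find()/rfind() version, this does not remove the quotes from a quoted field that contains no comma; the csv module handles quoting in general.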
alec_djinn

Let me first spell out what I have understood of your requirements.

  • Your input is a CSV file with optionally quoted fields, so the csv module can parse it
  • The first field of each record will be used as a key in a dictionary
  • The third field is ignored
  • The second field will be the value in the dictionary. If it does not contain a comma, it is used as is; otherwise the value is the list obtained by splitting it on commas
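To see why the csv module fits this spec, here is a small check of how csv.reader splits the sample lines: the quoted field comes back as a single string with its comma kept and its quotes removed. As an assumption for the sake of a runnable example, the sample is read from an in-memory io.StringIO instead of a real file.

```python
import csv
import io

sample = '''keyword,synonym,bidirectional
5487500j,54875,false
76x76,76 x 76,true
feuille,"papier,ramette",false
7843000j,78430,false
'''

# csv.reader accepts any iterable of lines; StringIO stands in for the file
rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
```

The line for feuille comes out as ['feuille', 'papier,ramette', 'false'], so only the second field needs a further split(',').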

You should always write down a detailed specification of what you want to do in plain English (or whatever your first language is) before trying to code anything, because the coding part should only be a translation of the spec.

For my spec, the Python code could be:

import csv

with open(inputfile, newline='') as fd:  # inputfile: path to your CSV file
    rd = csv.reader(fd)  # your file uses the default delimiter and quoting
    _ = next(rd)         # skip the header line
    result = {}
    for row in rd:
        result[row[0]] = row[1].split(',') if ',' in row[1] else row[1]

In fact a comprehension would be more Pythonic than the loop:

    result = {row[0]: row[1].split(',') if ',' in row[1] else row[1]
              for row in rd}
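Putting the pieces together, here is a self-contained sketch of the comprehension version. load_synonyms is a hypothetical helper name, and io.StringIO stands in for the opened file so the example runs without a file on disk.

```python
import csv
import io


def load_synonyms(fd):
    """Map each keyword to its synonym, or to a list when the field holds several."""
    rd = csv.reader(fd)
    next(rd)  # skip the header line
    return {row[0]: row[1].split(',') if ',' in row[1] else row[1]
            for row in rd}


# In-memory stand-in for open(inputfile, newline='')
sample = io.StringIO(
    'keyword,synonym,bidirectional\n'
    '5487500j,54875,false\n'
    '76x76,76 x 76,true\n'
    'feuille,"papier,ramette",false\n'
    '7843000j,78430,false\n'
)
result = load_synonyms(sample)
print(result)
```

With a real file you would call it as `with open(inputfile, newline='') as fd: result = load_synonyms(fd)`; the csv documentation recommends opening files with newline='' for the csv module.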
Serge Ballesta