I have to create a dictionary by reading a file.

The information is split into lines.

The keys are enclosed in brackets, but not every bracketed name is a key: only the ones after [date].

Between two keys are the values, one per line, but not every line is a value I need.

The final result should be something like

d={key:[units,height,site]}

Some of the keys do not have all the values. If units, height or site is missing, the missing value should be filled with '' or 0.

#info in the file
[System]
serial=130204
[Summary]
file_created=2014-11-20 03:02:09
user=j
....#more info
[date]#after this key starts the keys
...
[AX1]
units=m/s
serial_setting=38400
height=70.4
stats=avg
formula=yes
site=site1
[H4]
serial_setting=38100
height=20.6
stats=std
formula=yes
site=site2
[V3]
units=m
...

The final result for this example would be

param={'AX1':['m/s',70.4,'site1'],'H4':['',20.6,'site2'],'V3':['m',0,'']}

I know how to create a dictionary from a list of lists, but not how to set default values ('' for the string values and 0 for the numeric ones) when some values are missing.

I tried defaultdict from collections, but I am not yet familiar with this class and am probably not using all its possibilities.

Thanks for any help

gis20

2 Answers

This can be done using Python's ConfigParser module (Python 2 shown here; the module is named configparser in Python 3) as follows:

import ConfigParser
from itertools import dropwhile
import io

config = ConfigParser.ConfigParser({'unit' : '', 'units' : '', 'height' : 0, 'site' : ''})
skip = []

# Skip over lines until the first section is found
with open('input.txt', 'r') as f_input:
    for line in dropwhile(lambda x: not x.startswith('['), f_input):
        skip.append(line)

# The collected lines already end with a newline, so join them as-is
config.readfp(io.BytesIO(''.join(skip)))

# Remove sections which are not required
for remove in ['Summary', 'System', 'date']:
    config.remove_section(remove)

param = {}
for section in config.sections():
    param[section] = [
        config.get(section, 'unit') + config.get(section, 'units'), 
        config.getfloat(section, 'height'),
        config.get(section, 'site')]

print param

Giving you the output:

{'AX1': ['m/s', 70.4, 'site1'], 'V3': ['m', 0.0, ''], 'H4': ['', 20.6, 'site2']}

Additionally, lines in the file are not parsed until the first section is found, i.e. a line starting with a [.
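
For readers on Python 3, the same idea might look roughly like the sketch below. It is not part of the original script: the function name parse_params, its skip_sections parameter and the inline SAMPLE text are made up for illustration, and the sample only mimics the file from the question. The defaults are given as strings because Python 3's configparser stores option values as strings.

```python
import configparser
from itertools import dropwhile

# Hypothetical sample that mimics the file in the question
SAMPLE = """\
free text before any section header
[System]
serial=130204
[Summary]
user=j
[date]
[AX1]
units=m/s
serial_setting=38400
height=70.4
site=site1
[H4]
height=20.6
site=site2
[V3]
units=m
"""


def parse_params(lines, skip_sections=('System', 'Summary', 'date')):
    """Return {section: [units, height, site]}, using defaults for missing keys."""
    config = configparser.ConfigParser(
        defaults={'units': '', 'height': '0', 'site': ''})
    # Drop any lines that come before the first section header
    config.read_file(dropwhile(lambda line: not line.startswith('['), lines))
    # Remove the sections that are not wanted in the result
    for name in skip_sections:
        config.remove_section(name)
    return {
        section: [
            config.get(section, 'units'),
            config.getfloat(section, 'height'),
            config.get(section, 'site'),
        ]
        for section in config.sections()
    }


param = parse_params(SAMPLE.splitlines(keepends=True))
print(param)
```

With a real file, passing the open file object instead of the split sample should behave the same way, since read_file accepts any iterable of lines.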

Martin Evans
  • This looks quite good, but what if there are lines at the beginning of the file that are not in the [] header format? It raises the error 'MissingSectionHeaderError: File contains no section headers.' How can I use config.read() to start reading from one specific line? – gis20 Oct 09 '15 at 21:52
  • I have updated the script to now skip over any non-standard header information. It should now work as needed. – Martin Evans Oct 11 '15 at 16:26

Once you have determined the point at which the keys start, this should give you the necessary ideas on how to parse the rest of the file:

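defaults = {'units': '', 'height': 0, 'site': ''}
fields = ['units', 'height', 'site']

param = {}
d = {}
tag = ''
with open('input.txt') as f:
    # Skip everything up to and including the [date] header
    for line in f:
        if line.startswith('[date]'):
            break

    for line in f:
        line = line.strip()
        if line.startswith('['):
            # A new key starts: store the values collected for the previous one
            if tag:
                param[tag] = [d.get(k, defaults[k]) for k in fields]
            tag = line[1:line.index(']')]
            d = {}
        elif '=' in line:
            # Lines without '=' (blank lines, free text) are ignored
            k, v = line.split('=', 1)
            d[k] = v
    # Store the values of the last key in the file
    if tag:
        param[tag] = [d.get(k, defaults[k]) for k in fields]
param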

Output (changing unit to units in 'AX1'):

{'AX1': ['m/s', '70.4', 'site1'],
 'H4': ['', '20.6', 'site2'],
 'V3': ['m', 0, '']}
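
Note that the heights above come back as strings. If numeric heights are wanted, as in the question, one possible approach is a small per-field converter table. This is only a sketch: the names CONVERTERS and build_row are hypothetical, and the raw dictionaries stand in for the d collected per key in the loop.

```python
DEFAULTS = {'units': '', 'height': 0, 'site': ''}
# Hypothetical converter table: fields not listed here stay as strings
CONVERTERS = {'height': float}


def build_row(raw, fields=('units', 'height', 'site')):
    """Build [units, height, site] from the raw key/value strings of one key."""
    row = []
    for key in fields:
        if key in raw:
            # Convert the text read from the file to the wanted type
            row.append(CONVERTERS.get(key, str)(raw[key]))
        else:
            # Missing entry: fall back to the default for this field
            row.append(DEFAULTS[key])
    return row


rows = {
    'AX1': build_row({'units': 'm/s', 'height': '70.4', 'site': 'site1'}),
    'H4': build_row({'height': '20.6', 'site': 'site2'}),
    'V3': build_row({'units': 'm'}),
}
print(rows)
```

In the loop, param[tag] = build_row(d) would then replace the list comprehension.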

Update: I really like the approach of @MartinEvans using configparser [py3] (ConfigParser [py2]), but believe it can be simpler:

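from configparser import ConfigParser
from itertools import dropwhile
# from ConfigParser import ConfigParser  [py2]

fields = ['units', 'height', 'site']
param = {}
with open('input.txt') as f:
    # Skip everything before the [date] header
    rest = dropwhile(lambda line: not line.startswith('[date]'), f)

    config = ConfigParser()
    # Values are stored as strings, so the default height is '0'
    config['DEFAULT'] = {'units': '', 'height': '0', 'site': ''}
    config.read_file(rest)
    # [py2] readfp() needs a file-like object with readline(), e.g. io.BytesIO
    config.remove_section('date')
    for section in config.sections():
        param[section] = [config.get(section, k) for k in fields]
param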

Output:

{'AX1': ['m/s', '70.4', 'site1'],
 'H4': ['', '20.6', 'site2'],
 'V3': ['m', '0', '']}
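
In Python 3 the defaults can also be supplied per call through the fallback keyword of get() and getfloat(), which additionally returns the height as a number. The following is a rough sketch rather than a drop-in replacement; the inline SAMPLE text is invented for illustration and already starts at the first key.

```python
from configparser import ConfigParser

# Hypothetical sample holding only the part of the file after [date]
SAMPLE = """\
[AX1]
units=m/s
serial_setting=38400
height=70.4
site=site1
[H4]
height=20.6
site=site2
[V3]
units=m
"""

config = ConfigParser()
config.read_string(SAMPLE)

param = {
    section: [
        # fallback is returned when the option is missing from the section
        config.get(section, 'units', fallback=''),
        config.getfloat(section, 'height', fallback=0.0),
        config.get(section, 'site', fallback=''),
    ]
    for section in config.sections()
}
print(param)
```

This keeps the defaults next to the lookups that use them, at the cost of repeating them if the same option is read in several places.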
AChampion