Extracting and plotting tabulated plaintext data using Python 3

Question

For the purposes of clarity I have created a gist containing an example of the format of data I am referring to:

https://gist.github.com/TestAcc7777/5823760

After some other irrelevant data, the output file contains many of these tabulated blocks, one straight after the other, with the header section being repeated each time. There are a maximum of eight sets of readings per instance of the header, which together comprise one block. The organisation of quantities in the header reflects the organisation of the values for the subsequent readings.

Given that, I need to extract values for some of the quantities given in the header, write them to a file alongside the name of the quantity they belong to, and have matplotlib plot one set of quantities against another, for example eng_tot versus time(ps).

I am completely lost as this falls well outside of my experience in using Python or matplotlib, so any help is welcome.

@falsetru I'm not sure what's wrong with the link, as I've made sure it's public and it's working on my end. — verdant, Jun 20 '13 at 16:23
@jedwards I'm afraid I haven't, as I am completely unsure of the logic I need to use to even begin extraction. — verdant, Jun 20 '13 at 16:25

score 0 · Answer 1 · answered Jun 20 '13 at 16:41

This isn't a complete answer, but it should get you started.

#!/usr/bin/env python3

import sys
import re
import pprint

# Some function that determines whether a line is a separator
def is_sep(line):
    return (line.count('-') > 80)

# Some function that parses the "block"
def parse_block(lines):
    parsed_lines = []
    for line in lines:
        # Raw string avoids an invalid-escape warning for \S in Python 3
        matches = re.findall(r'\S+', line)
        parsed_lines.append(matches)
    return parsed_lines

if __name__ == "__main__":
    # Read in data
    with open('data.txt', 'r') as fh:
        data = fh.read()

    # Split data into lines, then split the lines into "blocks"
    blocks = []
    block_lines = []
    for line in data.splitlines():
        if is_sep(line):
            blocks.append(block_lines)
            block_lines = []
        else:
            block_lines.append(line)

    # This splitting method creates an empty "block" as the first element of the list; delete it
    blocks = blocks[1:]

    # For all blocks but the header block, pass it to "parse_block"
    parsed_blocks = []
    for block in blocks[1:]:
        parsed_blocks.append(parse_block(block))

    pprint.pprint(parsed_blocks[0])

For example, the last block of your data will be parsed as:

[['1', '2.6814E+03', '3.3117E+02', '1.6616E+03', '-1.1814E+02', '1.8312E+03', '3.5247E+03', '2.5879E+02', '-3.8350E+03', '0.0000E+00'],
 ['0.0', '2.5785E+04', '6.8687E+01', '-6.7273E+04', '-7.6310E+03', '-1.8316E+03', '-5.7811E+04', '0.0000E+00', '0.0000E+00', '0.0000E+00'],
 ['4.9', '1.3300E+04', '0.0000E+00', '0.0000E+00', '0.0000E+00', '9.0000E+01', '9.0000E+01', '9.0000E+01', '0.0000E+00', '1.1911E+02'],
 [],
 ['rolling', '2.6814E+03', '3.3117E+02', '1.6616E+03', '-1.1814E+02', '1.8312E+03', '3.5247E+03', '2.5879E+02', '-3.8350E+03', '0.0000E+00'],
 ['averages', '2.5785E+04', '6.8687E+01', '-6.7273E+04', '-7.6310E+03', '-1.8316E+03', '-5.7811E+04', '0.0000E+00', '0.0000E+00', '0.0000E+00'],
 ['1.3300E+04', '0.0000E+00', '0.0000E+00', '0.0000E+00', '9.0000E+01', '9.0000E+01', '9.0000E+01', '0.0000E+00', '1.1911E+02']]
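Since the header is laid out the same way as the values, one option is to tokenise the header the same way and build a lookup from quantity name to (row, column). This is only a sketch: the header rows below are hypothetical stand-ins (only eng_tot and time(ps) are named in the question), and `header_positions` and `lookup` are names I made up. It also assumes every header label is a single whitespace-free token; a label containing a space would split into two tokens and shift the columns in that row.

```python
def header_positions(header_rows):
    """Map each header token to its (row, column) position."""
    positions = {}
    for r, row in enumerate(header_rows):
        for c, name in enumerate(row):
            positions[name] = (r, c)
    return positions


def lookup(block, positions, name):
    """Return the value sitting at the header position of `name`, as a float."""
    r, c = positions[name]
    return float(block[r][c])


# Hypothetical header rows; replace with the tokenised header from the real file.
header = [
    ['step', 'eng_tot', 'temp_tot'],
    ['time(ps)', 'eng_pv', 'temp_rot'],
]

# First two rows of the parsed block above, cut down to three columns.
block = [
    ['1', '2.6814E+03', '3.3117E+02'],
    ['0.0', '2.5785E+04', '6.8687E+01'],
]

positions = header_positions(header)
print(lookup(block, positions, 'eng_tot'))   # 2681.4
print(lookup(block, positions, 'time(ps)'))  # 0.0
```

If the real header has labels with spaces in them, it is probably simpler to hard-code the few (row, column) pairs you need.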

Thanks! I've got my head round that, and see it generates (for all data) a list of lists of lists. How can I traverse that to consistently pick out the elements I want? From the header values I can see which ones I need, but I don't know how to express that. I've not dealt with lists as advanced as these, and I'm guessing a 'for' loop won't cut it. — verdant, Jun 20 '13 at 20:38
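A plain `for` loop is enough here. Below is a minimal sketch, assuming eng_tot is the second entry of the first row of each reading and time(ps) is the first entry of the second row; please check those positions against the header in your file. The function names and the check for "does this block start with an integer step number" are my own assumptions, used to skip repeated header blocks.

```python
import csv

# Assumed (row, column) positions; adjust after checking the real header.
ENG_TOT_POS = (0, 1)
TIME_POS = (1, 0)


def is_reading(block):
    """True if the block has at least three non-empty rows and starts with an integer."""
    rows = [row for row in block if row]
    return len(rows) >= 3 and rows[0][0].isdigit()


def extract_series(parsed_blocks):
    """Collect time and eng_tot values from every block that looks like a reading."""
    times, energies = [], []
    for block in parsed_blocks:
        if not is_reading(block):
            continue  # repeated header or other non-numeric block
        rows = [row for row in block if row]
        times.append(float(rows[TIME_POS[0]][TIME_POS[1]]))
        energies.append(float(rows[ENG_TOT_POS[0]][ENG_TOT_POS[1]]))
    return times, energies


def write_csv(path, times, energies):
    """Write the two series side by side under their quantity names."""
    with open(path, 'w', newline='') as fh:
        writer = csv.writer(fh)
        writer.writerow(['time(ps)', 'eng_tot'])
        writer.writerows(zip(times, energies))


def plot_series(times, energies):
    """Plot eng_tot against time(ps); needs matplotlib installed."""
    import matplotlib.pyplot as plt  # imported here so extraction works without it
    plt.plot(times, energies)
    plt.xlabel('time(ps)')
    plt.ylabel('eng_tot')
    plt.show()
```

With `parsed_blocks` from the script in the answer, this would be used as `times, energies = extract_series(parsed_blocks)`, then `write_csv('eng_tot.csv', times, energies)` (the file name is just an example) and `plot_series(times, energies)`. The rolling-average rows sit after the blank row in each block, so indexing rows 0 and 1 never touches them.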
