
I am attempting to parse a file with the following format:

1999
I
Willem Jan van Steen         9859  77
Guillaume Kielmann           5264  77
Guillaume Bos                8200   6

The file is much longer, and is separated by academic year (such as 1999) and by study (such as 'I'). The only thing I have to work with is the last number (like 77, 77, 6). This number is a percentage. The final goal is to make a bar chart consisting of 10 bars, where the height of each bar is the number of times a percentage from the file falls into that bar's range. Say one bar covers 70 to 80%: if the above input were the whole file, the count would be 2, and that bar would have height 2.

But my first problem is that I don't know how to parse the input. I was thinking that Python should read the lines and then, from the index at which the percentage number starts (so making a range), 'do something' with the numbers: look at which bar's range they fall into, and then loop to count how many times a percentage falls into each bar.

Hope someone can help me!

Martijn Pieters
  • Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. Check the [FAQ] and [ask] – Inbar Rose Nov 21 '13 at 10:36

1 Answer


Use str.rsplit() to split a string into words, counting from the right. If you pass in None as the separator, it'll split on arbitrary-width whitespace, giving you neat stripped strings. If you also pass in a count, the number of splits is limited, letting you keep the whitespace in the first column.

Short demo of what that means:

>>> 'Willem Jan van Steen         9859  77\n'.rsplit(None, 2)
['Willem Jan van Steen', '9859', '77']

Here the spaces in the name are preserved, but the two numbers at the end are now separate elements in a list. The newline at the end is gone.

If you loop over an open file object, you get separate lines, giving you a method to parse a file line by line:

count = 0  # number of percentages that fall within the range
with open(inputfilename) as inputfh:
    for line in inputfh:
        columns = line.rsplit(None, 2)
        if len(columns) < 3:
            continue  # not a line with name and numbers
        percentage = int(columns[2])
        if 70 <= percentage <= 80:
            # we have a line that falls within your criteria
            count += 1
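
To go from a single range check to the counts for all 10 bars, one option is to turn each percentage into a bucket index with integer division and keep a list of 10 counters. The following is only a minimal sketch, not a definitive implementation. It assumes half-open buckets of width 10 (0-9, 10-19, and so on), with 100 folded into the last bucket, which differs slightly from the inclusive 70 <= percentage <= 80 test. The names bin_index and count_percentages are hypothetical, and the sample lines come from the question in place of a real file.

```python
def bin_index(percentage):
    """Map a percentage (0..100) to a bucket index (0..9).

    Assumption: buckets are 0-9, 10-19, ..., 90-100, so a value of
    exactly 100 is folded into the last bucket.
    """
    return min(percentage // 10, 9)


def count_percentages(lines):
    """Count how many percentages fall into each of the 10 buckets."""
    counts = [0] * 10
    for line in lines:
        columns = line.rsplit(None, 2)
        if len(columns) < 3:
            continue  # year or study header, not a name-and-numbers line
        counts[bin_index(int(columns[2]))] += 1
    return counts


# Sample input taken from the question; an open file object would
# work the same way, since it also yields one line at a time.
sample = [
    "1999\n",
    "I\n",
    "Willem Jan van Steen         9859  77\n",
    "Guillaume Kielmann           5264  77\n",
    "Guillaume Bos                8200   6\n",
]

counts = count_percentages(sample)
print(counts)  # [1, 0, 0, 0, 0, 0, 0, 2, 0, 0]
```

Each entry in counts can then be used directly as the height of the corresponding bar.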
Martijn Pieters