Say you are reading input from a file structured like so:

P3
400 200
255
255 255 255
255 0 0
255 0 0
etc...

But you want to account for any mistakes in the input file's line breaks, such as:

P3 400
200
255
255 255
255
255 0 0
255 0
0
etc...

I want to read in the first token 'P3', then the next two, '400' and '200' (width and height), then the '255'. From there on, I want to read in every token and treat them as groups of 3. I have working code to read this information, but I can't get past the wall of figuring out how to read the input token by token rather than line by line.

My current code reads line by line, which doesn't account for imperfect input.

Blckknght
somebody
  • have you seen [`netpbmfile.py`](http://www.lfd.uci.edu/~gohlke/code/netpbmfile.py.html)? – jfs Apr 08 '14 at 18:35

2 Answers

Here is one way to go about it, using the csv module:

import csv
first_four = []
all_of_the_tokens = []
first_four_processed = False

with open('token') as token_file:
    csv_reader = csv.reader(token_file, delimiter=' ')
    for row in csv_reader:
        # drop the empty strings csv yields for repeated or trailing spaces
        row = [token for token in row if token]
        all_of_the_tokens.extend(row)
        if not first_four_processed:
            first_four.extend(row)
        if len(first_four) >= 4 and not first_four_processed:
            first_four_processed = True
            first_four = first_four[:4]
# no explicit close() is needed: the with block has already closed the file

rest_of_the_tokens = all_of_the_tokens[4:]

# print the remaining tokens in groups of 3
for i in range(0, len(rest_of_the_tokens), 3):
    print(rest_of_the_tokens[i:i+3])
shaktimaan
  • How do you recommend I read these tokens in in groups of 3 following the check for the first 4 numbers? – somebody Apr 08 '14 at 02:54
  • Should that be a new for loop under the 2 if statements and then append those? – somebody Apr 08 '14 at 03:12
  • No, the for loop goes after closing the file. – shaktimaan Apr 08 '14 at 03:13
  • I am missing an elementary concept to what you're saying in the comments below your code I'm sure. I have edited my code but I'm not sure how to integrate the end. I'd like to take the first 4 out and pass them as args later (I can do this) and then take the rest, once vetted that they have been read in in groups of 3, and pass that list as an arg as well. – somebody Apr 08 '14 at 03:35
  • Is there some way for me to send you a specific message? I'm not sure if stack has a messaging system. – somebody Apr 08 '14 at 15:04

If your file consists of groups of three values (after the first P3 item) and you cannot rely upon the line breaks to have them grouped properly, I suggest reading the file as a single string and doing the splitting and grouping yourself. Here's a straightforward way:

with open(filename) as f:
    text = f.read()    # get the file contents as a single string

tokens = text.split()  # splits the big string on any whitespace, returning a list
it = iter(tokens)      # start an iterator over the list
prefix = next(it)      # grab the "P3" token off the front
triples = list(zip(it, it, it))  # make a list of 3-tuples from the rest of the tokens

Using zip on multiple references to the same iterator is the key trick here. If you needed to handle other group sizes with the same code, you could use zip(*[it]*grouplen).
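To make the zip(*[it]*grouplen) form concrete, here is a minimal sketch. The helper name group_tokens and the sample token lists are made up for illustration; they are not part of the original answer.

```python
def group_tokens(tokens, group_len):
    """Group a flat token sequence into tuples of group_len items (hypothetical helper)."""
    it = iter(tokens)
    # [it] * group_len repeats the SAME iterator object, so zip pulls
    # group_len consecutive tokens to build each tuple.
    return list(zip(*[it] * group_len))


pairs = group_tokens(["a", "b", "c", "d"], 2)
# Seven tokens: the trailing "255" does not fill a group of three and is dropped.
triples = group_tokens(["255", "255", "255", "255", "0", "0", "255"], 3)
print(pairs)
print(triples)
```

The trick only works because every argument to zip is the same iterator. Passing three separate iterators over the list would instead repeat each token three times.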

Note that this will discard any left-over values at the end of the file if they don't form a group of three. If you need to handle that situation differently, I suggest using zip_longest from the itertools module, rather than the regular zip function. (See the grouper recipe in the itertools documentation.)
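A sketch of the zip_longest variant, loosely following the grouper recipe from the itertools documentation. The sample token string is invented for illustration, and the choice of None as the fill value is an assumption; any sentinel that cannot appear as a real token would do.

```python
from itertools import zip_longest


def grouper(tokens, n, fillvalue=None):
    """Collect tokens into tuples of length n, padding the last one with fillvalue."""
    it = iter(tokens)
    # Same single-iterator trick as with zip, but the final short group is kept.
    return list(zip_longest(*[it] * n, fillvalue=fillvalue))


tokens = "255 255 255 255 0 0 255 0".split()  # eight tokens: one value is missing
groups = grouper(tokens, 3)

# Any tuple containing the fill value marks a truncated pixel at the end of the file.
incomplete = [g for g in groups if None in g]
print(groups)
print(incomplete)
```

With this approach you can decide for yourself whether an incomplete trailing group should raise an error, be padded, or be ignored.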

Blckknght
  • This is good but it will not correctly read a PPM image. P3-formatted PPMs contain two additional header lines (height, width) you have not accounted for. (between where you set `prefix` and `triples`) – Two-Bit Alchemist Apr 09 '14 at 00:18
  • @Two-BitAlchemist: Ah, you're right. I had read the question too quickly, and thought that only the `P3` line was different from the others. You can quite easily grab the first two or three numeric values too, if you want, in the same way that I'm grabbing `prefix` in my code, just call `next(it)` for each one. – Blckknght Apr 09 '14 at 02:02
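Putting the comments above together, a sketch that pulls the header values off with next(it) before grouping the rest might look like the following. The function name parse_p3 and the sample text are hypothetical. It assumes the header order is magic number, width, height, maximum value, and that the file contains no '#' comment lines, which real PPM files are allowed to have.

```python
def parse_p3(text):
    """Parse plain-PPM text token by token (hypothetical helper; ignores '#' comments)."""
    it = iter(text.split())   # split on any whitespace, so line breaks do not matter
    magic = next(it)          # "P3"
    width = int(next(it))     # header values come off the same iterator,
    height = int(next(it))    # one next() call each
    maxval = int(next(it))
    # Everything left on the iterator is pixel data, grouped three tokens at a time.
    pixels = [tuple(int(v) for v in triple) for triple in zip(it, it, it)]
    return magic, width, height, maxval, pixels


# Badly wrapped input, similar in spirit to the example in the question.
sample = "P3 2\n1\n255\n255 255\n255\n255 0 0\n"
magic, width, height, maxval, pixels = parse_p3(sample)
print(magic, width, height, maxval, pixels)
```

Comparing len(pixels) against width * height afterwards is a cheap way to detect a file with missing or extra values.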