Read text line by line in python

Question

I would like to write a script that reads a text file line by line and populates an array whenever a line contains a certain parameter. The idea is this:

Read line
if Condition 1
  #True
  nested if Condition 2
...
else Condition 1 is not true
  read next line

I can't get it to work, though. I'm using readline() to read the text line by line, but the main problem is that the command meant to make it read the next line never works. Can you help me? Below is an extract of my actual code:

col = 13     # columns
rig = 300    # rows

a = [ [ None for x in range(col) ] for y in range(rig) ] 

counter = 1
file = open('temp.txt', 'r')
files = file.readline()
for line in files:
 if 'bandEUTRA: 32' in line:
  if 'ca-BandwidthClassDL-EUTRA: a' in line:
   a[counter][5] = 'DLa'
   counter = counter + 1
  else:
   next(files)
 else:
  next(files) 

print('\n'.join(map(str, a)))

Could you give a short extract of your text file and an example of your Python code? – scr, Apr 13 '22 at 17:00
Please provide a [MCVE] of your non-working code; we're here to *help*, not do the work for you on a vague spec. – ShadowRanger, Apr 13 '22 at 17:07
@ShadowRanger I added an extract of the code. Sorry, it wasn't my intention to get someone to write the code for me, just to get help. – Tork98, Apr 13 '22 at 17:17
@Tork98: Thanks! Sincerely! You have no idea how many people ask "write my code for me" questions and never add a [MCVE]. I've posted an answer that corrects your code to do roughly what you say you want (with dramatically fewer and simpler lines of code, no leaked file handles, and no assumption the file is small enough to be slurped into memory all at once). – ShadowRanger, Apr 13 '22 at 17:35

ShadowRanger · Accepted Answer · 2022-04-13T17:38:02.360

1

Fixes for the code you asked about inline, and some other associated cleanup, with comments:

import sys   # only needed for the sys.stdout.writelines() alternative at the end

col = 13     # columns
rig = 300    # rows

a = [[None] * col for y in range(rig)]  # Innermost repeated list of immutable
                                        # can use multiplication, just don't do it for
                                        # outer list(s), see: https://stackoverflow.com/q/240178/364696

counter = 1
with open('temp.txt') as file:  # Use with statement to get guaranteed file closure; 'r' is implicit mode and can be omitted
    # Removed: files = file.readline()  # This makes no sense; files would be a single line from the file, but your original code treats it as the lines of the file
    # Replaced: for line in files:  # Since files was a single str, this iterated characters of the file

    for line in file:  # File objects are iterators of their own lines, so you can get the lines one by one this way
        if 'bandEUTRA: 32' in line and 'ca-BandwidthClassDL-EUTRA: a' in line:  # Perform both tests in single if to minimize arrow pattern
            a[counter][5] = 'DLa'
            counter += 1  # May as well not say "counter" twice and use +=

    # All next() code removed; next() advances an iterator and returns the next value,
    # but files was not an iterator, so it was nonsensical, and the new code uses a for loop that advances it for you, so it was unnecessary.

    # If the goal is to intentionally skip the next line under some conditions, you *could*
    # use next(file, None) to advance the iterator so the for loop will skip it, but
    # it's rare that a line *failing* a test means you don't want to look at the next line,
    # so you probably don't want it.

# This works:
print('\n'.join(map(str, a)))
# But it's even simpler to spell it as:
print(*a, sep="\n")
# which lets print do the work of stringifying and inserting the separator, avoiding 
# the need to make a potentially huge string in memory; it *might* still do so (no documented
# guarantees), but if you want to avoid that possibility, you could do:
sys.stdout.writelines(map('{}\n'.format, a))
# which technically doesn't guarantee it, but definitely actually operates lazily, or
for x in a:
    print(x)
# which is 100% guaranteed not to make any huge strings
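
To make the point about `readline()` and `next()` concrete, here is a minimal sketch of why the original loop could not work. It assumes an in-memory `io.StringIO` as a stand-in for `temp.txt` (the sample lines are made up), so it runs without any file on disk:

```python
import io

# Hypothetical stand-in for temp.txt so the sketch runs without a real file.
sample = io.StringIO("first line\nsecond line\nthird line\n")

first = sample.readline()            # a single str, not the lines of the file
chars = [ch for ch in first][:5]     # iterating a str yields single characters

# A str is iterable but is not an iterator, so next() rejects it.
try:
    next(first)
    next_on_str_failed = False
except TypeError:
    next_on_str_failed = True

# The file-like object is its own iterator: next() and for both consume lines.
second = next(sample)
remaining = [line.rstrip("\n") for line in sample]

print(chars)               # ['f', 'i', 'r', 's', 't']
print(next_on_str_failed)  # True
print(repr(second))        # 'second line\n'
print(remaining)           # ['third line']
```

Because no single character can contain `'bandEUTRA: 32'`, the original code always fell into the `else` branch and hit the `TypeError` on `next(files)`.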

edited Apr 13 '22 at 17:38

answered Apr 13 '22 at 17:31

ShadowRanger

Thank you very much for the corrections and advice, which have been very useful to me since, as is probably evident, I am just starting out with Python. I have a question: in some cases the two conditions are not found on the same line of the file, but on two successive lines. For this I had thought of a first "if" for the first condition which, if true, would trigger a second "if" testing the second condition on the next line. Is it possible to do this, or am I taking the wrong approach? – Tork98 Apr 13 '22 at 17:46
@Tork98: So, the main question for that is, given the first line matches a condition #1, can you: 1) Not bother testing the next line for condition #1 (or ignore it if it matches) or must you 2) Treat every possible line as a possible match for condition #1 (just because line `n` meets condition #1 doesn't mean we can skip the check for condition #1 on line `n + 1`), and *also* do additional checks for condition #2 on lines that follow a line matching condition #1? – ShadowRanger Apr 13 '22 at 17:59
In scenario #1, the simplest solution is to do `secondline = next(file, '')` when condition #1 matches, then test `secondline` against condition #2. Because file objects are stateful iterators, `next(file, '')` will pull a line and cause it not to be seen by the top level `for` loop (if you're on the last line of the file, that second argument prevents a `StopIteration` exception by pretending the next line is empty), so the value in `secondline` will never be tested against condition #1. – ShadowRanger Apr 13 '22 at 18:01
In scenario #2, you'd just make the loop `for line, nextline in itertools.pairwise(file):`, which would yield all lines in the file save the last as `line`, pairing each with the line that follows it. Each pair would be handled independently of the rest (so you only need to test condition #1 against `line`, and leave the next iteration of the top-level loop to process `nextline` against condition #1, since loop `n`'s `nextline` will be loop `n + 1`'s `line`). – ShadowRanger Apr 13 '22 at 18:06
Note: `itertools.pairwise` is *very* new, but you can borrow [the recipe from the docs](https://docs.python.org/3/library/itertools.html#itertools.pairwise) if you're on pre-3.10 Python to make your own `pairwise` from `itertools.tee`, `next` and `zip` (which I recommend you wrap in a `pairwise` function, since, while efficient, the recipe looks confusing, and giving it a useful name and factoring it out of the code which uses it makes it much more readable). – ShadowRanger Apr 13 '22 at 18:07
Actually, I'm interested in scenario 1, and with the `next` call everything works correctly. Thank you so much! – Tork98 Apr 13 '22 at 20:40
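
The two scenarios from the comments above can be sketched as follows. This is only an illustration under stated assumptions: the function names `count_skip` and `count_overlap`, the counting of matches (instead of filling `a`), and the sample text are all made up for the example. The `pairwise` helper follows the `tee`/`next`/`zip` recipe mentioned above, so the sketch also runs on Python older than 3.10.

```python
import io
from itertools import tee

COND1 = "bandEUTRA: 32"
COND2 = "ca-BandwidthClassDL-EUTRA: a"

# Made-up sample: two consecutive COND1 lines, then a COND2 line.
SAMPLE = (
    "bandEUTRA: 32\n"
    "bandEUTRA: 32\n"
    "ca-BandwidthClassDL-EUTRA: a\n"
    "other\n"
)


def pairwise(iterable):
    """Fallback for itertools.pairwise (Python 3.10+), built from tee, next and zip."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one item
    return zip(a, b)


def count_skip(lines):
    """Scenario 1: the line after a COND1 match is consumed and never tested for COND1."""
    hits = 0
    it = iter(lines)  # a file object is already its own iterator
    for line in it:
        if COND1 in line:
            second = next(it, "")  # "" avoids StopIteration on the last line
            if COND2 in second:
                hits += 1
    return hits


def count_overlap(lines):
    """Scenario 2: every line is tested for COND1, paired with the line after it."""
    hits = 0
    for line, nextline in pairwise(lines):
        if COND1 in line and COND2 in nextline:
            hits += 1
    return hits


print(count_skip(io.StringIO(SAMPLE)))     # 0
print(count_overlap(io.StringIO(SAMPLE)))  # 1
```

The sample is chosen so the two approaches disagree: in scenario 1 the second `bandEUTRA: 32` line is swallowed by `next()`, so the match that starts on it is missed, while scenario 2 finds it.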

score 0 · Answer 2 · edited Apr 13 '22 at 17:33

You can do:

with open("filename.txt", "r") as f:
    for line in f:
        clean_line = line.rstrip('\r\n')
        process_line(clean_line)

Edit: for your application of populating an array, you could do something like this:

with open("filename.txt", "r") as f:
    contains = ["text" in l for l in f]

This will give you a list with one entry per line of filename.txt: False for each line that doesn't contain "text", and True for each line that does.
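
As a small illustration of that comprehension, the sketch below uses an in-memory `io.StringIO` with made-up lines in place of `filename.txt`. The extra `matching` list is an addition for the example, showing one way to turn the booleans into 1-based line numbers:

```python
import io

# Hypothetical stand-in for filename.txt.
sample = io.StringIO("some text here\nnothing\nmore text\n")

# One boolean per line, read lazily from the file-like object.
contains = ["text" in line for line in sample]

# 1-based numbers of the lines that matched.
matching = [number for number, found in enumerate(contains, start=1) if found]

print(contains)  # [True, False, True]
print(matching)  # [1, 3]
```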

Edit 2: To reflect @ShadowRanger's comments, I've changed my code to iterate over each line in the file without reading the whole thing at once.

edited Apr 13 '22 at 17:33

ShadowRanger

answered Apr 13 '22 at 17:05

BenMcLean981

To be clear, this is a *terrible* solution if your input file size is unbounded. And there's no real reason to do it even *if* the file size is small. You could just delete the first two lines from the `with`, and replace the `for` with `for line in f:` (because files are iterables of their lines, and they're *lazy*, so you don't have to load the whole file into memory just to process it a line at a time). It keeps trailing newlines, but using `for line in map(str.rstrip, f):` trims trailing whitespace, or you could change the call to `process_line(line.rstrip('\r\n'))` to limit to newlines. – ShadowRanger Apr 13 '22 at 17:10
That new code you added isn't even syntactically legal, and uses a non-existent method (`contains` is not a method of `str`). To do what you describe, you'd have to do `contains = [1 if "text" in l else 0 for l in lines]`, and in practice, converting to `int` is pointless (why not just store `True` and `False`), so `contains = ["text" in l for l in lines]` would be shorter/simpler/faster. – ShadowRanger Apr 13 '22 at 17:12
@ShadowRanger, this is good to know. I'll amend my answer to reflect this. – BenMcLean981 Apr 13 '22 at 17:12
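
The difference between the two stripping options mentioned in the comments above can be seen in a short sketch. The input string is made up, and `io.StringIO` stands in for an open file:

```python
import io

# Made-up input: trailing spaces before "\r\n", and a trailing tab before "\n".
raw = "alpha  \r\nbeta\t\n"

# Plain iteration keeps the line endings.
kept = list(io.StringIO(raw))

# str.rstrip with no argument removes all trailing whitespace.
strip_all = list(map(str.rstrip, io.StringIO(raw)))

# rstrip('\r\n') removes only trailing newline characters.
strip_newlines = [line.rstrip("\r\n") for line in io.StringIO(raw)]

print(kept)            # ['alpha  \r\n', 'beta\t\n']
print(strip_all)       # ['alpha', 'beta']
print(strip_newlines)  # ['alpha  ', 'beta\t']
```

Which one is appropriate depends on whether trailing spaces or tabs are meaningful in the data.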
