
I have a big log file (> 1 GB) that needs to be analysed, so I wrote a Python program. I used `islice` so I could read the file in chunks (100,000 lines at a time) so my server won't run out of memory.

I've looked up some `islice` solutions on Stack Overflow and implemented one, but the program doesn't work as expected: `islice` reads the same lines every time (although it stops correctly after reading the whole file...). I can't use `with open(...)` because that came with Python 2.5, and I have Python 2.4...

My code looks like this:

    from itertools import islice

    n = 100000    # lines per chunk
    inf = open(fn, "r")
    while True:
        next_n_lines = list(islice(inf, n))
        if not next_n_lines:
            break
        out_fn = produce_clean_logfile(next_n_lines)
        a, t = main(out_fn)
        send_log(a, t)

Do you know what's wrong?

Thanks in advance. Regards, John.

  • I tried with the `islice` from itertools and it works, so your `islice` implementation is wrong; you should post it if you want help. – lc2817 Apr 05 '13 at 07:55
  • On top of my script I wrote `from itertools import islice` ... or what do you mean? My `islice` code is in my question text... – John Brunner Apr 05 '13 at 07:58
  • You are right, there has to be another problem. I've tested it with a dumb 20-line file and it works, so I have to search in another place! Thanks for your answer! – John Brunner Apr 05 '13 at 08:12

2 Answers

    from itertools import islice

    n = 2    # n lines per chunk
    fn = "myfile"
    inf = open(fn, "r")
    while True:
        next_n_lines = list(islice(inf, n))
        if not next_n_lines:
            break
        print next_n_lines

This works for me on Python 2.5, 2.6, and 2.7: I can see the lines displayed in order, with no repeats.

The error certainly comes from your other functions; could you update your question?
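Since you are on Python 2.4 and can't use `with`, here is a minimal sketch of your full loop with an explicit `try`/`finally` so the file is always closed; `produce_clean_logfile`, `main`, and `send_log` are the helper functions from your question and are assumed to exist:

    from itertools import islice

    n = 100000                  # lines per chunk
    inf = open(fn, "r")         # fn is the log file path from your script
    try:
        while True:
            # islice never loads more than n lines into memory at once
            next_n_lines = list(islice(inf, n))
            if not next_n_lines:    # empty list means EOF
                break
            out_fn = produce_clean_logfile(next_n_lines)
            a, t = main(out_fn)
            send_log(a, t)
    finally:
        inf.close()             # runs even if one of the helpers raises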

lc2817
  • You are right, there has to be another problem. I've tested it with a dumb 20-line file and it works, so I have to search in another place! Thanks for your answer! – John Brunner Apr 05 '13 at 08:13

You can use `groupby` from `itertools` for this:

    from itertools import groupby, count

    with open(filename, 'r') as datafile:
        groups = groupby(datafile, key=lambda k, line=count(): next(line) // 10000)
        for k, group in groups:
            for line in group:
                ...
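The trick is that `count()` is bound once as a default argument, so every call to the key function advances the same counter; integer division by 10000 then maps lines 0-9999 to group 0, lines 10000-19999 to group 1, and so on. A small self-contained demo of the same idea, with chunks of 3 instead of 10000:

    from itertools import groupby, count

    lines = ["line %d\n" % i for i in range(8)]
    # the counter in the key's default argument numbers the lines 0, 1, 2, ...
    groups = groupby(lines, key=lambda k, line=count(): next(line) // 3)
    for k, group in groups:
        print k, list(group)
    # 0 ['line 0\n', 'line 1\n', 'line 2\n']
    # 1 ['line 3\n', 'line 4\n', 'line 5\n']
    # 2 ['line 6\n', 'line 7\n']

Note that the `with` statement and the `next()` builtin only arrived in Python 2.5 and 2.6 respectively, so on your Python 2.4 you would write `line.next()` instead of `next(line)` and close the file in a `try`/`finally` as shown in the other answer.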
John La Rooy