
I have a big log file (> 1 GB) that needs to be analysed, so I wrote a Python program. I used `islice` so I could read the file in chunks (100,000 lines at a time) so my server won't run out of memory.

I've looked up some `islice` solutions on Stack Overflow and implemented one, but the program doesn't work as expected: `islice` reads the same lines every time (although it stops correctly after reading the whole file...). I can't use `with open(...)` because that came with Python 2.5, and I have Python 2.4...

My code looks like this:

    from itertools import islice

    n = 100000    # lines per chunk
    inf = open(fn, "r")
    while True:
        next_n_lines = list(islice(inf, n))
        if not next_n_lines:
            break
        out_fn = produce_clean_logfile(next_n_lines)
        a, t = main(out_fn)
        send_log(a, t)

Do you know what's wrong?

Thanks in advance. Regards, John.

  • I tried with the `islice` from itertools and it works, so your `islice` implementation is wrong; you should post it if you want help. – lc2817 Apr 05 '13 at 07:55
  • On top of my script I wrote `from itertools import islice` ... or what do you mean? My `islice` code is in my question text... – John Brunner Apr 05 '13 at 07:58
  • You are right, there has to be another problem. I've tested it with a dumb 20-line file and it works, so I have to search in another place! Thanks for your answer! – John Brunner Apr 05 '13 at 08:12

2 Answers

    from itertools import islice

    n = 2    # n lines per chunk
    fn = "myfile"
    inf = open(fn, "r")
    while True:
        next_n_lines = list(islice(inf, n))
        if not next_n_lines:
            break
        print next_n_lines

This works for me on Python 2.5, 2.6, and 2.7: I can see the lines displayed in order, with no repeats.

The error certainly comes from your other functions; could you update your question?
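Since you are on Python 2.4 and can't use `with`, here is a minimal sketch of your full loop with an explicit `try`/`finally` so the file is always closed; `produce_clean_logfile`, `main`, and `send_log` are the helper functions from your question and are assumed to exist:

    from itertools import islice

    n = 100000                  # lines per chunk
    inf = open(fn, "r")         # fn is the log file path from your script
    try:
        while True:
            # islice never loads more than n lines into memory at once
            next_n_lines = list(islice(inf, n))
            if not next_n_lines:    # empty list means EOF
                break
            out_fn = produce_clean_logfile(next_n_lines)
            a, t = main(out_fn)
            send_log(a, t)
    finally:
        inf.close()             # runs even if one of the helpers raises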

lc2817
  • You are right, there has to be another problem. I've tested it with a dumb 20-line file and it works, so I have to search in another place! Thanks for your answer! – John Brunner Apr 05 '13 at 08:13

You can use `groupby` from `itertools` for this:

    from itertools import groupby, count

    with open(filename, 'r') as datafile:
        groups = groupby(datafile, key=lambda k, line=count(): next(line) // 10000)
        for k, group in groups:
            for line in group:
                ...
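The trick is that `count()` is bound once as a default argument, so every call to the key function advances the same counter; integer division by 10000 then maps lines 0-9999 to group 0, lines 10000-19999 to group 1, and so on. A small self-contained demo of the same idea, with chunks of 3 instead of 10000:

    from itertools import groupby, count

    lines = ["line %d\n" % i for i in range(8)]
    # the counter in the key's default argument numbers the lines 0, 1, 2, ...
    groups = groupby(lines, key=lambda k, line=count(): next(line) // 3)
    for k, group in groups:
        print k, list(group)
    # 0 ['line 0\n', 'line 1\n', 'line 2\n']
    # 1 ['line 3\n', 'line 4\n', 'line 5\n']
    # 2 ['line 6\n', 'line 7\n']

Note that the `with` statement and the `next()` builtin only arrived in Python 2.5 and 2.6 respectively, so on your Python 2.4 you would write `line.next()` instead of `next(line)` and close the file in a `try`/`finally` as shown in the other answer.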
John La Rooy