I am writing a Python script to trim a log file between two lines.

Here is what I have written:

import optparse
import datetime

parser = optparse.OptionParser()
parser.add_option("-f","--file",dest="log_file",
                          action="store",help="Specify log file to be parsed")
options, args = parser.parse_args()
vLogFile=options.log_file

start_time = raw_input("Please enter start time:\n[Format: HH:MM]=")
end_time = raw_input("Please enter end time:\n[Format: HH:MM]=")
trim_time = datetime.datetime.now().strftime('%d%H%M%S')
output_file = 'trimmed_log_%s.txt' %trim_time
with open(vLogFile) as file:
    for vline in file:
        vDate = vline[0:10]
        break
    start_line = vDate + ' ' + start_time
    end_line = vDate + ' ' +end_time
    print("Start time:%s" %start_line)
    print("End time:%s" %end_line)
    for num, line in enumerate(file, 1):
        if line.startswith(start_line):
            start_line_number = num
            break
    for num, line in enumerate(file, 1):
        if line.startswith(end_line):
            end_line_number = num
            break
    file.close()
print(start_line_number,end_line_number)
with open(vLogFile,"r") as file:
    oFile = open(output_file,'a')
    for num, line in enumerate(file, 1):
        if num >= start_line_number and num <= end_line_number:
            oFile.write(line)
print("%s Created" %output_file)

Below is the output of the script:

$ python trim.py -f ErrorLog.txt
Please enter start time:
[Format: HH:MM]=16:16
Please enter end time:
[Format: HH:MM]=16:29
Start time:2017-11-12 16:16
End time:2017-11-12 16:29
(333, 2084)
trimmed_log_23063222.txt Created

Here the start line (333) is correct, but the end line (2084) is incorrect.

Here is my log file:

Could someone please help me with this?

Thanks, Yogesh

Martin Evans
pgyogesh

2 Answers

This is a good use for itertools.dropwhile() and itertools.takewhile():

import itertools
from datetime import datetime

start_time = datetime.strptime("16:16", "%H:%M")
end_time = datetime.strptime("16:29", "%H:%M")

with open('ErrorLog.txt') as f_log, open('trimmed.txt', 'w') as f_trimmed:
    for row in itertools.dropwhile(lambda x: datetime.strptime(x[11:16], "%H:%M") < start_time, f_log):
        f_trimmed.write(row)
        break

    for row in itertools.takewhile(lambda x: datetime.strptime(x[11:16], "%H:%M") < end_time, f_log):
        f_trimmed.write(row)

This would give you an output trimmed.txt as follows:

2017-11-12 16:16:16.642 Info: Forest Extensions state changed from open to start closing because shutting down
2017-11-12 16:16:16.642 Info: Database Extensions is offline
2017-11-12 16:16:16.643 Info: Forest Extensions state changed from start closing to middle closing because shutting down
.
.
2017-11-12 16:24:07.161 Info: Deleted 1 MB at 345 MB/sec /Users/yogeshjadhav96/Library/Application Support/MarkLogic/Data/Forests/App-Services/000001db
2017-11-12 16:24:07.165 Info: Deleted 10 MB at 2361 MB/sec /Users/yogeshjadhav96/Library/Application Support/MarkLogic/Data/Forests/App-Services/000001dc

This skips the lines that are too early (those before the start time) and then reads lines only until the end time is reached. Each row is read in, and a lambda function extracts the time, converts it into a datetime object and compares it with start_time or end_time accordingly. Because the second comparison is strictly less than end_time, lines stamped 16:29 are not included.
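To make the behaviour easier to check without the real log, here is a minimal sketch of the same idea run on an in-memory list. The sample lines and the helper names line_time and trim are made up for illustration, and the two iterators are chained directly rather than joined with the break used above. It is written for Python 3.

```python
import itertools
from datetime import datetime

# Hypothetical sample lines; only the timestamp layout matches the real log.
SAMPLE_LINES = [
    "2017-11-12 16:10:00.000 Info: too early\n",
    "2017-11-12 16:16:16.642 Info: first kept line\n",
    "2017-11-12 16:24:07.161 Info: last kept line\n",
    "2017-11-12 16:29:00.000 Info: too late\n",
    "2017-11-12 16:30:00.000 Info: never read\n",
]


def line_time(line):
    """Parse the HH:MM portion (characters 11 to 15) of a log line."""
    return datetime.strptime(line[11:16], "%H:%M")


def trim(lines, start, end):
    """Return the lines whose HH:MM falls in the half-open range [start, end)."""
    start_time = datetime.strptime(start, "%H:%M")
    end_time = datetime.strptime(end, "%H:%M")
    # Skip lines until the first one at or after start_time...
    from_start = itertools.dropwhile(lambda l: line_time(l) < start_time, lines)
    # ...then keep lines until the first one at or after end_time.
    return list(itertools.takewhile(lambda l: line_time(l) < end_time, from_start))


trimmed = trim(SAMPLE_LINES, "16:16", "16:29")
```

With this sample, trimmed holds only the 16:16 and 16:24 lines. A line without a timestamp at that position would raise ValueError in strptime, so a real log with multi-line entries would probably need extra handling.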

Martin Evans

The problem is that you're enumerating over the open file without rewinding it: each enumerate(file, 1) call restarts counting at 1 while the file continues from wherever the previous loop stopped, so the line numbers are no longer counted from the top of the file. You could use input_file.seek(0) to rewind, but there are simpler ways. Something like this might work (dry-coded, YMMV) for the main loop, and in addition it only reads through the file once.

with open(vLogFile) as input_file, open(output_file, 'a') as out_file:
    do_write = False
    for i, line in enumerate(input_file, 1):
        if i == 1:  # First line, so figure out the start/end markers
            vDate = line[0:10]
            start_line = vDate + ' ' + start_time
            end_line = vDate + ' ' + end_time
        if not do_write and line.startswith(start_line):  # If we need to start copying...
            do_write = True
            print('Starting to write from line %d' % i)
        if do_write:
            out_file.write(line)
        if line.startswith(end_line):  # Stop writing, we have everything
            print('Stopping write on line %d' % i)
            break
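
The counting issue itself can be reproduced in isolation. The following is a small sketch, assuming Python 3 and using io.StringIO as a stand-in for the opened log; the five sample lines and the helper name find are hypothetical.

```python
import io

# Hypothetical five-line "file"; io.StringIO stands in for an opened log.
log = io.StringIO("line 1\nline 2\nSTART\nline 4\nEND\n")


def find(handle, prefix):
    """Return the enumerate() count of the first line starting with prefix."""
    for num, line in enumerate(handle, 1):
        if line.startswith(prefix):
            return num
    return None


start_number = find(log, "START")  # counted from the top of the file
relative_end = find(log, "END")    # counted from the line after START

log.seek(0)                        # rewind to the beginning
absolute_end = find(log, "END")    # counted from the top again
```

Here the second search reports 2 rather than 5, because its count begins at the line after START. Only after seek(0) does the search return the real line number.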
AKX