
I have a log file and am trying to print the data between two dates.

2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...\n
2020-01-31T20:12:39.1234Z, abcdef,...\n
2020-01-31T20:12:40.1234Z, ghikjl,...\n
2020-01-31T20:12:41.1234Z, mnopqrstuv,...\n
2020-01-31T20:12:42.1234Z, wxyzdsasad,...\n

This is a sample of the log file. I want to print the lines from 2020-01-31T20:12:39 up to and including 2020-01-31T20:12:41.

So far I have managed to find and print the line with the starting date, which I pass in as `start`:

linenum = 0  # the line counter has to be initialised before the loop
with open("logfile.log") as myFile:
    for line in myFile:
        linenum += 1
        if line.find(start) != -1:
            print("Line " + str(linenum) + ": " + line.rstrip('\n'))

But how do I keep printing until the end date?

Liam
NiK K
  • This exact same question already has answers here: [how-to-to-iterate-through-a-specific-time-range-in-a-logfile](https://stackoverflow.com/questions/44272989) – stovfl Jul 14 '20 at 20:13

3 Answers


Not a Python answer, but here it is in bash:

sed -n '/2020-01-31T20:12:38.1234Z/,/2020-01-31T20:12:41.1234Z/p' file.log

Output:

2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...\n
2020-01-31T20:12:39.1234Z, abcdef,...\n
2020-01-31T20:12:40.1234Z, ghikjl,...\n
2020-01-31T20:12:41.1234Z, mnopqrstuv,...\n
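For comparison, sed's `/start/,/end/p` range switches printing on at the first line matching the start pattern and off after the next line matching the end pattern. The same on/off flag can extend the loop from the question. This is only a minimal Python sketch: `range_lines` and `sample` are hypothetical names, and `start`/`end` are matched as plain substrings rather than regular expressions (so the `.` in a timestamp is matched literally, unlike in sed).

```python
def range_lines(lines, start, end):
    """Yield lines from the first one containing `start` through the next one containing `end`.

    Mirrors the on/off behaviour of sed's /start/,/end/p range, but uses plain
    substring matching instead of regular expressions.
    """
    printing = False
    for line in lines:
        if not printing:
            if start in line:
                printing = True  # the range opens on the start line
                yield line.rstrip('\n')
        else:
            yield line.rstrip('\n')
            if end in line:
                printing = False  # the range closes after the end line; a later start match reopens it


# Sample lines from the question, standing in for the real log file
sample = [
    "2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...\n",
    "2020-01-31T20:12:39.1234Z, abcdef,...\n",
    "2020-01-31T20:12:40.1234Z, ghikjl,...\n",
    "2020-01-31T20:12:41.1234Z, mnopqrstuv,...\n",
    "2020-01-31T20:12:42.1234Z, wxyzdsasad,...\n",
]

selected = list(range_lines(sample, "2020-01-31T20:12:39", "2020-01-31T20:12:41"))
for line in selected:
    print(line)
```

With the real file, pass the open file object in place of `sample`. As with the sed command, nothing is printed if no line contains the exact start string, and everything from the start line to the end of the file is printed if the end string never appears.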
bigbounty

If you want it in Python:

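import time
from datetime import datetime as dt

def to_timestamp(date, forma='%Y-%m-%dT%H:%M:%S'):
    # Convert a string like "2020-01-31T20:12:38" to a Unix timestamp
    return time.mktime(dt.strptime(date, forma).timetuple())

startdate = "2020-01-31T20:12:39"  # pass the dates as strings, in the same format as the file
enddate = "2020-01-31T20:12:41"
start = to_timestamp(startdate)
end = to_timestamp(enddate)
logs = {}
with open("logfile.log") as f:
    for line in f:
        date = line.split(', ')[0].split('.')[0]  # timestamp without the fractional seconds
        logline = line.split(', ')[1].strip('\n')
        if start <= to_timestamp(date) <= end:  # check this line's own timestamp against both bounds
            logs[date] = logline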

abdo
  • The above code gives me an error: TypeError: strptime() argument 1 must be str, not datetime.datetime. Also, I'm trying to print the log lines while reading them. – NiK K Jul 13 '20 at 09:04
  • Just pass the dates as strings in the same format as the file, like this: "2020-01-31T20:12:38". To print while reading, just do `print(f'{date}\t{logline}')`. – abdo Jul 13 '20 at 16:50
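Since the asker wants to print matching lines while reading, here is a minimal sketch of that variant. It compares `datetime` objects directly instead of going through `time.mktime` (which interprets the time as local time), takes the bounds as strings in the file's format as suggested in the comment above, and, assuming the log is in chronological order, stops reading at the first line past the end date. The names `print_between`, `FMT` and `sample` are hypothetical.

```python
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%S'  # timestamp format once the fractional seconds are removed


def print_between(lines, startdate, enddate):
    """Print each line whose timestamp lies in [startdate, enddate] as it is read.

    Assumes the lines are in chronological order, so reading stops at the first
    line past `enddate`. Returns the number of lines printed.
    """
    start = datetime.strptime(startdate, FMT)
    end = datetime.strptime(enddate, FMT)
    count = 0
    for line in lines:
        stamp = line.split(', ')[0].split('.')[0]  # "2020-01-31T20:12:39.1234Z" -> "2020-01-31T20:12:39"
        when = datetime.strptime(stamp, FMT)
        if when > end:
            break  # relies on chronological order: no later line can match
        if when >= start:
            print(line.rstrip('\n'))
            count += 1
    return count


# Sample lines from the question, standing in for an open log file
sample = [
    "2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...\n",
    "2020-01-31T20:12:39.1234Z, abcdef,...\n",
    "2020-01-31T20:12:40.1234Z, ghikjl,...\n",
    "2020-01-31T20:12:41.1234Z, mnopqrstuv,...\n",
    "2020-01-31T20:12:42.1234Z, wxyzdsasad,...\n",
]

n = print_between(sample, "2020-01-31T20:12:39", "2020-01-31T20:12:41")
```

The early `break` only pays off on large files; if the log might not be sorted, drop it and let the loop read to the end.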

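Since the timestamps in your file are ISO 8601 strings with fixed-width, zero-padded fields, comparing them as strings orders them the same way as comparing the times themselves. So you can do a simple string comparison against the times you're interested in, without converting each string to a datetime object.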
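A quick sanity check of that claim on the sample timestamps, as a sketch: sorting the raw strings gives the same order as sorting the parsed `datetime` values. This only holds while every timestamp has exactly the same format; the last line shows how mixing formats (here a hypothetical timestamp without fractional seconds) breaks it.

```python
from datetime import datetime

stamps = [
    '2020-01-31T20:12:41.1234Z',
    '2020-01-31T20:12:38.1234Z',
    '2020-01-31T20:12:40.1234Z',
    '2020-01-31T20:12:39.1234Z',
]


def parse(s):
    # Parse the fixed-width ISO 8601 form used in the log, e.g. "2020-01-31T20:12:39.1234Z"
    return datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%fZ')


by_string = sorted(stamps)
by_time = sorted(stamps, key=parse)
print(by_string == by_time)  # True

# Caveat: '.' sorts before 'Z', so a later time WITH fractional seconds
# compares as smaller than the same second written WITHOUT them.
print('2020-01-31T20:12:41.1234Z' < '2020-01-31T20:12:41Z')  # True
```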

Use the csv module to read the file (with its default comma delimiter) and then the filter() function to keep only the rows between the two dates.

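import csv

start, end = '2020-01-31T20:12:39', '2020-01-31T20:12:41'
with open("logfile.log", newline="") as f:  # the with block closes the file when done
    reader = csv.reader(f)
    # Keep non-empty rows whose timestamp, with the fractional seconds stripped, lies in [start, end]
    filtered = filter(lambda p: p and start <= p[0].split('.')[0] <= end, reader)
    for l in filtered:
        print(','.join(l))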

Edit: I used split() to remove the fractional part of the time string before the string comparison, since you're interested in times to one-second precision, e.g. 2020-01-31T20:12:39; this also keeps the line at the end date from being excluded.
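To see why the fractional part matters at the upper bound: Python compares strings character by character, and a string sorts after any prefix of itself. The full timestamp of the last wanted line therefore compares greater than the end bound, and that line fails the `<=` test, which is one way the last line can go missing. A quick check, using the sample values:

```python
full = '2020-01-31T20:12:41.1234Z'
end = '2020-01-31T20:12:41'

# Without stripping: the longer string sorts after its own prefix, so the end line fails "<= end"
print(full <= end)                # False
# With the fractional part removed, the comparison behaves as intended
print(full.split('.')[0] <= end)  # True
```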

jignatius
  • Is there a way to format the output like a normal line, without the square brackets? – NiK K Jul 13 '20 at 10:27
  • @NiKK Yes. By doing: `print(','.join(l))`. Updated. – jignatius Jul 13 '20 at 10:34
  • Using rpartition, there is no output. However, if I use only p[0].partition, I get the correct output. The main issue is that the last line is not printed: the result only goes up to the second-to-last line, and the line with the log date '2020-01-31T20:12:41' is missing. – NiK K Jul 14 '20 at 18:52
  • @NiKK I've changed `rpartition` to `split`. That should get the date string up to the **first** dot, which should work on the sample data you've given in your question. Is your real data different? Make sure you use >= and <=. – jignatius Jul 14 '20 at 19:44