
To explain in detail, I have a text file in which I am logging data from a varying number of process instances (i.e. there could be between 4 and 16 process instances generating the logs).
All the instances write into one file in the following format:

2018-09-07 11:34:47,251 - AppLog - INFO - 
    ******************************************
    Log Report - Consume Cycle jhTyjs-ConsumeCycle
    ******************************************
    Uptime: 144708.62724542618s
    Jobs Run: 16866
    Jobs Current: 1
    Q Avg Read Time: 0
    Q Msgs Read: 0
    Worker Load: ['1.00', '1.00', '1.00']
    ******************************************

2018-09-07 11:37:47,439 - AppLog - INFO - 
    ******************************************
    Log Report - Consume Cycle aftTys-ConsumeCycle
    ******************************************
    Uptime: 144888.81490063667s
    Jobs Run: 16866
    Jobs Current: 1
    Q Avg Read Time: 0
    Q Msgs Read: 0
    Worker Load: ['1.00', '1.00', '1.00']
    ******************************************

  This is an error line which could be generated by any of the instances and can be anything, like qfuigeececevwovw or wefebew efeofweffhw v wihv or any python \n exception or aiosfgd ceqic eceewfi

2018-09-07 11:40:47,615 - AppLog - INFO - 
    ******************************************
    Log Report - Consume Cycle hdyGid-ConsumeCycle
    ******************************************
    Uptime: 145068.99103808403s
    Jobs Run: 16866
    Jobs Current: 1
    Q Avg Read Time: 0
    Q Msgs Read: 0
    Worker Load: ['1.00', '1.00', '1.00']
    ******************************************

(In `Log Report - Consume Cycle [placeholder]-ConsumeCycle` of every log, the `[placeholder]` is random.)
So, my file consists of a large number of logs in the above format, one after another. Every instance generates a log every 3 minutes (i.e. all the instances generate exactly one log each in any 3-minute window).
If any of the instances hits an error, it logs that in the same file as well, so the structure of the data is not at all consistent.

Now, I have to get the last logged data, i.e. the last 3 minutes of data, from all of the instances and perform some tasks on it.
Is there any way to get the last 3 minutes of data written into the log file (be it errors or well-formed logs in the above format)?

[EDIT] Added an error line in between the logs

Amit Yadav
  • So, technically, you want to get the last log in the file for each of the instances (assuming that what follows `Consume Cycle` is the unique instance ID)? – zwer Sep 19 '18 at 11:37
  • You are correct. The last 3 minutes of data will give me the last logs from all of the instances. There could be a case when only one proper log is generated and all the other instances have generated some random python error. – Amit Yadav Sep 19 '18 at 11:44
  • When you say the last three minutes, do you want a list of the entries, where each entry is the multi-line text? – Hogstrom Sep 19 '18 at 11:48
  • Is the log file expected to be huge (i.e. can you load it into the working memory whole without significant performance hits)? – zwer Sep 19 '18 at 11:48
  • @Hogstrom yes, that is exactly what I want – Amit Yadav Sep 19 '18 at 11:49
  • @zwer not really. The log file will be deleted after an interval of time and a new one is created, so the file size is limited and it can be loaded into working memory. – Amit Yadav Sep 19 '18 at 11:51

2 Answers


You can do a split at

******************************************\n\n

with

record_list = file_Obj.read().split("******************************************\n\n")

This will give you each independent record as an element of a list. (You might need to escape the backslashes, depending on how the separator appears in your file.) You can take the last element of the list by indexing it with `-1`:

print(record_list[-1])
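
A minimal, self-contained sketch of this approach (the file path `app.log` is just a placeholder for wherever your log lives):

# sketch of the split-based approach; "app.log" is a placeholder path
separator = "******************************************\n\n"

with open("app.log") as file_obj:
    record_list = file_obj.read().split(separator)

# drop whitespace-only leftovers (e.g. the piece after a trailing separator)
record_list = [record for record in record_list if record.strip()]

print(record_list[-1])  # the last record written to the file
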
Kalpit
  • But how will that give me only the last logs from all of the instances? Splitting will give me all of the logs, and I have a requirement of only the last logs (that can be from 4 to 16) from the process instances. – Amit Yadav Sep 19 '18 at 11:46
  • @AmitYadav: Just parse the time and discard all the records which are too old. – Eric Duminil Sep 19 '18 at 12:48
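
A rough sketch of what that comment suggests, reusing `record_list` from the split above (the 3-minute window comes from the question; the regex and variable names are just for illustration):

import datetime
import re

# a timestamp like "2018-09-07 11:34:47" somewhere in the record
stamp_pattern = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
threshold = datetime.datetime.now() - datetime.timedelta(minutes=3)  # 3 minutes ago

recent = []
for record in record_list:  # record_list from the split shown above
    match = stamp_pattern.search(record)  # first timestamp found in the record
    if match is None:
        continue  # no timestamp at all, e.g. a record containing only stray error text
    stamp = datetime.datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S")
    if stamp >= threshold:
        recent.append(record)  # keep anything logged within the last 3 minutes
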

Since you said that the file doesn't get too large to process, you don't need anything fancy to sift through it (e.g. a buffered read from the end of the file) - you can just iterate over the whole file, collect the individual log entries and discard the ones that occurred more than 3 minutes ago.

This is especially easy given that your entries clearly differ from one another by the date-time at the beginning, and your log date format is an ISO-8601-like one, so you don't even need to parse the date - you can use straight lexicographic comparison.
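
For instance, with two timestamps taken from the sample log:

# zero-padded "YYYY-MM-DD HH:MM:SS" strings sort exactly like the times they represent
print("2018-09-07 11:37:47" >= "2018-09-07 11:34:47")  # True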

So, one way to do it would be:

import datetime

# if your datetime is in UTC use datetime.datetime.utcnow() instead
threshold = datetime.datetime.now() - datetime.timedelta(minutes=3)  # 3m ago
# turn it into a ISO-8601 string
threshold_cmp = threshold.strftime("%Y-%m-%d %H:%M:%S")  # we'll ignore the milliseconds

entries = []
with open("path/to/your.log") as f:  # open your log for reading
    current_date = ""
    current_entry = ""
    for line in f:  # iterate over it line-by-line
        if line[0].isdigit():  # beginning of a (new) log entry
            if current_date >= threshold_cmp:  # store the previous entry if newer than 3 minutes
                entries.append(current_entry)
            current_date = line[:19]  # store the date of this (new) entry
            current_entry = ""  # (re)initialize the entry
        current_entry += line  # add the current line to the cached entry
    if current_entry and current_date >= threshold_cmp:  # store the leftovers, if any
        entries.append(current_entry)

# now the list 'entries' contains individual entries that occurred in the past 3 minutes
print("".join(entries))  # print them out, or do whatever you want with them

You can make this even easier by discriminating on a placeholder, but you've said that it's a random one so you have to rely on the datetime.

zwer
  • Great answer. But as I have mentioned there could be error lines in the log file too, from any of the instances. I have even edited my question and added a random error line. – Amit Yadav Sep 19 '18 at 12:33
  • @AmitYadav - And you identify the error line(s) (which you, presumably, don't want to collect) by being outside of the log entry (i.e. the 10 lines following the date)? – zwer Sep 19 '18 at 12:57