I am developing a string filter for huge process log files in a distributed system.
These log files are larger than 1 GB and contain millions of lines. They contain special message blocks that start with "SMsg{" and end with "}". My program reads the whole file line by line and appends the number of every line that contains "SMsg{" to a list. Here is my Python method for that.
    def FindNMsgStart(self,logfile):
        self.logfile = logfile
        lf = LogFilter()
        infile = lf.OpenFile(logfile, 'Input')
        NMsgBlockStart = list()
        for num, line in enumerate(infile.readlines()):
            if re.search('SMsg{', line):
                NMsgBlockStart.append(num)
        return NMsgBlockStart
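As far as I understand, `readlines()` loads every line into memory at once, whereas iterating over the file object yields one line at a time. Below is a minimal sketch of the same start-finding step written that way, assuming a plain substring test is enough for the fixed marker. The name `find_block_starts` is hypothetical and not part of my program.

```python
from typing import Iterable, List


def find_block_starts(lines: Iterable[str], marker: str = "SMsg{") -> List[int]:
    """Return the 0-based numbers of all lines that contain `marker`.

    `lines` can be any iterable of strings, including an open text file,
    so the input is consumed one line at a time instead of being loaded
    into a list first.
    """
    return [num for num, line in enumerate(lines) if marker in line]
```

With a real file this would be called as `with open(path) as f: starts = find_block_starts(f)`, where `path` is whatever log file is being filtered.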
This is my lookup function. It searches the lines between start and end for a regular expression and returns the number of the first matching line.
    def Lookup(self,infile,regex,start,end):
        self.infile = infile
        self.regex = regex
        self.start = start
        self.end = end
        result = 0
        for num, line in enumerate(itertools.islice(infile,start,end)):
            if re.search(regex, line):
                result = num + start
                break
        return result
I then take that list and, for each block start, search the file for the end of that block. The following code finds the ends.
    def FindNmlMsgEnd(self,logfile,NMsgBlockStart):
        self.logfile = logfile
        self.NMsgBlockStart = NMsgBlockStart
        NMsgBlockEnd = list()
        lf = LogFilter()
        length = len(NMsgBlockStart)
        if length > 0:
            for i in range(0,length):
                start=NMsgBlockStart[i]
                infile = lf.OpenFile(logfile, 'Input')
                lines = lf.LineCount(logfile, 'Input')
                end = lf.Lookup(infile, '}', start, lines+1)
                NMsgBlockEnd.append(end)
            return NMsgBlockEnd
        else:
            print("There is no Normal Message blocks.")
But these methods are not efficient enough to handle huge files. The program runs for a long time without producing a result.
- Is there an efficient way to do this?
- If yes, how could I do this?
I am writing other filters too, but first I need a solution for this basic problem. I am really new to Python. Please help me.
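I suspect the cost comes from reopening the file and counting its lines once per block start, so what I think I need is a single pass that pairs each start with its end as it goes. Here is a minimal sketch of that idea, assuming the first line containing "}" at or after a start line closes the block, as in my Lookup call. The name `find_blocks` is hypothetical, and I have not tried it on the real logs.

```python
from typing import Iterable, List, Tuple


def find_blocks(
    lines: Iterable[str],
    start_marker: str = "SMsg{",
    end_marker: str = "}",
) -> List[Tuple[int, int]]:
    """Pair block starts with block ends in one pass over `lines`.

    Returns (start_line, end_line) tuples using 0-based line numbers.
    A start that never sees an end marker is left out of the result
    (my current code would report 0 for it instead).
    """
    blocks: List[Tuple[int, int]] = []
    pending: List[int] = []  # starts still waiting for an end marker
    for num, line in enumerate(lines):
        if start_marker in line:
            pending.append(num)
        # Checked after the start test so a one-line block closes itself.
        if pending and end_marker in line:
            blocks.extend((start, num) for start in pending)
            pending.clear()
    return blocks
```

This reads the file once and keeps only the open starts in memory, so the work grows with the number of lines rather than with lines times blocks.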