I am developing a script that reads through the logs in a log folder and checks, for each .txt log file:
1) whether the log contains the string 'FileData', and
2) whether the log does not contain the string 'Error in FileData'.
If both conditions are met, the script needs to read the file and collect the content of line 2. After some research on the topic, I found a solution and the script below works. The issue is that reading through 3000 files takes about 20 minutes, and since the number of files is expected to grow very quickly, this approach will not scale.
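import os
import mmap
from itertools import islice

# log_folder is assumed to be defined earlier as the path to the log directory
Dict = {}
for log in sorted(os.listdir(log_folder)):
    with open(os.path.join(log_folder, log), 'r') as f:
        # Map the whole file read-only so it can be searched as bytes
        s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        if s.find(b'FileData') != -1 and s.find(b'Error in FileData') == -1:
            # Read the first two lines and keep the second one
            lines = list(islice(f, 2))
            content = lines[1]
            Dict[log] = content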
If I run this with only the first find ('FileData'), it is very fast, but as soon as I added the second find ('Error in FileData') the run time grew far more than proportionally. Is there another way to do the same thing, but faster? I tried re.findall() and readlines(), but the results were much the same as with this version.
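For comparison, here is a minimal sketch of the same check that reads each file once in binary mode and tests both substrings on the in-memory bytes. The names `second_line_if_valid` and `collect` are hypothetical, the `.txt` filter and UTF-8 decoding are assumptions, and it has not been benchmarked against the mmap version, so treat it as something to time rather than a known fix. Unlike the script above, it returns line 2 without its trailing newline.

```python
import os


def second_line_if_valid(data: bytes):
    """Return line 2 as text if data contains b'FileData' but not
    b'Error in FileData'; otherwise return None."""
    if b'FileData' not in data or b'Error in FileData' in data:
        return None
    # Split off at most the first two lines; the remainder stays unsplit
    lines = data.split(b'\n', 2)
    if len(lines) < 2:
        return None
    # UTF-8 is an assumption about the log encoding
    return lines[1].decode('utf-8', errors='replace').rstrip('\r')


def collect(log_folder):
    """Map each matching .txt file name in log_folder to its second line."""
    results = {}
    with os.scandir(log_folder) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not (entry.is_file() and entry.name.endswith('.txt')):
                continue
            with open(entry.path, 'rb') as f:
                line = second_line_if_valid(f.read())
            if line is not None:
                results[entry.name] = line
    return results
```

Calling `collect(log_folder)` would build the same kind of dictionary as `Dict` above. Keeping the condition in its own function also makes it easy to time the substring checks separately from the file I/O.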
Thanks!