I have a large file that can contain strings like file_+0.txt, file_[]1.txt, file_~8.txt, etc.
I want to find which file_*.txt numbers are missing up to a certain number.
For example, given the file below and the number 5, it should report that 1 and 4 are missing:
asdffile_[0.txtsadfe
asqwffile_~2.txtsafwe
awedffile_[]2.txtsdfwe
qwefile_*0.txtsade
zsffile_+3.txtsadwe
I wrote a Python script that takes the file path and a number and prints all file numbers that are missing up to that number.
It works for small files, but when I give it a large file (12 MB) that can contain file numbers up to 10000, it just hangs.
Here is my current Python code:
#!/usr/bin/env python
import mmap
import re

def main():
    filePath = input("Enter file path: ")
    endFileNum = input("Enter end file number: ")
    print(filePath)
    print(endFileNum)
    filesMissing = []
    filesPresent = []
    f = open(filePath, 'rb', 0)
    s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # One full regex search over the mapped file per candidate number
    for x in range(int(endFileNum)):
        myRegex = r'(.*)file(.*)' + re.escape(str(x)) + r'\.txt'
        myRegex = bytes(myRegex, 'utf-8')
        if re.search(myRegex, s):
            filesPresent.append(x)
        else:
            filesMissing.append(x)
    #print(filesPresent)
    print(filesMissing)

if __name__ == "__main__":
    main()
The script hangs when I give it a 12 MB file that can contain file numbers from 0 to 9999:
$ python findFileNumbers.py
Enter file path: abc.log
Enter end file number: 10000
Output for a small file (the same as the example above):
$ python findFileNumbers.py
Enter file path: sample.log
Enter end file number: 5
[0, 2, 3]
[1, 4]
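One caveat about the pattern, separate from the speed problem: because `(.*)` can also match digits, the pattern built for `x = 0` matches a name such as `file_+10.txt`, so a number can be reported as present when only a longer number ending in the same digits exists. The small sample does not trigger this because all of its numbers are single digits. A minimal check, using a made-up input line and a hypothetical helper `is_reported_present` that builds the same pattern as the loop in the script:

```python
import re

def is_reported_present(x, data):
    """Build the same pattern as the script's loop and search data (bytes)."""
    pattern = r'(.*)file(.*)' + re.escape(str(x)) + r'\.txt'
    return re.search(pattern.encode('utf-8'), data) is not None

# Made-up input: only file number 10 is present.
data = b"asdffile_+10.txtsadfe\n"

print(is_reported_present(10, data))  # True, as expected
print(is_reported_present(0, data))   # also True: "(.*)" absorbed "_+1"
print(is_reported_present(1, data))   # False
```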
- How can I make this work for big files?
- Is there a better way to get these results than a Python script?
Thanks in advance!
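A possible direction, sketched under assumptions rather than as a definitive fix: the loop in the script runs one regex search over the whole file for every candidate number, and the leading `(.*)` gives each search a lot of backtracking to do, so the work grows with both the file size and the end number. Scanning the file once, collecting every number that appears, and then checking each candidate against that set avoids the repeated scans. The sketch assumes that the characters between `file_` and the number are never digits and that a name does not span lines. `FILE_RE`, `find_missing`, and `find_missing_in_file` are hypothetical names, not part of the original script.

```python
import mmap
import re

# Assumption: "file_", then non-digit decoration, then the number, then ".txt".
FILE_RE = re.compile(rb'file_[^0-9\r\n]*(\d+)\.txt')

def find_missing(data, end_num):
    """Return the numbers in range(end_num) that never appear in data.

    data may be bytes or a read-only mmap; it is scanned exactly once.
    """
    present = {int(m.group(1)) for m in FILE_RE.finditer(data)}
    return [n for n in range(end_num) if n not in present]

def find_missing_in_file(path, end_num):
    """Memory-map the file at path and scan it in a single pass."""
    with open(path, 'rb') as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as s:
        return find_missing(s, end_num)

# The sample from the question, as bytes:
sample = b"""asdffile_[0.txtsadfe
asqwffile_~2.txtsafwe
awedffile_[]2.txtsdfwe
qwefile_*0.txtsade
zsffile_+3.txtsadwe
"""
print(find_missing(sample, 5))  # [1, 4]
```

Capturing the whole run of digits also means that `file_+10.txt` counts only as 10, not as 0. If the decoration between `file_` and the number can itself contain digits, the pattern would need to be adjusted.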