I have a Python script that caches some information to a file. The file is reused if it exists; otherwise the script calls some other functions, which take a long time, to generate it. The file names follow a certain pattern, and they are all stored in a single $WORKING_DIR:
import os

def dummy(param):
    fname = os.path.join(WORKING_DIR, "file" + param)
    if os.path.exists(fname):
        ...  # reuse the cached file
    else:
        long_time_process(param)
        ...  # write the result to fname in WORKING_DIR
Since this dummy function will be called many times with different params, a lot of files will be generated. I want to keep the directory at a moderate size and the information in the files relatively up to date, so I want to set a THRESHOLD on the directory size. Once the limit is reached, I will remove the oldest files until the directory size is reduced to half of THRESHOLD.
My current solution is:
def dummy(param):
    purge(WORKING_DIR)
    ...  # rest of the dummy logic

def purge(dir_path):
    if get_size(dir_path) > THRESHOLD:
        while get_size(dir_path) > THRESHOLD / 2:
            # remove the oldest file in the directory
            oldest = min(
                (os.path.join(dir_path, f) for f in os.listdir(dir_path)),
                key=os.path.getmtime,
            )
            os.remove(oldest)

def get_size(dir_path):
    size = 0
    for f in os.listdir(dir_path):
        size += os.path.getsize(os.path.join(dir_path, f))
    return size
This surely does the job, but the call to purge is unnecessary most of the time, since the THRESHOLD is only reached perhaps once in a thousand calls. On top of that, get_size could also be time-consuming if the number of files is huge.
So the question is: how do I optimize get_size, and how do I integrate the purge logic into my current dummy function? Is there a good, Pythonic way to do it, or a pattern I can use? Thanks.