
The program is a standard Flask application that does some cleanup during initialization. In the cleanup() procedure, which calls os.remove("abc.txt"), I noticed that the file is removed but its space is not reclaimed by the OS. I run the application with both "python website.py" and "gunicorn website:app", and both show the same problem on Linux. On macOS I can't reproduce it.

After os.remove, the file is no longer listed by the "ls" command, but when I run

lsof | grep deleted

I can still see this file listed as deleted but held open by the Python application.

Because the file has already been removed with os.remove, it is not listed by ls, and du does not count it. But if the file is big enough, df shows that its space is still occupied and has not been reclaimed, because the file is still "held open by the Flask application", as lsof reports.

As soon as I stop the Flask application, lsof no longer lists this file and the space is reclaimed.
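The behaviour described above can be reproduced outside Flask. The following is a minimal sketch, assuming a POSIX system; the function name `demo_deleted_but_open` and the scratch directory are made up for illustration. It opens a file, removes it, and checks that the name is gone while the data is still reachable through the open descriptor.

```python
import os
import tempfile


def demo_deleted_but_open():
    """Show that a removed file stays allocated while a descriptor is open."""
    # Scratch directory and file name are hypothetical, for illustration only.
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "abc.txt")
    with open(path, "w") as f:
        f.write("x" * 4096)

    held = open(path)    # simulate a handle that some code forgot to close
    os.remove(path)      # removes the directory entry (the name) only

    name_exists = os.path.exists(path)   # the name is gone, as with ls
    info = os.fstat(held.fileno())       # the inode is still reachable
    links, size = info.st_nlink, info.st_size  # link count is typically 0 here
    data_len = len(held.read())          # the content can still be read

    held.close()         # last reference released: the space can be freed
    os.rmdir(workdir)
    return name_exists, links, size, data_len


if __name__ == "__main__":
    print(demo_deleted_but_open())
```

The space is given back only after the last open descriptor is closed, which matches what df shows once the application stops.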

Usually, when the file is small or the application stops or restarts frequently, you won't notice that the space is still occupied. But holding on to the space like this is not reasonable: I expect the website to run for years.

When searching the internet for "open but deleted files", most suggestions are "find the application and kill it". Is there a way to reclaim the space while keeping the Flask application running, without restarting it? My application doesn't actually "open" this file; it simply calls os.remove on it.

Any suggestions on how to delete the file and reclaim the space immediately?

davidism
Ben L
  • Have you referenced the "abc.txt" file somewhere in the program? Maybe by using open() without a with-statement or without closing the file? It would be best if you could provide a minimal example. (I also guess it has nothing to do with Flask, so please consider adding python as a tag; the audience would be much bigger, too.) – colidyre Jul 15 '21 at 23:12
  • Thanks for your tagging advice. The "abc.txt" is not being used anywhere, the initialization procedure is to delete files in the specific folder that are 15 days old, basically: ` filetime = os.path.getmtime(filepath) if now - filetime > ExpiryDays * 24 * 60 * 60: # older than ExpiryDays os.remove(filepath) ` – Ben L Jul 15 '21 at 23:42
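
The cleanup described in the comment above could look roughly like the following. This is only a sketch under assumptions: the function name `remove_expired`, its parameters, and the restriction to regular files are not from the original program; only the mtime comparison and the 15-day expiry are.

```python
import os
import time

EXPIRY_DAYS = 15  # expiry mentioned in the comment; everything else is assumed


def remove_expired(folder, expiry_days=EXPIRY_DAYS, now=None):
    """Delete regular files in `folder` whose mtime is older than `expiry_days`."""
    now = time.time() if now is None else now
    cutoff = expiry_days * 24 * 60 * 60  # expiry in seconds

    # Snapshot the directory first so removals do not disturb the iteration.
    with os.scandir(folder) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue  # skip directories, symlinks and other non-regular files
        if now - entry.stat(follow_symlinks=False).st_mtime > cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Code like this never opens the files it deletes, so if lsof still shows a deleted file as open, the descriptor must come from some other part of the program.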

2 Answers


The Flask application either needs the large file to continue running, or it does not release resources it no longer needs. If the app needs the large file, that's it. Otherwise, the app is buggy and needs to be corrected. In both cases, the "open" status of the large file (which, at least on Linux, keeps the file's data on disk) cannot be controlled by your script.

gboffi
  • Thanks. After some more digging and trial and error, I confirmed that my program is more complicated than a simple Flask app, and the bug is hidden in the "complicated" part. I created a simple Flask app and confirmed that it doesn't have the same problem. – Ben L Jul 16 '21 at 13:41

os.remove() only delegates the removal of the file to the operating system. If the file is still referenced somewhere in your code, lsof will of course still show it. Without seeing the code, it is hard to tell where the unwanted behavior comes from, but I can at least give you some insight into how references affect it.

Here is a small script that shows that a file can still be considered open if it is referenced.

import os
import psutil

PATH = "abc.txt"


def write_file(filepath):
    """Simulating existing file with correctly closing it at the end"""
    with open(filepath, "x") as file:
        file.write("Hello, world!")


def remove_file(filepath):
    """Let the operating system handle the file removal"""
    os.remove(filepath)


def lsof():
    """Simulating lsof command (requires e.g. `pip install psutil`)"""
    # Query the open files once and avoid shadowing the process variable.
    open_files = psutil.Process().open_files()
    if open_files:
        return "\n".join(os.path.basename(f.path) for f in open_files)
    return "No open files found."


if __name__ == "__main__":
    print("\n----- EXAMPLE 1 -----\n")

    write_file(PATH)
    print(lsof())
    remove_file(PATH)
    print(lsof())

    print("\n----- EXAMPLE 2 -----\n")

    write_file(PATH)
    file = open(PATH)  # referenced!
    print(lsof())
    remove_file(PATH)
    print(lsof())

The output of example 2 shows that once the file is referenced, it also appears in the lsof-style listing:

----- EXAMPLE 1 -----

No open files found.
No open files found.

----- EXAMPLE 2 -----

abc.txt
No open files found.

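In both examples psutil reports no open files after the removal. In example 2 that does not mean the descriptor is closed: `file` is never closed, so the process still holds the deleted file open (this is the state that `lsof | grep deleted` reports), and the space is freed only once the descriptor is closed or the process exits. psutil apparently just stops listing a file whose path no longer exists.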

You can try to debug your code, e.g. with psutil.Process.open_files() as in my example, to find the place where a file that you expect to be closed is in fact still open.
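
A listing that is based on existing paths can miss files that have already been unlinked, so on Linux it may help to inspect the descriptors directly through /proc as well. The following is a Linux-only sketch; the function name `deleted_open_files` is made up for illustration, and it relies on the kernel appending " (deleted)" to the link target of an unlinked file in /proc/<pid>/fd.

```python
import os


def deleted_open_files(pid="self"):
    """Return sorted (fd, target) pairs for descriptors of deleted files.

    Linux-only: reads the symlinks in /proc/<pid>/fd. Returns an empty
    list on systems without that directory.
    """
    fd_dir = f"/proc/{pid}/fd"
    results = []
    if not os.path.isdir(fd_dir):
        return results
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        if target.endswith(" (deleted)"):
            results.append((int(name), target))
    return sorted(results)


if __name__ == "__main__":
    for fd, target in deleted_open_files():
        print(fd, target)
```

Calling this from inside the running application (for example, logging its result after the cleanup step) shows which descriptor numbers still point at removed files, which narrows down the code that never closes them.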

colidyre