I am using lsof to detect which files are opened by which process for a given directory. Example:

% lsof +D /Users/jack/Downloads
Spotify 1431 jack   75r   DIR    1,6      128 37333 /Users/jack/Downloads/file1.png
Dock    1439 jack   13r   DIR    1,6      128 37333 /Users/jack/Downloads/foo.psd
zsh     6644 jack  cwd    DIR    1,6      128 37333 /Users/jack/Downloads/foo.bmp

The man page of lsof(8) states:

lsof may process this option slowly and require a large amount of dynamic memory to do it. This is because it must descend the entire directory tree, rooted at D, calling stat(2) for each file and directory, building a list of all the files it finds, and searching that list for a match with every open file. When directory D is large, these steps can take a long time, so use this option prudently.

Coincidentally, I am already traversing the directory and calling os.stat from Python right before I call lsof, which means stat(2) is effectively called twice for every file.

Which information does the stat object hold that I could process myself to imitate the functionality of lsof? Any help is highly appreciated!

HelloWorld

1 Answer

On Linux, you can find the files each process has open through the /proc filesystem. From the proc(5) man page:

/proc/[pid]/fd/ This is a subdirectory containing one entry for each file which the process has open, named by its file descriptor, and which is a symbolic link to the actual file. Thus, 0 is standard input, 1 standard output, 2 standard error, and so on. For file descriptors for pipes and sockets, the entries will be symbolic links whose content is the file type with the inode. A readlink(2) call on this file returns a string in the format: type:[inode]
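To make the link formats described above concrete, here is a minimal sketch that sorts a readlink(2) result into "regular path" versus "type:[inode]" entries. The function name classify_fd_target and the returned tuples are hypothetical and only serve as illustration; the regular expression assumes the numeric type:[inode] form quoted from the man page.

```python
import re

# Matches the "type:[inode]" form the man page describes, e.g. "socket:[1234]".
_SPECIAL = re.compile(r"^(?P<kind>[A-Za-z_]+):\[(?P<inode>\d+)\]$")


def classify_fd_target(target):
    """Classify the string returned by os.readlink() on a /proc/<pid>/fd entry.

    Hypothetical helper, not part of any library.
    """
    match = _SPECIAL.match(target)
    if match:
        # Pipes, sockets and similar objects that have no path on disk.
        return match.group("kind"), int(match.group("inode"))
    if target.startswith("/"):
        # An ordinary absolute path.
        return "path", target
    # Anything else that does not fit the two forms above.
    return "other", target
```

Calling classify_fd_target(os.readlink(f"/proc/{pid}/fd/{fd}")) would then tell you whether an entry is worth comparing against your directory at all.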

Here is a small example script that lists all open files per process.

import os

def find_open_files(pid, prefix="/"):
    fd_dir = f"/proc/{pid}/fd"
    try:
        fds = os.listdir(fd_dir)
    except (FileNotFoundError, PermissionError):
        # the process already exited, or it belongs to another user
        # and we are not allowed to look at its file descriptors
        return []
    open_files = []
    for fd in fds:
        try:
            # read the symlink behind the file descriptor entry
            open_file = os.readlink(f"{fd_dir}/{fd}")
        except (FileNotFoundError, PermissionError):
            # These errors happen if
            #  1. the process exits in the time between our listdir and the readlink
            #  2. the file is closed before we reach readlink
            continue
        # this is a simple filter to only show "real" files
        # as stated in the manpage open_file could be something like socket:[1234]
        if open_file.startswith(prefix):
            open_files.append(open_file)
    return open_files


proc_files = os.listdir("/proc")
# find all numeric dirs as there are also other dirs in /proc present
pids = [p for p in proc_files if p.isnumeric()]
all_open_files = {}
for pid in pids:
    all_open_files[pid] = find_open_files(pid)

print(all_open_files)
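
To answer the stat part of the question more directly: the two fields that identify a file independently of its name are st_dev and st_ino. Below is a rough sketch, under the assumption that you are on Linux, of how the stat results from your existing traversal could be reused. The names index_tree and open_files_in_tree and the proc_root parameter are made up for this example. It relies on os.stat following the /proc/<pid>/fd/<n> link to the open file, and it only covers open file descriptors, not the other things lsof reports such as a process's working directory.

```python
import os


def index_tree(root):
    """Walk root once and map (st_dev, st_ino) -> path for everything found.

    Hypothetical helper: in your case you could fill this dict from the
    os.stat calls you already make during your own traversal.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, n) for n in filenames]
        for path in candidates:
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished, unreadable, or a broken symlink
            # Hard links share one key; the last path seen wins.
            index[(st.st_dev, st.st_ino)] = path
    return index


def open_files_in_tree(pid, index, proc_root="/proc"):
    """Return (fd, path) pairs for descriptors of pid that point into the index."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return []  # process is gone or we lack permission
    matches = []
    for fd in fds:
        try:
            # stat follows the link, so this describes the open file itself
            st = os.stat(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed in the meantime
        path = index.get((st.st_dev, st.st_ino))
        if path is not None:
            matches.append((fd, path))
    return matches
```

With this you would build the index once, for example index = index_tree("/Users/jack/Downloads"), and then call open_files_in_tree(pid, index) for every pid collected above. Comparing (st_dev, st_ino) pairs instead of path strings also catches files that were opened through a symlink or a different name.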
Garuno