33

Is there a way to track all file I/O for a given process? All I really need is the locations of the files being read from or written to by a given process (and ideally whether each access was a read or a write, although that's not as important).

I can start the process myself and track it, rather than needing to attach to an existing process, which I assume is significantly simpler. Is there any kind of wrapper utility I can run a process through that will monitor file access?

Akira Yamamoto
Tom B
  • [Continuously monitor files opened/accessed by a process](https://superuser.com/q/348738/241386) – phuclv May 22 '17 at 03:37

4 Answers

55

lsof:

Try this as a starting point:

lsof -p <PID>

This lists all files, file descriptors and sockets that the process with the given PID has open at the moment the command runs.

For your use case, here is one way to monitor a PHP script:

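php foo.php & _pid=$!    # start the script in the background and remember its PID
lsof -r1 -p "$_pid"      # re-list its open files every second until interrupted (Ctrl-C)
kill %1                  # optional: stop the PHP script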

strace:

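I recommend strace. Unlike lsof, it keeps running for as long as the process runs and prints each syscall at the moment it is made. -f also follows child processes, -t prefixes each line with the time of day, and -e trace=file restricts the output to syscalls that take a file name as an argument (open, openat, stat, unlink, ...). Reads and writes on an already-open file descriptor are therefore not listed, but every file the process opens is: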

sudo strace -f -t -e trace=file php foo.php

or, for an already running process:

sudo strace -f -t -e trace=file -p <PID>
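The question also asks, ideally, whether each access was a read or a write. With -e trace=file, strace prints every open/openat call together with its flags (O_RDONLY, O_WRONLY, O_RDWR), so a rough classification can be read off those flags. This is a minimal sketch, assuming bash and strace's default output format; classify_opens is a hypothetical helper, not part of strace:

```shell
#!/usr/bin/env bash
# classify_opens (hypothetical helper): reads strace output on stdin and prints
# "<mode><TAB><path>" for every successful open()/openat()/creat() call.
# The mode is inferred from the open flags only: O_RDWR -> readwrite,
# O_WRONLY or creat() -> write, anything else -> read.
classify_opens() {
  local line path mode
  while IFS= read -r line; do
    case $line in
      *'= -1 '*) continue ;;                   # failed call: nothing was opened
      *'open('* | *'openat('* | *'creat('*) ;; # open-family syscall: handle below
      *) continue ;;                           # stat(), access(), execve(), ...
    esac
    # The path is the first double-quoted argument of the call.
    path=${line#*\"}
    path=${path%%\"*}
    case $line in
      *O_RDWR*) mode=readwrite ;;
      *O_WRONLY* | *'creat('*) mode=write ;;
      *) mode=read ;;
    esac
    printf '%s\t%s\n' "$mode" "$path"
  done
}
```

For example, run sudo strace -f -e trace=file -o trace.log php foo.php and then classify_opens < trace.log | sort -u. With -f, calls from different processes can be split into <unfinished ...> and resumed lines; strace -ff -o trace writes one file per process and avoids that. The result is only as good as the flags: a file opened O_RDWR may in fact only be read, and paths containing escaped quotes are cut short.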
Flimm
Gilles Quénot
  • Thanks, that's a good starting point! It works for processes already running at the moment it's run. I'm trying to do this for a PHP script for its entire execution, tracking the files from the start of the process until it exits. Looking at the help, there's a -r repeat option, but this seems to periodically scan the files that are currently open by the process rather than the ones that have been opened. Essentially I want to do this: `lsof -p $$ && exec php foo.php`. This doesn't seem to list files that are opened by foo.php – Tom B Dec 11 '14 at 17:11
  • thanks, that's certainly providing more relevant information and showing all the php extensions being loaded, the script contains unfortunately, file.txt is not listed in the output. I can verify the file is being opened by amending the script to print the contents of file.txt but I still don't see file.txt in the output of lsof. – Tom B Dec 11 '14 at 17:26
  • To properly trace an AppImage, I needed to run `strace` as root but the command as my own user. This got the job done: `sudo strace -fte trace=%file -u $(id -un) ` – raphinesse Aug 02 '20 at 10:15
  • Combining your two solutions works perfectly: `php foo.php & sudo strace -f -t -e trace=file -p $!` especially for short-running tasks. – Unknown May 01 '23 at 12:29
6

Besides strace, there is another option that does not substantially slow down the monitored process. Using the Linux kernel's fanotify (not to be confused with the more popular inotify), it is possible to monitor whole mount points for I/O activity. With unshared mount namespaces, the mounts of a given process can be isolated from the rest of the system (a key technology behind Docker).

An implementation of this concept can be found in shournal, which I am the author of.

Example on the shell:

$ shournal -e sh -c 'cat foo > bar'
$ shournal --query --history 1
...
  1 written file(s):
     /home/user/bar
  1 read file(s):
     /home/user/foo 
spawn
  • External links are always highly appreciated as sources, but imagine this one was to become invalid - your solution would be unsalvageable for future SO users. Please consider posting code here and explaining your solution so we all can learn. – harmonica141 Jul 22 '19 at 11:21
  • @harmonica141: That's always the problem: what to write and what to omit... A complete, minimal example would be not much shorter than the example at the bottom at http://man7.org/linux/man-pages/man7/fanotify.7.html . In fact, it could be almost the same with a leading `unshare( CLONE_NEWNS);`. Do you think it would be helpful to include the full source here? – spawn Jul 24 '19 at 10:32
2

strace is an amazing tool but its output is a bit verbose.
If you want, you can use a tool I've written that processes strace output and provides a CSV report of all files accessed (TCP sockets too) with the following data:
1. Filename
2. Read/Written bytes
3. Number of read/write operations
4. Number of times the file was opened

It can be run on new processes or processes already running (using /proc/fd data).
I found it useful for debugging scenarios and performance analysis.
You can find it here: iotrace

Example output:

Filename, Read bytes, Written bytes, Opened, Read op, Write op
/dev/pts/1,1,526512,0,1,8904
socket_127.0.0.1:47948->127.0.0.1:22,1781764,396,0,8905,11
myfile.txt,65,0,9,10,0
pipe:[3339],0,0,0,1,0

Afterward, you can process the CSV data in Excel or other tools for sorting or other analysis required.
The downside is you need to download & compile and it isn't always 100% accurate.
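If a per-file byte count like this is all that is needed, a rough approximation can be had from plain strace output without compiling anything: strace's -y option prints the path behind each file descriptor, so the return values of read/write calls can be summed per path. This is a minimal sketch, assuming bash, a POSIX awk and strace's default output format; summarize_io is a hypothetical helper, not part of iotrace or strace, and it only sees read, write, pread64 and pwrite64 (not readv, sendfile, memory-mapped I/O, ...):

```shell
#!/usr/bin/env bash
# summarize_io (hypothetical helper): reads `strace -y` output on stdin and
# prints a CSV with bytes and operation counts per path, for lines such as
#   read(3</home/user/foo>, "hello\n", 4096) = 6
summarize_io() {
  awk '
    {
      # Find a syscall name followed by "(<fd><", as printed by strace -y.
      if (!match($0, /(pread64|pwrite64|read|write)\([0-9]+</)) next
      call = substr($0, RSTART, RLENGTH)
      sub(/\(.*/, "", call)                  # "read(3<" -> "read"
      path = substr($0, RSTART + RLENGTH)    # text after the "<"
      sub(/>.*/, "", path)                   # the path ends at the first ">"
      n = split($0, parts, "= ")
      if (n < 2) next                        # no return value on this line
      ret = parts[n] + 0                     # bytes transferred, or -1 on error
      if (ret < 0) next
      if (call ~ /read/) { rbytes[path] += ret; rops[path]++ }
      else               { wbytes[path] += ret; wops[path]++ }
      seen[path] = 1
    }
    END {
      print "Filename,Read bytes,Written bytes,Read op,Write op"
      for (p in seen)
        printf "%s,%d,%d,%d,%d\n", p, rbytes[p], wbytes[p], rops[p], wops[p]
    }'
}
```

For example, on x86-64 Linux: sudo strace -f -y -e trace=read,write,pread64,pwrite64 -o io.log php foo.php, then summarize_io < io.log. Like iotrace, this is approximate: rows come out in no particular order, paths containing ">" or "," confuse the parser, and calls split into <unfinished ...>/resumed lines by -f are skipped (strace -ff -o writes one file per process and avoids the split).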

Avner Levy
0

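Sampling like this can reduce the performance impact of monitoring file activity, because strace is attached only for a short window at a time; the trade-off is that file accesses outside those windows are missed.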

$ watch -n 2.0 timeout 0.2 strace -p `pgrep myprogram` -fe trace=file

where myprogram is the process name, 2.0 is the idle period between monitoring periods, and 0.2 is the length of each monitoring period, both in seconds.

Roger Dahl