
I am trying to format lsof output in a more parsable way.

Background: not every process with open handles has a thread ID, so the number of whitespace-separated fields (blanks, as far as I can see) per line is not fixed.

As output fields, I need the PID, UID/username and path (if it is a file - I am grepping for the path since +D is quite slow).

As the field separator I switched from NL to NUL (and replaced NUL with "|" for readability).

So I tried

> /usr/sbin/lsof -F pnuf0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
 ftxt|n/usr/bin/cvmfs2|
 fmem|n/usr/lib64/libcvmfs_fuse.so.2.3.5|

This prints only the file descriptor and the name (and not in the order I gave), but neither the PID nor the UID. Why?

As a side note, the PID and UID fields already appear to be empty when I select them individually:

> /usr/sbin/lsof -F u0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
> /usr/sbin/lsof -F p0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
> /usr/sbin/lsof -F n0 | sed 's/\x0/|/g' | grep "cvmfs" | tail -n 2
  n/usr/bin/cvmfs2|
  n/usr/lib64/libcvmfs_fuse.so.2.3.5|

What would be the correct way to parse lsof's output as "PID,NAME,UID,FILEDESC"?

THX

3 Answers


Since I never found a good answer to this on the web, I spent many hours working on this problem, and I hope I can spare someone else the pain. By default, lsof prints column-oriented output in which missing values are simply left blank, which makes it very hard to parse reliably.

To format lsof you need to use the command:

lsof -F pcuftDsin

Adding -F makes lsof print one field per line instead of one row per file. Let me explain each part:

  • lsof: lists all open files, grouped by process
  • -F: prints one field per line, each prefixed with its identifying letter, instead of the default columns
  • p: selects the process ID (PID)
  • c: selects the command name (COMMAND)
  • u: selects the ID of the user the process is running under
  • f: selects the file descriptor (FD)
  • t: selects the file type (TYPE)
  • D: selects the device number (DEVICE)
  • s: selects the file size (the offset has its own letter, o)
  • i: selects the inode number (NODE)
  • n: selects the name, usually the file path (NAME)

Output:

p3026
ccom.apple.appkit.xpc.openAndSavePanelService
u501
fcwd
tDIR
D0x1000004
s704
i2
n/
ftxt
tREG
D0x1000004
s94592
i1152921500312434319
n/System/Library/Frameworks/AppKit.framework/Versions/C/XPCServices/com.apple.appkit.xpc.openAndSavePanelService.xpc/Contents/MacOS/com.apple.appkit.xpc.openAndSavePanelService
ftxt
tREG
D0x1000004
s27876
i45156619
n/Library/Preferences/Logging/.plist-cache.usI0gbvW
ftxt
tREG
D0x1000004
s28515184
i1152921500312399135
n/usr/share/icu/icudt64l.dat
ftxt
tREG
D0x1000004
s239648
i31225967
n/private/var/db/timezone/tz/2019c.1.0/icutz/icutz44l.dat
ftxt
tREG
D0x1000004
s3695464
i1152921500312406201
n/System/Library/CoreServices/SystemAppearance.bundle/Contents/Resources/SystemAppearance.car
ftxt
tREG
D0x1000004
s136100
i38828241
n/System/Library/Caches/com.apple.IntlDataCache.le.kbdx

As you can see, each line is prefixed with the letter listed above. Another important thing to note is that the process ID, process name and user are printed only once per process, followed by one group of lines per open file. For database storage I needed these three fields on every record, so the parser has to carry them forward. I was working on a Java project, so the code I used to parse the output is shown below:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class LsofParser {

        public static void main(String[] args) throws IOException, InterruptedException {
            // Pass the arguments separately so no shell word splitting is involved.
            Process proc = new ProcessBuilder("lsof", "-F", "pcuftDsin").start();

            // Process-level fields: printed once per process, so they are carried forward.
            String processId = "";
            String processName = "";
            String user = "";

            // File-level fields: reset to "null" after each file so gaps stay visible.
            String fd = "null";
            String type = "null";
            String device = "null";
            String size = "null";
            String node = "null";

            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(proc.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue;
                    }
                    // The first character identifies the field; the prefix stays in the value.
                    switch (line.charAt(0)) {
                        case 'p': processId = line; break;
                        case 'c': processName = line; break;
                        case 'u': user = line; break;
                        case 'f': fd = line; break;
                        case 't': type = line; break;
                        case 'D': device = line; break;
                        case 's': size = line; break;
                        case 'i': node = line; break;
                        case 'n':
                            // The name is the last field of a file, so emit one record here.
                            System.out.println(String.join(",", processId, processName, user,
                                    fd, type, device, size, node, line));
                            fd = type = device = size = node = "null";
                            break;
                        default:
                            break;
                    }
                }
            }
            proc.waitFor();
        }
    }

Output:

p94484,ccom.apple.CoreSimulator.CoreSim,u501,ftxt,tREG,D0x1000004,s239648,i31225967,n/private/var/db/timezone/tz/2019c.1.0/icutz/icutz44l.dat

Because I was storing the output, I needed the empty fields to show something, so I used null. You can use any default text, or just an empty string, for the missing fields; not all fields will be populated. If anyone has suggestions on how I could improve the performance of the code, I am all ears.
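For anyone who would rather stay in the shell, the same carry-forward idea can be sketched with awk. This is only a sketch under assumptions: `lsof_to_csv` is a hypothetical helper name, it expects newline-separated `-F` output on stdin (no `0`), it strips the prefix letters, and it prints the fields in the PID,NAME,UID,FD order asked for in the question.

```shell
# Hypothetical helper: converts `lsof -F pufn` style output (read from stdin)
# into one "PID,NAME,UID,FD" line per open file.
lsof_to_csv() {
    awk '
        /^p/ { pid = substr($0, 2); uid = ""; next }  # start of a new process
        /^u/ { uid = substr($0, 2); next }            # user ID of that process
        /^f/ { fd = substr($0, 2); next }             # start of a new file
        /^n/ { print pid "," substr($0, 2) "," uid "," fd; fd = "" }
    '
}

# Sample input taken from the output shown earlier in this answer.
printf 'p3026\nu501\nfcwd\nn/\nftxt\nn/usr/share/icu/icudt64l.dat\n' | lsof_to_csv
# prints:
# 3026,/,501,cwd
# 3026,/usr/share/icu/icudt64l.dat,501,txt
```

With real data this would be used as `lsof -F pufn | lsof_to_csv | grep cvmfs`. Filtering after the join matters: grepping the raw `-F` output drops the p and u lines, because they do not contain the path.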

Alexar

I was looking for the same thing and found that even with 0 added to the -F field list, lsof still splits the results for one file over several lines, which makes the -F option almost unusable:

# lsof -F pnuf0 /tmp/aaa | tr '\0' '|'
p19677|u1000|
f4|n/tmp/aaa|

Damn. I ended up using find, or simply grepping the output of stat -c"%u %N" /proc/[0-9]/fd/
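The split shown above follows a pattern: the first line carries the process fields and each following line carries the fields of one file. A rough sketch of joining them again with awk is below. It assumes the input has already been passed through `tr '\0' '|'` as in the command above, that no file name contains a "|", and `join_lsof_sets` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `lsof -F pnuf0 | tr "\0" "|"` style input on stdin
# and prints one "PID,NAME,UID,FD" line per open file.
join_lsof_sets() {
    awk -F'|' '
        {
            for (i = 1; i <= NF; i++) {
                key = substr($i, 1, 1)   # field letter
                val = substr($i, 2)      # field value
                if (key == "p") { pid = val; uid = "" }
                else if (key == "u") uid = val
                else if (key == "f") fd = val
                else if (key == "n") { print pid "," val "," uid "," fd; fd = "" }
            }
        }
    '
}

# Sample input copied from the output above.
printf 'p19677|u1000|\nf4|n/tmp/aaa|\n' | join_lsof_sets
# prints: 19677,/tmp/aaa,1000,4
```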

tharrrk

I worked it out this way:

lsof |awk ' { if ( NF == 12) { x=$10; y=$4 } else if ( NF == 11 && $11 != "(deleted)" ) { x=$10; y=$4 } else { x=$9; y=$3}; print $2,y, x }'

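The number of fields depends on whether a TID is present and whether the file is deleted. If there is a TID and the file is deleted, the number of fields will be 12. A line with 11 fields is ambiguous, which is why the script also checks $11: either there is a TID and the file is not deleted, or there is no TID and the file is deleted. Lastly, if there is no TID and the file is not deleted, there will be 10 fields.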

Mihai Chelaru
Berni