
I need to calculate the total size of deleted files per user from the output of the lsof command in bash. However, in some rows the third column is blank, which breaks the summation.
For example, in the attached image I need to show the total size of deleted files for each user type. Because of the blank cells in the third column, the column count is off, so the resulting values are wrong too. I tried a few ways to replace the blank cells with dummy text, but none worked well. How can I solve this, and is there a command to show the resulting size in human-readable format?

I tried to sum the output by user type with the following command:

lsof|grep -i deleted| awk '{a[$5] +=$8} END{for (i in a) print i, a[$i]}'

The above command did not give the right results, so I tried the command below to replace blank cells with dummy text:

lsof|grep -i deleted| awk '!$3{$3="NA"}{a[$5] +=$8} END{for (i in a) print i, a[$i]}'

That did not work either, so I tried an if condition:

lsof|grep -i deleted| awk '{if($3 =="") $3="NA"; a[$5] +=$8} END{for (i in a) print i, a[$i]}'

[Image: screenshot of the lsof output]

Learner
  • assuming you're interested in the first column (user) and last column (file size?), consider using `awk` to process just the first and last fields (ie, `$1` and `$(NF)`) thus ignoring everything else in between – markp-fuso Mar 12 '20 at 17:26
  • `awk` and `bash` are two different languages, with two different interpreters. There's little reason to tie them together in a question -- awk will behave the same way with any non-bash shell (or no shell at all), so one should be able to ask just an awk question or just a bash question. – Charles Duffy Mar 12 '20 at 17:32
  • @markp: I'm interested in the 5th and the second-to-last column. I can reach the second-to-last column by counting from the last one, but not the 5th: in some rows intermediate cells are blank when counting from the end too, so it is the same issue again – Learner Mar 12 '20 at 17:33
  • That said, more importantly -- give us a copy/paste of output, not a screenshot of output. If we know what your version of `lsof` emits, we can test it with our own `awk` script, and thus make sure that our answers are correct before posting them. Needing to retype off a screenshot is both extra work and error-prone (loses the distinction between tabs and spaces, among other things). – Charles Duffy Mar 12 '20 at 17:33
  • Similarly, an ideal [mre] includes a copy/paste of both *correct output* and *actual output*, so we can compare the two rather than needing to infer off question text. – Charles Duffy Mar 12 '20 at 17:34
  • @CharlesDuffy: Thanks Charles, I have edited the tags and the question subject. I am also open to any alternative way of summing up the values. Unfortunately, I could not copy and paste the lsof output as text because of a restriction in my organization, so I took a screenshot – Learner Mar 12 '20 at 17:36
  • *arg* ... forgot to scroll my window to the right ... so missed the other columns; for 'next to last' column in `awk` you can use `$(NF-1)`; if the 2x columns you want are always the same offset from the end, you could use this same concept, eg, for the 4th from the end use `$(NF-3)` – markp-fuso Mar 12 '20 at 17:41
  • @markp: Yeah, I have used $(NF-1) for the next-to-last column, but I could not get the 5th column by counting from the last column either, because a few intermediate cells are blank on that side too – Learner Mar 12 '20 at 17:43
  • so your 'user (type)' is 'REG' ?? (5th column) – markp-fuso Mar 12 '20 at 17:46
  • @markp: There are many other user types too; that is just a sample from 60k such lines – Learner Mar 12 '20 at 17:49
  • The better option here (rather than monkeying around in `awk`) is to check out `lsof` manpage. Specifically concentrating on the `-F` flag and reading the `OUTPUT FOR OTHER PROGRAMS` section of the manpage. The developers of `lsof` built the application to make its output easily consumable downstream. – JNevill Mar 12 '20 at 17:58
  • the image shows 5 lines .. 4 are 'deleted' which, according to your sample code, you're going to ignore; so that leaves us with just the 1 line of interest ... but is this a line with/without a 3rd column (that you've mentioned)? we could eliminate a lot of this back-n-forth in the comments if you'd follow that link Charles provided ... provide data that you want to ignore, provide data that you want to process, provide data with and without the 3rd (and other?) columns, provide data with a couple different user types, and last but not least, provide the desired output for said input data set – markp-fuso Mar 12 '20 at 17:59
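
The `-F` suggestion in the comments can be sketched roughly as follows. This is only an illustration, not something from the thread: it assumes the input comes from something like `lsof -F Lsn`, where each line starts with a one-character field identifier (`p` opens a process set, `f` opens a file set, `L` is the login name, `s` the size, `n` the name), and that deleted files carry a trailing ` (deleted)` in the name field, as they do in the default output on Linux. The function name `sum_deleted_from_field_output` is hypothetical.

```python
from collections import defaultdict


def sum_deleted_from_field_output(lines):
    """Sum sizes of deleted files per login name from `lsof -F Lsn`-style lines.

    Assumption: one field per line, identified by its first character.
    """
    totals = defaultdict(int)
    login = None   # login name of the process set being read
    current = {}   # fields collected for the file set being read

    def flush():
        # Count the finished file set if it is a deleted file with a known size.
        name = current.get("n", "")
        size = current.get("s", "")
        if login is not None and size.isdigit() and name.endswith(" (deleted)"):
            totals[login] += int(size)
        current.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        field, value = line[0], line[1:]
        if field == "p":      # new process: close the previous file set first
            flush()
            login = None
        elif field == "L":
            login = value
        elif field == "f":    # new file set within the same process
            flush()
        else:
            current[field] = value
    flush()                   # close the last file set
    return dict(totals)
```

Because every value arrives on its own labelled line, a missing field simply never shows up and cannot shift the others. A file held open by several processes is counted once per process in this sketch, so the totals may need deduplicating.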

1 Answer


Assuming you are interested in the file owner, size, and name, here is a Python script (test.py) that extracts them:

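import re
import sys

# Column boundaries are read from the header row, so a blank cell
# (for example an empty third column) cannot shift the fields.
user_last_column = 0
for line in sys.stdin:
    if user_last_column:
        # Data row: keep only entries whose NAME ends with "(deleted)".
        if re.search(r'\(deleted\)$', line):
            # Assumes values are right-aligned under USER and SIZE/OFF, so the
            # last space-separated token before each boundary is the cell value.
            print("%s %s %s" % (re.sub(r'.* ', '', line[:user_last_column]),
                                re.sub(r'.* ', '', line[:size_last_column]),
                                line[name_first_column:][:-11]))  # drop " (deleted)\n"
    else:
        # Header row: record where USER and SIZE/OFF end and where NAME starts.
        user_last_column = line.find('USER') + 4
        size_last_column = line.find('SIZE/OFF') + 8
        name_first_column = line.find('NAME')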

Call it with:

lsof | python test.py | sort -u # [sort -u] to remove duplicates 
or 
python test.py < /tmp/sample

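Explanation: the main thing is to find the character positions of the three pieces of information (user, size, name) from the header row, then slice every data row at those positions instead of splitting on whitespace.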
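
To get from per-file lines to the per-user totals in human-readable form that the question asks for, the same column-position idea can be extended along these lines. This is a sketch under assumptions, not part of the original script: the names `deleted_bytes_per_user` and `human` are made up for illustration, duplicates are dropped on (user, size, name) to mirror the `sort -u` step, rows without a purely numeric size are skipped, and binary units (KiB, MiB, ...) are an arbitrary choice.

```python
import re
from collections import defaultdict


def human(num_bytes):
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def deleted_bytes_per_user(lines):
    """Sum SIZE/OFF of deleted files per USER from default lsof output."""
    totals = defaultdict(int)
    seen = set()        # (user, size, name) already counted, like `sort -u`
    user_end = None     # column boundaries, taken from the header row
    for line in lines:
        line = line.rstrip("\n")
        if user_end is None:
            user_end = line.find("USER") + 4
            size_end = line.find("SIZE/OFF") + 8
            name_start = line.find("NAME")
            continue
        if not line.endswith("(deleted)"):
            continue
        # Last space-separated token before each boundary is the cell value.
        user = re.sub(r".* ", "", line[:user_end])
        size = re.sub(r".* ", "", line[:size_end])
        name = line[name_start:][: -len(" (deleted)")]
        if not size.isdigit() or (user, size, name) in seen:
            continue    # blank or non-numeric size, or a duplicate entry
        seen.add((user, size, name))
        totals[user] += int(size)
    return dict(totals)
```

One way to use it is to pass `sys.stdin` to `deleted_bytes_per_user` while piping in `lsof`, then print `human(total)` for each user in the returned dictionary.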

Philippe