
Could you recommend a good utility or Bash function that scans a directory and outputs its size?

I need to find the size of the executables in the binary directories: /usr/bin and /bin in order to find their average and median.

I am not sure whether it's best to use the du command or ls.

What's the easiest and most efficient way to find the median and average file size in a directory on Linux?

PS: It should be recursive as there are several directories inside.

Micha Wiedenmann
hps13
  • Have a look at `du`. The command `man du` will give you more information. You should also look at `awk` to parse the output of `du` and compute the median and average. There are a plethora of examples here that do this computation. – kvantour May 03 '19 at 06:24
  • Well using Windows CMD.exe is unlikely to help. – Noodles May 03 '19 at 06:28

1 Answer


This is a two-step process: first find the disk usage of every file, then calculate the values.

For the first step, du is clearly my favorite.

find /usr/bin -type f -exec du '{}' '+'

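This finds every regular file (-type f) and appends ('+') the filenames (in place of '{}') to an invocation (-exec) of du, so du runs on many files at once.

The result is a tab-separated list of disk usage and filename, one file per line. With GNU du the usage is reported in 1024-byte blocks by default, and it measures allocated disk space rather than the byte length of the file.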

Now comes the second part (here for the average). We feed this list into awk and let it sum up the values and divide by the number of rows:

{ sum += $1 } END { print "avg: " sum/NR }

The first block is executed for every line and adds the value of the first column (the size) to the variable sum. The other block is prefixed with END, meaning it is executed once stdin reaches EOF. NR is a built-in variable holding the number of records (lines) read so far, which in the END block is the total number of rows.
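To see the awk program in isolation, here is a small sketch that feeds it three made-up du-style lines instead of real du output. The sizes and the /tmp paths are invented purely for illustration.

```shell
# Three made-up du-style lines: a size, a tab, then a path (illustrative values only).
sample=$(printf '4\t/tmp/a\n8\t/tmp/b\n12\t/tmp/c\n')

# sum accumulates column 1 on every line; END divides by the row count NR.
printf '%s\n' "$sample" | awk '{ sum += $1 } END { print "avg: " sum/NR }'
```

The three sizes add up to 24 over 3 rows, so this prints avg: 8.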

So the finished command looks like:

find /usr/bin -type f -exec du '{}' '+' | awk '{ sum += $1 } END { print "Avg: " sum/NR }'
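If you want the average in bytes of file content rather than in allocated blocks, one possible variation is sketched below. It assumes GNU du, whose -b option is shorthand for --apparent-size --block-size=1. The function name avg_size is just a hypothetical helper name, and the NR check avoids a division by zero when the directory contains no regular files.

```shell
# avg_size DIR: print the average apparent size, in bytes, of regular files under DIR.
# Assumes GNU du (for -b); avg_size is a hypothetical helper name.
avg_size() {
    find "$1" -type f -exec du -b '{}' '+' |
        awk '{ sum += $1 } END { if (NR > 0) print sum / NR; else print 0 }'
}
```

Usage would be along the lines of avg_size /usr/bin.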

Now go read about find, awk and shell pipelines. They will make your life considerably easier when you have to deal with the Linux shell. Basic knowledge of line buffering and standard I/O streams is also helpful.

Martin B.
  • Thank you for your detailed answer. How do I find the median? The average is easier, as it's the total size divided by the number of files, but how is it possible to find the median? It seems inefficient to search for the file in the middle and check its length (if the count is odd), or to find the "middle" elements and calculate the average of [middle-1] and [middle+1]. – hps13 May 03 '19 at 08:26
  • Pipe the output of find into a file, count the rows, and then extract only the lines that matter with head and tail... It's a little more complicated than a simple pipe, but it is possible in a couple of lines. – Martin B. May 03 '19 at 11:51
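The head and tail approach from the last comment could look roughly like the sketch below. It is one possible reading of that suggestion, not the commenter's own code. It assumes GNU du (for -b, sizes in bytes) and sorts the sizes numerically first, since a median needs ordered values. The names median_size and sizes_file are hypothetical.

```shell
# median_size DIR: print the median apparent size, in bytes, of regular files under DIR.
# Assumes GNU du (for -b); median_size and sizes_file are hypothetical names.
median_size() {
    sizes_file=$(mktemp) || return 1
    # Keep only the size column and sort it numerically.
    find "$1" -type f -exec du -b '{}' '+' | cut -f1 | sort -n > "$sizes_file"
    n=$(wc -l < "$sizes_file")
    if [ "$n" -eq 0 ]; then
        rm -f "$sizes_file"
        return 1
    fi
    # For odd n both indices point at the single middle line;
    # for even n they point at the two middle lines.
    lo=$(( (n + 1) / 2 ))
    hi=$(( n / 2 + 1 ))
    head -n "$hi" "$sizes_file" | tail -n "$(( hi - lo + 1 ))" |
        awk '{ sum += $1 } END { print sum / NR }'
    rm -f "$sizes_file"
}
```

Sorting dominates the cost, which is fine for directories the size of /usr/bin. Calling median_size /usr/bin prints a single number, and the function returns a non-zero status if no regular files are found.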