
I have a server that is constantly losing disk space so I reckon there must be some logs that I'm not aware of.

What is a good way to locate files that are constantly increasing in size?

Tike
  • You can write a script or use the "watch" command to study the sizes of files –  Nov 03 '11 at 07:49
  • Have you checked logs (/var/log) and /tmp? For logs you should use logrotate to control their age and size. What's your partition layout? A good layout helps narrow down the possible places where such files are located. –  Nov 03 '11 at 08:22
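
The watch approach mentioned in the comments could be as simple as this (the directories are just examples of likely suspects); watch re-runs the command every 60 seconds and highlights what changed:

watch -d -n 60 du -sm /var/log /tmp /home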

7 Answers


There is a utility called gt5 that displays current directory sizes as well as the difference from the last time you checked.
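
For example (assuming gt5 is installed from your distribution's packages), point it at the suspect tree and re-run it later to see which directories grew:

gt5 /var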

Jure1873

You can use this command:

find / -size +100000k

which will return all files larger than about 100 megabytes. You can decrease or increase the size value depending on your need.
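
For example, to list the 20 largest such files with human-readable sizes (this assumes GNU find and coreutils), something like the following might be used:

find / -xdev -type f -size +100M -exec du -h {} + 2>/dev/null | sort -rh | head -n 20

The -xdev flag keeps find on the filesystem that is filling up.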

Or

You can use a utility called "ncdu", which automatically creates a map of file/folder sizes.

Farhan
  • This will only find the file if it's still associated with a directory entry - it won't find it if the file is deleted, but still open. – Alnitak Nov 20 '11 at 21:53
  • @Alnitak: I don't understand what you mean; please clarify. – Farhan Nov 21 '11 at 06:21
  • I mean that if a file is opened, but then deleted whilst still open, it'll continue to consume space on disk, but will be invisible to `find`. The space will only be released when the file is closed. – Alnitak Nov 21 '11 at 07:29
  • Tracking such files is something very different from the question asked. It is possible to track those files as well, but with the AuditD daemon. – Farhan Nov 21 '11 at 09:31
  • This will also only display *large files*. It does not help in finding directories which keep growing by accumulating lots of small files over time for whatever reason. – deceze Nov 23 '12 at 11:18
  • @deceze: updated the answer with some additions – Farhan Nov 23 '12 at 13:03
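
As Alnitak points out in the comments above, space can also be held by files that were deleted while still open. A quick way to spot those (assuming lsof is available) is to list open files whose link count is below one:

lsof -nP +L1

The SIZE/OFF column shows how much space each of these is still holding.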

Look at using the ncdu command to get a nice summary view of directory sizes throughout the system. There are only a few common locations to check for log files on a standard system, so this should be easy to monitor. This is a good first step for discovery.
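
A typical invocation might be the following; the -x flag keeps ncdu from crossing filesystem boundaries:

ncdu -x /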

Long term, you should do one of the following...

Write a script to search for files larger than a specific size.

The best approach, however, is probably log maintenance and rotation.
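
For instance, a minimal logrotate stanza (the application name and log path here are only placeholders) could look like:

cat > /etc/logrotate.d/myapp <<'EOF'
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF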

ewwhite

There's a simple shell script that uses SQLite to store the data, so you can generate various reports with it. Just add it to your crontab: /root/bin/diskhogs.sh /directory/to/monitor
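
A crontab entry for it might look like this (the ten-minute interval and the monitored directory are only examples):

*/10 * * * * /root/bin/diskhogs.sh /var/log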

Here is the script itself:

#!/bin/sh

# Checking the spool directory
SPOOL="/var/spool/diskhogs"
if [ ! -e "${SPOOL}" ]; then
        mkdir -p "${SPOOL}"
fi
if [ ! -d "${SPOOL}" ]; then
        echo "There are no ${SPOOL} directory" >&2
        exit 1
fi

if [ -z "${1}" ]; then
        DIR=.
else
        DIR="${1}"
fi

# Collect the list of files to monitor (note: the unquoted expansion
# below will not cope with paths that contain whitespace)
FILES=$(find "${DIR}" -type f)

TIME=$(date +%s)
if [ -z "${TIME}" ]; then
        echo "Can't determine current time" >&2
        exit 1
fi

# Create the table on the first run so the INSERTs below don't fail
sqlite3 "${SPOOL}/db" "CREATE TABLE IF NOT EXISTS sizes (name TEXT, time INTEGER, size INTEGER);"

for FILE in ${FILES}; do

        SIZE=$(ls -nl "${FILE}" | awk '{ print $5 }')
        if [ -z "${SIZE}" ]; then
                echo "Can't determine size of the ${FILE} file" >&2
                continue
        fi

        sqlite3 "${SPOOL}/db" "INSERT INTO sizes VALUES ('${FILE}', '${TIME}', '${SIZE}');"
        if [ ${?} -ne 0 ]; then
                continue
        fi

done

for PERIOD in 60 300 600 1800 3600 86400; do

        TIME_WAS=$((${TIME} - ${PERIOD}))

        (
                echo "*** Since $(date --date="@${TIME_WAS}") (${PERIOD} seconds ago) ***"
                sqlite3 \
                        "${SPOOL}/db" \
                        "SELECT MAX(size) - MIN(size) AS mm, name
                                FROM sizes
                                WHERE time >= '${TIME_WAS}'
                                GROUP BY name
                                ORDER BY mm
                        ;"
        ) > "${SPOOL}/report_${PERIOD}"

done
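
After a few runs the reports can simply be read from the spool directory, for example the one covering the last hour:

cat /var/spool/diskhogs/report_3600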

If you need to generate a more customized report, you can query the database with sqlite3 directly:

sqlite3 /var/spool/diskhogs/db "
    SELECT MAX(size) - MIN(size) as mm, name
        FROM sizes
        WHERE
            time >= '$(date --date='10 days ago' +%s)' AND
            name like '/var/lib/libvirt/images/%'
        GROUP BY name
        ORDER BY mm DESC
    ;"

If I have some ideas on how to improve it, I'll update it on GitHub: https://gist.github.com/melnik13/7ad33c57aa33742b9854

Volodymyr Melnyk

I wrote a little script watch-open-files, to watch open files that are changing and show their sizes growing. It could use some work, but it served its purpose of finding the growing files for me.

#!/bin/bash -eu
# watch-open-files: watch open files and show which ones are growing

# Snapshot the currently open files
lsof > open-files.txt
< open-files.txt txt2tsv > open-files.tsv

# Keep regular files, drop uninteresting paths (libraries, /proc, /dev, deleted files)
< open-files.tsv awk '$7 == "REG"' | kut 11 | uniqo |
grep -v -e '/proc' -e '/dev' -e '/usr/lib' -e '\.so\.' -e '\.so$' -e ' (deleted)' |
grep '^/' > interesting.txt

# Repeatedly list the most recently modified of those files with their sizes
watch sh -c '< interesting.txt sortmtime 2>/dev/null | head -n 50 | kut 2 | xa ls -U -l'

At the moment, this script uses a bunch of other little tools:

  • watch-open-files: watch open files
  • txt2tsv: convert text to tab-separated values
  • kut: keep only the specified columns
  • uniqo: like uniq, but it works on unsorted files and preserves the order of lines
  • xa: xargs with a newline delimiter
  • kutc: cut columns from a file
  • guess_columns: guess the column positions of a file

I can possibly clean all this up into a single script if that would be better, and if someone else would like to use it.
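
If you don't have these helper tools, a rough approximation using only standard utilities (lsof, awk and sort; the column positions assume default lsof output) is to list the largest open regular files and re-run it to see which sizes keep climbing:

lsof -nP 2>/dev/null | awk '$5 == "REG" && $7 ~ /^[0-9]+$/ {print $7, $NF}' | sort -nr | head -n 20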

Sam Watkins

I found this handy Perl script somewhere years ago and have used it ever since. It works great every time :-) The author(s) are listed at the top; I take no credit for this.

#!/usr/bin/perl
#
# dur - Disk|Directory Usage Reporter
#       Perl utility to check disk space utilisation
#
# The utility displays the disk usage:
#    - total number of files
#    - top big files
#    - extra info: aging files, directories
#
# USAGE: dur [-d] [-Tn] directory
#   eg, dur /usr           # top 5 big files for /usr
#       dur -T5 /opt       # top 5 big files for /opt
#       dur -T10 /         # top 10 big files for /
#       dur -d /opt        # directory usage for /opt
#
#
# NOTES:
# It is highly recommended to use standard File::Find Perl module
# when trying to process each file from a deep directory structure. 
# Some folks are writing their own routine based on find(1) in Perl.
# This will sometimes be slower than File::Find, so make sure you
# test this before you run it on live production systems.
#
# There are a lot of talks over File::Find and its memory consumption and
# how can you minimize that. Basically it very much depends. I found that
# File::Find is much faster in Solaris 10 with a target directory of +1mil
# files than any custom perl script calling find(1M).
#
# You will see a memory usage increase but the script will be faster. The
# deeper the directory is, the more memory it will use.
#
#  Example:
#   You can easily check how dur works against a big deep directory,
#   over +1mil files:
#
#   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP      
# 19667 sparvu    228M  219M sleep   20    0   0:01:36 8.6% dur/1
#
#
# SEE ALSO:
#  http://www.perlmonks.org/?node_id=325146
#  
#
# COPYRIGHT: Copyright (c) 2007 Stefan Parvu
#
# 10-Dec-2006    Stefan Parvu    First Version, nawk to perl
# 02-May-2007       "      "     Added top variable for big files
# 13-May-2007       "      "     Added dir_usage subroutine
# 19-May-2007       "      "     Added comments, Perl Best Practices

use warnings;
use strict;
use File::Find;
use Getopt::Std;
use Time::HiRes qw(gettimeofday);


###########
# Variables
###########
my %files = ();
my %dirs = ();
my @sorted;
$|=1;
my $size = 0;
my $mtime = 0;
my $current_time = 0;

############################
#  Process command line args
############################
usage() if (($#ARGV+1)==0);
usage() if defined $ARGV[0] and $ARGV[0] eq "-h";
getopts('dT:s:') or usage();
my $topN  = defined $main::opt_T ? $main::opt_T : 5;
my $dirFlag = defined $main::opt_d ? $main::opt_d : 0;
my $secs = defined $main::opt_s ? $main::opt_s : 0;


#########################################
# Usage        : find(\&fileCount, @ARGV)
# Purpose      : counts the number, 
#              : of bytes of each file
# Returns      : A hash with all files
# Parameters   : 
# Comments     : Used from File::Find
# See Also     : n/a
#########################################
sub fileCount {
    if (-f $_) {
        if ($secs != 0) {
            $mtime = (stat($_))[9];
            # keep only files modified more than $secs seconds ago
            if ($mtime < $current_time - $secs) {
                $files{$File::Find::name} = -s;
            }
        }
        else {
            $files{$File::Find::name} = -s;
        }
    }
    $mtime = 0;
}




#########################################
# Usage        : find(\&fileCount, @ARGV)
# Purpose      : counts the number,
#              : of bytes
# Returns      : scalar variable, with
#              : total number of bytes
# Parameters   :
# Comments     : Used from File::Find 
# See Also     : n/a
#########################################
sub dirCount {
    if (-f) {
        $size += -s;
    }
}

#########################################
# Usage        : dir_usage()
# Purpose      : reports the directory
#              : usage
# Returns      : n/a
# Parameters   : @ARGV
# Comments     : Calls File::Find
# See Also     : dirCount()
#########################################
sub dir_usage() {
    my $target = $ARGV[0];

    print "Processing directories...\n";

    opendir(D, $target) or 
    die("Couldn't open $target for reading: $!\n");

    chdir "$target";
    foreach (readdir D) {
        next if $_ =~ /^\.\.?$/;
        next if (! -d $_);
        find (\&dirCount, "$_");
        $dirs{$_} = $size;
        $size = 0;
    }

    closedir(D);

    @sorted = sort {$dirs{$b} <=> $dirs{$a}} keys %dirs;
    foreach (@sorted) {
        printf "%6d MB => %s\n",$dirs{$_}/1048576,$_;
    }
    print "Total directories processed: " . keys(%dirs) . "\n";
}

#########################################
# Usage        : top_files()
# Purpose      : print top N big files
# Returns      : n/a
# Parameters   : @ARGV
# Comments     : Calls File::Find,
#              : default N=5
# See Also     : fileCount()
#########################################
sub top_files {

    print "Processing top $topN big files...\n";

#start counting here
    my $tstart = gettimeofday();

    find(\&fileCount, @ARGV);

    @sorted = sort {$files{$b} <=> $files{$a}} keys %files;
    splice @sorted, $topN if @sorted > $topN;

#print scalar %files;

    foreach (@sorted) {
        printf "%6d MB => %s\n", $files{$_}/1048576, $_;
    }

    my $tend = gettimeofday();
    my $elapsed = $tend - $tstart;

#end timing
    printf "%s %4.2f %s", "Elapsed:", $elapsed, "seconds\n";
    print "Total files processed: " . keys(%files) . "\n";
}


#########################################
# Usage        : usage()
# Purpose      : print usage and exit
# Returns      : n/a
# Parameters   : n/a
# Comments     : n/a
# See Also     : n/a
#########################################
sub usage {
    print STDERR <<END;
USAGE: dur [-d] [-Tn] [-s sec] directory
  eg, dur /usr                     # top 5 big files for /usr
      dur -T10 /                   # top 10 big files for /
      dur -d /opt 2>/dev/null      # directory usage for /opt
      dur -s1200  /                # top 5 big files older than
                                   #  20 minutes for /
      dur -s86400 /                # top 5 big files older than
                                   #  1 day for /
END
    exit 1;
}


######
# Main
######
$current_time = time();

if ($#ARGV > 0) {
    usage();
} elsif ($dirFlag) {
    dir_usage();
} else { 
    top_files();
}
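
Typical invocations, following the usage notes at the top of the script (assuming it is saved as dur and made executable):

chmod +x dur
./dur -T10 /var      # ten biggest files under /var
./dur -d /var        # per-directory usage for /var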

As mentioned above, ncdu is a very good tool and probably the best way to go.

But if you are under pressure and just want a quick-and-dirty way to find out what's going on, just run the following from / (root):

    [root /]# cd /
    [root /]# du -sm * | sort -nr | head 
    3755    usr    
    151     var   
    109     boot  
    29      etc

    [root /]# cd usr  
    [root usr]# du -sm * | sort -nr | head  
    1618    share  
    1026    lib64  
    572     lib  
    237     bin

    [root usr]# cd share  
    [root share]# du -sm * | sort -nr | head  
    415     locale  
    255     icons  
    185     help  
    143     doc  
    [root share]# cd locale  
    [root locale]# du -sm * | sort -nr | head   
    12      uk  
    12      de  
    [root locale]#

And so on and so forth, to find and track down which directories and files are taking up so much space.
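
To skip the manual descent, a single pass that stays on one filesystem and lists the biggest directories at any depth is roughly:

du -xm / 2>/dev/null | sort -nr | head -n 25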

tdr