du -hc --max-depth=1 /home/back/tvppclientarea/ | sort -k2

This shows the size of backup directories made with rsync and hard links. The command lists each directory and shows the amount added relative to the previous directory, i.e. how big each incremental backup was.

For reference, the rsync command was

rsync --archive --itemize-changes --human-readable --stats --delete --link-dest=/home/back/tvppclientarea/1586563849_Sat_11Apr2020_0110 root@domain.com:/home/tvppclientarea/. /home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133/.

Thing is, the directory is 1.5TB and the command takes a long time to run, several minutes. I was wondering if there was a way of speeding this up. I came across a command, ncdu, which I think may work (it does caching, so the second time you run it is quicker), but I can't find how to replace my command with it.

Ben Edwards

1 Answer


Later versions of ncdu attempt to count the space used by hard links only once; see the man page for details. This differs from du's behavior.

If you want to count hard links only once like that, there is no avoiding scanning the entire backup tree, or at least the destination and the --link-dest of one backup. That scan is necessary to find references to the same inode.
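
To make the inode point concrete, here is a rough sketch using GNU find and the paths from the question. The link-count test is only a heuristic: a count above 1 shows a file is hard-linked somewhere, but not which other backup holds the reference.

# Files in the newest backup that are hard-linked elsewhere, i.e. data rsync reused via --link-dest:
find /home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133 -type f -links +1 | wc -l

# Files with a single link are the ones rsync actually had to copy; sum their sizes:
find /home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133 -type f -links 1 -printf '%s\n' | awk '{ s += $1 } END { printf "%.1f MiB\n", s / 1048576 }'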

ncdu -o output files record every file of one scan. They are not an incremental cache; all you can do is load the entire thing back into the ncdu UI. So the initial scan will still take minutes to run

ncdu -1xo- /home/back/tvppclientarea/ | gzip > ncdu_tvppclientarea_$(date "+%Y-%m-%d").gz

but loading that report again later is much faster.

zcat ncdu_tvppclientarea_$(date "+%Y-%m-%d").gz | ncdu -f-

You could save an ncdu report for each individual backup rather than for the entire tree: that is, the target directory plus the --link-dest where the hard links point. Then comparing the sizes of the last few backups is a matter of finding the matching report files and running ncdu -f on each.
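
A rough sketch of that per-backup approach, assuming one report file per backup directory (the report location and file names are just an example; the directory layout is taken from the question):

# Export one compressed ncdu report per backup directory instead of scanning the whole 1.5TB tree at once:
for dir in /home/back/tvppclientarea/*/; do
    name=$(basename "$dir")
    ncdu -1xo- "$dir" | gzip > "/home/back/ncdu_reports/${name}.gz"
done

# Browse any single backup later without rescanning it:
zcat /home/back/ncdu_reports/1586565194_Sat_11Apr2020_0133.gz | ncdu -f-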


Reading file system metadata means many small IOs, on the order of one per file when iterating over files like this. Improving the IOPS of the storage system may make this faster, perhaps by adding a caching tier of fast storage.

John Mahowald
  • Thanks, saving the last run got me thinking. What I was ultimately trying to do is track how each backup grows, so if I just do a du at the end of each backup and save it away I can easily work this out, and it will be a lot quicker. – Ben Edwards May 03 '20 at 12:42
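
A minimal sketch of the idea in that comment, assuming the backup script knows both the new destination and the previous backup it used as --link-dest (the paths are the ones from the question; the log file name is made up):

prev=/home/back/tvppclientarea/1586563849_Sat_11Apr2020_0110
dest=/home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133

# Listing the previous backup first charges the shared (hard-linked) inodes to it,
# so the line for the new backup shows only the space it added:
du -shc "$prev" "$dest" >> /home/back/tvppclientarea/backup_sizes.log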