7

I am using git-annex, an extension to the DVCS git, which is designed for handling large files. It makes heavy use of symlinks. The actual large files are moved to the .git/annex directory and the original files are symlinked to there.

I am running out of disk space and need to see what is using it so I can clean up. Usually I'd use a disk usage tool like ncdu, Baobab or Filelight. However, they treat a symlink as essentially empty and only count the file it points to as using any space. This means that with git-annex they show no space used in the main directories and lots of space used in the .git/annex directory, which is not helpful.

Is there any graphical or ncurses-based disk usage programme for Linux (apt-get installable would be easiest) that is capable (through options or not) of counting a symlink as using up the space that the original file uses up? Many have options for different behaviour for hard links, so it makes sense that some should h

(I know counting symlinks as using space has flaws, like counting the same space twice, broken symlinks, etc. But that's OK for my purposes.)

Amandasaurus
  • Maybe it's doable with fsview http://manpages.ubuntu.com/manpages/lucid/man1/fsview.1.html Also, take a look here: http://superuser.com/questions/9847/linux-utility-for-finding-the-largest-files-directories – Mihai Todor Aug 01 '12 at 18:40

4 Answers

6

GNU du has the --dereference option, which dereferences symbolic links when computing disk usage. However, du refuses to count the same space twice, which may be a deal-breaker in your situation:

% mkdir foo bar baz
% dd if=/dev/zero of=foo/test bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 0.0176239 s, 581 MB/s
% (cd bar; ln -s ../foo/test)
% (cd baz; ln -s ../foo/test)
% du -hc bar baz
4.0K    bar
4.0K    baz
8.0K    total
% du -hc --dereference bar baz
9.8M    bar
4.0K    baz
9.8M    total

If you don't have multiple symlinks to the same target, though, I think --dereference does what you want.
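If the same target is linked from several directories, one workaround is to run du once per directory, so that each directory is charged for the files its links point to. This is only a minimal sketch: the function name `du_each` is made up for illustration, and it assumes a du that supports -s, -k and -L (GNU du does).

```shell
# Print the dereferenced size, in KiB, of each directory given as an argument.
# du is started separately for every directory, so a target shared by links
# in two directories is counted once for each of them.
du_each() {
    for dir in "$@"; do
        # -s: one summary line, -k: KiB units, -L: follow symlinks.
        # stderr is discarded so that unreadable entries (for example
        # dangling links) do not clutter the output.
        du -skL -- "$dir" 2>/dev/null
    done
}
```

With the directories from the example above, `du_each bar baz` should report the size of foo/test for both bar and baz. Within a single directory, two links to the same target are still counted only once.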

2

nowadays, git-annex has its own solutions for this problem. you can use:

git annex info --fast *

...to get actual disk usage (and more) from the files directly from git-annex. it can also operate on remote repositories, which is very useful:

git annex info --fast --not --in here .

... would give you the amount of data that is not in the current repository for example.

i have also used ncdu with this small patch with good results.

the upstream forum discussing this is "du" equivalent on an annex? and has more suggestions, like du -L, gadu and sizes that were mentioned in other answers here.

anarcat
1

git-annex has a list of related software including some git-annex aware disk usage tools - gadu and sizes.

Andrew
1

Is there any graphical or ncurses-based disk usage programme for Linux (apt-get installable would be easiest) that is capable (through options or not) of counting a symlink as using up the space that the original file uses up?

TL;DR: du -akL mydirectory | xdiskusage -aq

Long answer: combine two powerful combinable programs

I also use git-annex and have the same need.

Reference tool to get disk usage: GNU du

GNU du, like most GNU tools, has a lot of options, including:

‘-L’ ‘--dereference’

Dereference symbolic links (show the disk space used by the file or directory that the link points to instead of the space used by the link).

Reference tool to interactively explore and zoom into a disk usage tree: xdiskusage

Besides, there's an excellent, lightweight disk usage representation tool named xdiskusage.

You can use it fully graphically: choose folder, or choose full filesystem to include free space representation. You can click, use arrows and Enter key to zoom into the tree display, hide some subtrees. It's very practical, simple, quick, even on remote display.

Combine them and profit!

xdiskusage has the nice property that you can also feed it the output of a du invocation such as du -ak.

So, you can do:

du -akL mydirectory | xdiskusage

I always use this variant, where -a means show all files (not only directories) and -q makes everything much faster by removing the progress slider:

du -akL mydirectory | xdiskusage -aq
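When no X display is available, the same du output can be ranked in the terminal instead. The following is a rough sketch rather than a replacement for xdiskusage: the function name `largest_entries` and its default of 20 lines are my own choices, and only du, sort and head are used.

```shell
# List the largest entries under a directory, sizes in KiB, symlinks followed.
# Usage: largest_entries DIRECTORY [COUNT]
largest_entries() {
    dir=$1
    count=${2:-20}   # number of lines to show; defaults to 20
    # -a: list files as well as directories, -k: KiB units, -L: follow symlinks.
    # sort -rn puts the biggest entries first; head keeps the top COUNT lines.
    du -akL -- "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_entries mydirectory 30` prints the 30 biggest files and directories, with annexed files counted at the size of the content they point to.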

Image from http://xdiskusage.sourceforge.net/ by Bill Spitzak.

xdiskusage display sample

apt-get ?

apt-get installable

On Debian and derivatives including Ubuntu:

sudo apt-get install coreutils xdiskusage

(You most certainly already have coreutils installed.)