-1

If there isn't, how feasible would it be to write one? A filesystem which, for each directory, keeps the size of its contents recursively, and which is kept up to date not by re-calculating sizes on every change to the filesystem, but incrementally: for example, updating a directory's size when a file in it is removed or grows.
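The bookkeeping the question describes can be sketched in a few lines. This is a hypothetical in-memory model (the `Dir` class and its fields are made up for illustration, not part of any real filesystem): each directory caches the total size of its subtree, and a change to a file propagates only up the chain of ancestor directories, with no recursive rescan.

```python
# Hypothetical sketch of incremental directory-size accounting.
# Each directory caches the recursive size of its subtree; a file
# change updates only the ancestor chain: O(depth), not O(files).

class Dir:
    def __init__(self, parent=None):
        self.parent = parent
        self.total = 0  # cached recursive size of this subtree, in bytes

    def apply_delta(self, delta):
        # Propagate the size change from this directory up to the root.
        node = self
        while node is not None:
            node.total += delta
            node = node.parent

root = Dir()
sub = Dir(parent=root)

sub.apply_delta(+4096)  # a 4 KiB file is created in sub/
sub.apply_delta(-1024)  # that file shrinks by 1 KiB
print(root.total, sub.total)  # → 3072 3072
```

A real implementation would have to deal with hard links (as the comment below notes, a file can belong to several directories) and with making these updates crash-safe, which is where most of the complexity would be.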

ЯegDwight
rapadura
  • Most Linux filesystems permit hard links, so a given file (actually an inode) may belong to several directories. Hence asking which directory a file belongs to makes no sense, and likewise asking which files are owned by a directory (that doesn't mean much if a file is hard-linked from an outside directory). – Basile Starynkevitch Sep 18 '12 at 10:42
  • Sometimes it feels like it would be useful to store (if not display) this information so that `du` doesn't take so freaking long to work on large directory hierarchies. I can't see it being a big overhead on the filesystem to update all parent directory stats (asynchronously, of course). – Sridhar Sarnobat Dec 31 '17 at 04:50

2 Answers

2

I am not aware of such a filesystem. From the filesystem's point of view, a directory is a file.

You can use:

du -s -h <dir>

to display the total size of all the files in the directory.
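What `du -s` has to do under the hood can be approximated with the standard library. A rough sketch (the function name is mine); note it sums apparent file sizes (`st_size`) rather than allocated disk blocks, so its numbers can differ from `du`'s:

```python
# Roughly what `du -s` must do: walk the whole tree and sum file sizes.
import os

def tree_size(path):
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):  # don't follow symlinks
                total += os.path.getsize(fp)
    return total
```

The full walk is the point: without cached per-directory totals, every query has to visit every entry in the subtree.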

Maxim Egorushkin
  • "From the filesystem's point of view a directory is a file." Yeah, and a file has a size, so why doesn't a directory have the size of all its contents, at least its immediate ones? It wouldn't even have to recurse into each of its subdirs... – rapadura Sep 18 '12 at 10:48
  • The directory does have the size of its contents. Its contents are filenames and metadata, though, not the contents of files. – Maxim Egorushkin Sep 18 '12 at 11:55
1

From the filesystem's point of view, the size of a directory is the size of the information recording its existence, which must be physically stored on the medium. Note that the "size" of a directory containing files totalling 10 GB is essentially the same as the "size" of an empty directory, because the information needed to record its existence takes the same storage space. That is why the total size of the files (and sockets, links and other things inside) is not the same thing as the "directory size". Subdirectories can also be mounted from various locations, including remote ones, and even recursively.

In a sense, directory size is just a human abstraction, because files are not physically "inside" directories; a directory is merely marked as a container, in exactly the same way a special file (e.g. a device file) is marked as special. Recounting and updating a total directory size depends more on the NUMBER of items in it than on the sum of their sizes, and a modern filesystem can keep hundreds of thousands of files (if not more) "in" one directory, even without subdirs, so counting their sizes can be quite a heavy task compared with the possible benefit of having this information. In short, when you run e.g. the `du` (disk usage) command, or when Windows counts a directory's size, having the kernel and the filesystem driver do the same work would not be faster: counting is counting.

There are quota systems, which keep and update information about the total size of files owned by a particular user or group. They are, however, limited to monitoring partitions separately, since quota may be enabled or disabled per partition. Moreover, quota usage is updated, as you said, when a file grows or is removed, and that is exactly why the information can become inaccurate; for this reason quotas are rebuilt from time to time, e.g. with a cron job, by scanning all files in all directories "from scratch" on the partition where quota is enabled.
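That two-track design can be sketched abstractly (a hypothetical model, not the kernel's quota code): cheap incremental updates on each write that may drift over time, plus an authoritative full rescan that rebuilds the totals from scratch, as `quotacheck` does when run periodically.

```python
# Hypothetical sketch of quota-style accounting: incremental updates
# that can drift, plus a periodic authoritative rebuild from a full scan.
from collections import defaultdict

class QuotaAccounting:
    def __init__(self):
        self.usage = defaultdict(int)  # user -> total bytes

    def on_write(self, user, delta):
        # Cheap per-operation update; may drift after crashes or bugs.
        self.usage[user] += delta

    def rebuild(self, files):
        # Authoritative rescan: `files` is an iterable of (owner, size)
        # pairs for every file on the partition.
        self.usage = defaultdict(int)
        for user, size in files:
            self.usage[user] += size

q = QuotaAccounting()
q.on_write("alice", 4096)
q.on_write("alice", -1000)
print(q.usage["alice"])  # → 3096
```

The same trade-off would apply to a directory-size-tracking filesystem: fast approximate counters, with an expensive full rescan as the only way to restore exactness.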

Also note that the bottleneck for I/O operations (including reading information about files) is usually the speed of the medium itself, then the communication bus, and then the CPU; you seem to be assuming every filesystem is as fast as a RAM FS. A RAM FS is probably the most trivial filesystem, kept virtually in RAM, which makes I/O operations very fast. You could build it as a module and try to add the functionality you've described; you would learn many interesting things :)

FUSE stands for "filesystem in userspace", and filesystems implemented with FUSE are usually quite slow. They make sense when, in a particular case, functionality matters more than speed; e.g. you could create a pseudo-filesystem based on temperature readings from the new e-thermometer you connected to your computer via USB. But they're no speed demons, you know :)

Piotr Wadas
  • What kind of speed are we talking about here? Speed when updating/dumping 10,000 files into a dir, or speed when streaming a 4.7 GB file over the network? :) Would it be possible to implement a simple directory-size FS using a bash script and a cron job, basically just ``echo `du somedir` >> somedir/.dir_size``, and perhaps later improve it with inotify or such? Hmm. – rapadura Sep 18 '12 at 08:19
  • The speed of the medium you can read from its specifications. To create files of a particular size for testing, use the `dd` command. Note that the speed of remote filesystem access also depends on the speed of the network, which can be overloaded, or just fail, etc. If you can ensure that the filesystem you monitor will contain a reasonable number of files in total, you could try to write a daemon or an additional pseudo-filesystem that attaches to the filesystem and stores the information you need in a separate pseudo-FS structure, just as an experiment. And be sure: if this were profitable, it would have been done already. – Piotr Wadas Sep 18 '12 at 08:21