8

I'm looking for a file type for storing archives of systems that have been decommissioned. At the moment, we primarily use tar.gz, but finding and extracting just a few files from a 200GB tar.gz archive is unwieldy, since tar.gz doesn't support any sort of random-access reads. (And before you get the idea, mounting a tgz using FUSE doesn't make it better.)
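
To illustrate the pain point (paths here are just examples): even pulling out a single file forces tar to decompress and scan the whole stream, because there is no index to seek by.

tar -tzf old-host.tar.gz | grep syslog        # listing alone reads the entire 200GB archive
tar -xzf old-host.tar.gz var/log/syslog.1     # extracting one file does the same full scan again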

Here's what we've found so far -- I'd like to know what other options there are:

  • tar.gz -- poor random-access read performance
  • zip -- lacks support for some advanced filesystem features (e.g. hard links, xattrs)
  • squashfs -- takes an extremely long time to create a large archive (many hours), and the userspace tools are poor

I'm trying to think of a simple way of packing a full-featured filesystem image into as small a space as possible -- ext2 in a cloop image comes to mind, but it doesn't seem like a particularly user-friendly solution.
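
For reference, here's a rough sketch of the plain loopback version of that idea (sizes and paths are made up; cloop would add a compression step on top of this):

dd if=/dev/zero of=old-host.img bs=1M count=204800    # pre-allocate a ~200GB image file
mkfs.ext2 -F old-host.img                             # create an ext2 filesystem inside it
mount -o loop old-host.img /mnt/old-host              # mount, copy the data in, then unmount and compress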

Presumably this problem has been solved before -- are there any options I've missed?

tylerl

5 Answers

9

Mksquashfs is a highly parallelised program, and makes use of all available cores to maximise performance. If you're seeing very large build times then you either have a lot of duplicate files, or the machine is running short of memory and thrashing.

To investigate performance, first use the -no-duplicates option on Mksquashfs, i.e.

mksquashfs xxx xxx.sqsh -no-duplicates

Duplicate checking is a slow operation that has to be done sequentially, and on file sets with a lot of duplicates it becomes a bottleneck in an otherwise parallelised program.

Check memory usage/free memory while Mksquashfs is running; if the system is thrashing, performance will be very poor. Investigate the -read-queue, -write-queue and -fragment-queue options to control how much data Mksquashfs caches at run-time.

Tar and zip are not parallelised and use only one core, and so it is difficult to believe your complaint about Mksquashfs compression performance.

Also, I have never seen any other reports that the userspace programs are "poor". Mksquashfs and Unsquashfs have an advanced set of options which allow very fine control over the compression process and let users select which files are compressed -- options considerably in advance of programs like tar.

Unless you can give concrete examples of why the tools are poor, I will put this down to the usual case of the workman blaming the tools, whereas the real problem is elsewhere.

As I said previously, your system is probably thrashing and hence performing badly. By default Mksquashfs uses all available cores and a minimum of 600 Mbytes of RAM (rising to 2 Gbytes or more on large filesystems). This is for performance, as caching data in memory reduces disk I/O. This "out of the box" behaviour is good for typical users who have large amounts of memory and an otherwise idle system. This is what the majority of users want: a Mksquashfs which "maxes out" the system to create the filesystem as fast as possible.

It is not good for systems with low RAM, or for systems with active processes consuming a large amount of the available CPU, and/or memory. You will simply get resource contention as each process contends for the available CPU and RAM. This is not a fault of Mksquashfs, but of the user.

The Mksquashfs -processors option limits the number of processors Mksquashfs uses; the -read-queue, -write-queue and -fragment-queue options control how much RAM Mksquashfs uses.
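
For example, a run constrained to two cores and smaller caches might look like this (the numbers are arbitrary placeholders; the queue sizes are in Mbytes):

mksquashfs /srv/old-system old-system.sqsh -processors 2 -read-queue 64 -write-queue 64 -fragment-queue 64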

  • 1
    Hi Phillip -- I'm a bit flattered that you'd create an account here just to respond to my question. Thanks for the performance tips. SquashFS is my current favorite option, but I miss the flexibility I have with tar. E.g. the `--one-file-system` flag, or extracting only specific files when you can't mount the FS (see the sketch below). I've thought about contributing some userspace code to help make squashfs more comparable to tar in features, but I don't think I'll ever find the time... nor am I sure you'd even want them. – tylerl May 29 '11 at 06:46
  • 1
    To clarify a bit -- what I meant by "poor" userspace tools was that the tools were poorly applicable for this type of use (general-purpose archiving). They're quite excellent for creating linux boot filesystem images... which is what most people want to use squashfs for anyway. – tylerl May 29 '11 at 06:55
  • See here too, but squashfs is read only: https://unix.stackexchange.com/questions/80305/mounting-a-squashfs-filesystem-in-read-write – user1742529 Dec 25 '19 at 14:15
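
A minimal sketch of listing and extracting individual files without mounting, using unsquashfs (image name and paths are hypothetical; exact path syntax may vary by version):

unsquashfs -l old-system.sqsh                        # list the archive contents
unsquashfs -d restored old-system.sqsh /etc/fstab    # extract only that file into ./restored
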
2

virt-sparsify can be used to sparsify and (through qemu's qcow2 gzip support) compress almost any linux filesystem or disk image. The resulting images can be mounted in a VM, or on the host through guestmount.
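
A hedged sketch of that workflow (image names are placeholders):

virt-sparsify --convert qcow2 --compress old-host.img old-host.qcow2    # sparsify and write a compressed qcow2
guestmount -a old-host.qcow2 -i --ro /mnt/old-host                      # inspect and mount it read-only on the host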

There's a new nbdkit xz plugin that can be used for higher compression while still keeping good random-access performance (as long as you ask xz/pixz to reset compression on block boundaries).
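
For the compression side, something along these lines should keep blocks independently decompressible (the block size is an arbitrary example):

xz --keep --block-size=16MiB old-host.img    # resets the compressor every 16 MiB
pixz old-host.img old-host.img.xz            # parallel, indexed xz as an alternative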

Gabriel
1

ZFS has pretty decent compression capabilities, if memory serves. That said, I've never actually used it. :-)
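
If you went that route, a minimal sketch might look like this (pool and dataset names are made up):

zpool create archive /dev/sdX                          # pool on a dedicated disk (placeholder device)
zfs create -o compression=gzip archive/decommissioned  # dataset with gzip compression enabled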

Abe Voelker
1

dar (disk archiver) is an open-source program that supports compression (on a per-file basis) and includes an index for fast seeking to a specific file. It is widely available on a variety of systems. According to the FAQ, xattrs and hard links are supported:

  • Many backup/copy tools do not take care of hard linked inodes (hard linked plain files, named pipes, char devices, block devices, symlinks)... dar does
  • Many backup/copy tools do not take care of sparse files... dar does
  • Many backup/copy tools do not take care of Extended Attributes... dar does
  • Many backup/copy tools do not take care of Posix ACL (Linux)... dar does
  • Many backup/copy tools do not take care of file forks (MacOS X)... dar does
  • Many backup/copy tools do not take any precautions while working on a live system... dar does
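
A rough sketch of creating an archive and pulling a single file back out via dar's catalogue (paths are hypothetical):

dar -c /srv/archives/host01 -R /srv/old-host -z    # create a compressed archive of /srv/old-host
dar -l /srv/archives/host01                        # list contents using the built-in catalogue
dar -x /srv/archives/host01 -g var/log/syslog.1    # restore just that file, seeking via the index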

Brian Minton
-1

As this is Stack Overflow, I assume you are looking for a library/code. You could check our SolFS virtual file system. It doesn't support hard links, but alternate streams are supported (for xattrs) and tags are supported (for Unix attributes). Symlinks are also supported, so you can convert hard links to symlinks when creating the archive.

Eugene Mayevski 'Callback