
If I have a very large contiguous file in the gigabytes that I want to copy, the filesystem has to allocate all of the necessary space and write a duplicate of every block.

Why can't a copy be "fast" in the sense that it instead copies references to the blocks and writes new blocks only when a change is made?

I understand that this would decouple the amount of data the disk appears to hold from the space actually in use (because of the shared block references), with the potential of a disk appearing to contain data in excess of its actual capacity. It could also cause writes to consume a large amount of space, since entirely new blocks must be written whenever they diverge from their source blocks.

There would certainly be unique penalties to such a file system, but it sounds like an interesting use case.

Are there any file systems existing today which exploit a similar way of handling data?

Note that I am not an expert on file systems so some of my assumptions may be embarrassingly wrong. I welcome any corrections in the comments.

Zhro

3 Answers


You're describing a filesystem that is "copy on write" (COW), and the specific feature you're asking about is a reflink file copy.

Instead of copying the file contents, a COW filesystem can make the new file reference the same on-disk blocks as the original, recording only the delta between the two files as they diverge. This makes the copy you describe nearly instantaneous.

A COW filesystem is also capable of using this same model to deduplicate existing data. For examples, see BTRFS with bedup, or ZFS.

A penalty of this method is the metadata upkeep required to maintain such file links - COW filesystems tend to consume a fair amount of disk space storing metadata. It also takes a fair amount of CPU time to support this and other related functionality.

Spooler

What you are referring to is a reflink. Per the Linux cp man page:

When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy. Use --reflink=never to ensure a standard copy is performed.
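For example, on a filesystem that supports reflinks, a command like cp --reflink=always source.img dest.img (file names here are just placeholders) returns almost immediately even for a multi-gigabyte file, since only extent metadata is written; with --reflink=auto the same command quietly falls back to a full copy on filesystems without reflink support.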

On Linux, this is implemented by the FICLONE ioctl() call:

If a filesystem supports files sharing physical storage between multiple files ("reflink"), this ioctl(2) operation can be used to make some of the data in the src_fd file appear in the dest_fd file by sharing the underlying storage, which is faster than making a separate physical copy of the data. Both files must reside within the same filesystem. If a file write should occur to a shared region, the filesystem must ensure that the changes remain private to the file being written. This behavior is commonly referred to as "copy on write".
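To make that concrete, here is a minimal sketch of a reflink copy done directly through that ioctl. It assumes Linux kernel headers 4.5 or newer (where FICLONE is defined in linux/fs.h) and a reflink-capable filesystem such as BTRFS; the file names are placeholders taken from the command line:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h>   /* FICLONE (kernel headers >= 4.5) */

    int main(int argc, char *argv[])
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s SRC DEST\n", argv[0]);
            return EXIT_FAILURE;
        }

        int src = open(argv[1], O_RDONLY);
        if (src == -1) {
            perror("open src");
            return EXIT_FAILURE;
        }

        int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (dst == -1) {
            perror("open dest");
            return EXIT_FAILURE;
        }

        /* Ask the filesystem to make DEST share SRC's data blocks.
         * No file data is copied; only extent metadata is updated.
         * Fails (e.g. EOPNOTSUPP, EXDEV) if the filesystem lacks
         * reflink support or the files are on different filesystems. */
        if (ioctl(dst, FICLONE, src) == -1) {
            perror("ioctl(FICLONE)");
            return EXIT_FAILURE;
        }

        close(src);
        close(dst);
        return EXIT_SUCCESS;
    }

On a BTRFS mount this completes in constant time regardless of file size; du will count both files in full, while df shows almost no additional space consumed - exactly the decoupling the question describes.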

Reflinks are supported on BTRFS, and support was being added to XFS starting with Linux kernel 4.8:

Linux kernel 4.8 in August 2016 added a new feature, "reverse mapping". This is the foundation for a large set of planned features: snapshots, copy-on-write (COW) data, data deduplication, reflink copies, online data and metadata scrubbing, highly accurate reporting of data loss or bad sectors, and significantly improved reconstruction of damaged or corrupted filesystems. This work required changes to XFS's on-disk format.

cp -z ... and the reflink() function are available on Solaris 11.4 for ZFS. ZFS reflink support will presumably arrive in OpenZFS and in ZFSonLinux at some point. See https://github.com/zfsonlinux/zfs/issues/405

Andrew Henle

Just adding a couple of links to @Spooler's answer:

  • ZFS on Linux project. Very actively developed, works great in my experience. Packages available for many popular Linux distros. (And bundled with Ubuntu.)
  • Aaron Toponce's ZFS pages. Somewhat dated, but an excellent introduction.
fmyhr