
I'm working on a piece of software that writes data to NTFS sparse files and I can't find any documentation on what the limits of an NTFS sparse file are.

I have seen reference to the fact that limitations exist, but not to what those limitations are.

Specifically, I am interested in any limitations on the maximum file size of a sparse file and on the number of allocated ranges within the file.
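For context, a minimal sketch of how such a sparse file is typically created and written with the standard Win32 controls (FSCTL_SET_SPARSE and FSCTL_SET_ZERO_DATA); the path and sizes are placeholders, not actual production code:

```c
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    /* Placeholder path -- adjust for your own test. */
    HANDLE h = CreateFileW(L"C:\\temp\\sparse.bin",
                           GENERIC_READ | GENERIC_WRITE,
                           0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    DWORD bytes = 0;

    /* Mark the file as sparse. */
    if (!DeviceIoControl(h, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, &bytes, NULL)) {
        fprintf(stderr, "FSCTL_SET_SPARSE failed: %lu\n", GetLastError());
    }

    /* Extend the file to 1 GB (placeholder size) without writing data. */
    LARGE_INTEGER size;
    size.QuadPart = 1LL << 30;
    SetFilePointerEx(h, size, NULL, FILE_BEGIN);
    SetEndOfFile(h);

    /* Declare the whole range as a sparse zero area; only ranges that are
       later written to will have clusters allocated. */
    FILE_ZERO_DATA_INFORMATION zero;
    zero.FileOffset.QuadPart = 0;
    zero.BeyondFinalZero.QuadPart = size.QuadPart;
    if (!DeviceIoControl(h, FSCTL_SET_ZERO_DATA, &zero, sizeof(zero),
                         NULL, 0, &bytes, NULL)) {
        fprintf(stderr, "FSCTL_SET_ZERO_DATA failed: %lu\n", GetLastError());
    }

    CloseHandle(h);
    return 0;
}
```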

ChrisPatrick
  • [2^44 bytes - 64 KB](https://technet.microsoft.com/en-us/library/cc938432.aspx). You **could** have used google, you know ... – specializt Nov 23 '16 at 09:57
  • @specializt That may well be true for a *normal* file, but my understanding is that a *sparse* file is different because it relies on a separate allocation range table which has its own limitations. – ChrisPatrick Nov 23 '16 at 11:36
  • .... no. I think you might have misunderstood quite a few concepts - sparse files are no different, they simply may not contain any actual data, and you can allocate up to the NTFS file size limit. There are no *"separate allocation range tables"*; that'd be horrible with regard to performance – specializt Nov 23 '16 at 14:49
  • As seen [here](https://msdn.microsoft.com/en-us/library/windows/desktop/aa364582(v=vs.85).aspx), a sparse file contains a series of allocated ranges. If you ask for data outside any of those ranges then the OS will just return zeroes. These allocated ranges are tracked separately (a sketch that enumerates them follows this comment thread). And from the last paragraph: "Large and highly fragmented sparse files can exceed the NTFS limitation on disk extents before available space is used." [This article](http://www.flexhex.com/docs/articles/sparse-files.phtml) also implies the same in the section "Can They Really Be That Large?" – ChrisPatrick Nov 23 '16 at 14:57
  • Maybe I have worded my question poorly, but the rest of that sentence is "You can create a largest possible 16 terabyte sparse file if and only if it consists of a single sparse zero area, no data at all." This implies that if I am writing a sparse file with actual data in it, there will be a point (before I reach 16TB) at which I can no longer write data to that file. To my mind, that is a limitation on the size of sparse file I can have, and it is also the thing I can find no documentation of. – ChrisPatrick Nov 23 '16 at 15:38
  • this very limit is mentioned in my very first comment. – specializt Nov 23 '16 at 15:43
  • 2^44 bytes = 16TB. That is apparently only possible in a sparse file if there is no data in the file. – ChrisPatrick Nov 23 '16 at 15:46
  • you might want to read my first comment again. And again. Probably even **click on the link** – specializt Nov 23 '16 at 15:46
  • OK, maybe I am being dense. Your comment (and yes, I have read the linked article) says that the maximum file size on NTFS is 2^44 bytes - 64 KB. Great. However, the second article I linked states that in practice "It is probably safe to assume that you always can create a 300-500 gigabyte large sparse file, but any attempt to create a larger file might result in the Disk full error, no matter how little real data have been written." That implies, regardless of the implementation (which MS have hardly been forthcoming on), that sparse files have some additional limit. – ChrisPatrick Nov 23 '16 at 16:01
  • that little article is just some .... article on some random, private website. Just some guy making stuff up. Absolutely **never** trust any non-official statement. There are no additional limits **but** environment circumstances may very well prevent you from creating these enormous files - the Windows kernel is a very complex and dynamic beast, it may reassign resources at any time, so if your system is under load you could encounter **temporary limits** - but you will not be encountering these if you're programming in a **sane fashion**. Ever. It's as simple as that. Use defensive programming – specializt Nov 23 '16 at 16:06
  • That's precisely why I asked the question. – ChrisPatrick Nov 23 '16 at 16:14
  • Well, I'm glad I could help. – specializt Nov 23 '16 at 16:21
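As referenced in the comment thread above, a minimal sketch of enumerating a sparse file's allocated ranges with FSCTL_QUERY_ALLOCATED_RANGES; the handle and file size are assumed to come from the caller:

```c
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Print the allocated (non-sparse) ranges of an already-open sparse file.
   'h' is assumed to be a handle opened with GENERIC_READ. */
void print_allocated_ranges(HANDLE h, LONGLONG fileSize)
{
    FILE_ALLOCATED_RANGE_BUFFER query;
    query.FileOffset.QuadPart = 0;
    query.Length.QuadPart = fileSize;

    FILE_ALLOCATED_RANGE_BUFFER ranges[64];
    DWORD bytes = 0;
    BOOL more = TRUE;

    while (more) {
        BOOL ok = DeviceIoControl(h, FSCTL_QUERY_ALLOCATED_RANGES,
                                  &query, sizeof(query),
                                  ranges, sizeof(ranges), &bytes, NULL);
        if (!ok && GetLastError() != ERROR_MORE_DATA) {
            fprintf(stderr, "FSCTL_QUERY_ALLOCATED_RANGES failed: %lu\n",
                    GetLastError());
            return;
        }
        more = !ok;  /* ERROR_MORE_DATA means the output buffer was filled. */

        DWORD count = bytes / sizeof(FILE_ALLOCATED_RANGE_BUFFER);
        for (DWORD i = 0; i < count; i++) {
            printf("allocated range at %lld, length %lld\n",
                   ranges[i].FileOffset.QuadPart,
                   ranges[i].Length.QuadPart);
        }

        if (count > 0) {
            /* Continue the query just past the last range returned. */
            LONGLONG next = ranges[count - 1].FileOffset.QuadPart
                          + ranges[count - 1].Length.QuadPart;
            query.FileOffset.QuadPart = next;
            query.Length.QuadPart = fileSize - next;
        } else {
            more = FALSE;
        }
    }
}
```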

2 Answers


The documentation for the error code you will get has some hints as to the limits.

Specifically:

If you plan to use very large files (more than 500 GB) that have many in-place chunks, you should format the volume by using the "/L" option to accommodate large-size file records. By default, the volume is formatted to use small-size file records.

The documentation for the "/L" option gives the approximate maximum number of extents per NTFS file:

Enables support for large file record segments (FRS). This is needed to increase the number of extents allowed per file on the volume. For large FRS records, the limit increases from about 1.5 million extents to about 6 million extents.
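In practice, this limit surfaces to a program as a failed write. A minimal sketch, assuming the error code the quoted documentation is talking about is ERROR_FILE_SYSTEM_LIMITATION ("The requested operation could not be completed due to a file system limitation"):

```c
#include <windows.h>
#include <stdio.h>

/* Attempt a write and report whether it failed because the file's extent
   list is full. ERROR_FILE_SYSTEM_LIMITATION is assumed to be the error
   the quoted documentation describes. */
BOOL write_chunk(HANDLE h, const void *buf, DWORD len)
{
    DWORD written = 0;
    if (WriteFile(h, buf, len, &written, NULL)) {
        return TRUE;
    }
    if (GetLastError() == ERROR_FILE_SYSTEM_LIMITATION) {
        fprintf(stderr, "File hit the NTFS extent limit; "
                        "consider a volume formatted with /L.\n");
    } else {
        fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());
    }
    return FALSE;
}
```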

UrOni

The answer will depend on just how sparse the file is, as well as on the cluster size of the hard disk.

NTFS, like most other filesystems, considers a file to be an ordered list of disk clusters. That "ordered list" is a physical data structure in the filesystem, and occupies disk space. As the number of records in this list grows, the filesystem must assign more physical blocks to hold it. However, the number of blocks that it can add is ultimately limited (see references).

So, let's assume that you have a 1 TB disk, which by default has a 4 KB cluster size, and you write a 512 GB file.

  • If you write that file sequentially, the system will make an attempt to allocate contiguous blocks, and there will be a relatively small number of entries in the list (fragments in the file).
  • If you write that file randomly, you will create a sparse file; each time you write a block that hasn't been written before, you must allocate a cluster for that block. Since you're writing randomly, the OS probably won't be able to allocate contiguous clusters, so you'll have more entries in the list. Your 512 GB file could require 134,217,728 fragments (assuming I've done the math correctly).

I don't know if that number of fragments would be beyond the capacity of the NTFS management structures. But let's assume it is. You might still be able to manage that file if you used a volume where the cluster size is 64 KB (resulting in 8,388,608 fragments).
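Those fragment counts are simply the file size divided by the cluster size, assuming the worst case where every cluster ends up as its own extent. A quick sketch using the numbers from this answer:

```c
#include <stdio.h>

int main(void)
{
    /* Worst case: every cluster of the sparse file is allocated by a
       separate random write, so each cluster is its own fragment. */
    long long fileSize = 512LL * 1024 * 1024 * 1024;    /* 512 GB */
    long long clusterSizes[] = { 4 * 1024, 64 * 1024 }; /* 4 KB and 64 KB */

    for (int i = 0; i < 2; i++) {
        long long fragments = fileSize / clusterSizes[i];
        printf("%3lld KB clusters -> up to %lld fragments\n",
               clusterSizes[i] / 1024, fragments);
    }
    return 0;
}
```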

Aside from the possibility of running out of fragments, heavily fragmented files will be less efficient because access to any particular block requires walking through the list of fragments to find that block (I'll assume that some form of binary search is involved, but it's still worse than examining one fragment that holds all blocks). Moreover, when using magnetic media, the overall disk access will be sub-optimal because closely numbered blocks may be at widely different locations on the drive. Better, in my opinion, is to pre-allocate and sequentially initialize the entire file (unless, of course, you're not planning to store much data in it).
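A minimal sketch of that pre-allocate-and-initialize approach (the path and size are placeholders); writing zeroes sequentially gives the allocator the best chance of handing out contiguous clusters:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Placeholder path and size. */
    const LONGLONG totalSize = 1LL << 30;        /* 1 GB for the example */
    HANDLE h = CreateFileW(L"C:\\temp\\prealloc.bin",
                           GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    /* Sequentially write zeroes so the allocator can hand out
       (mostly) contiguous clusters, keeping the fragment count low. */
    static char buf[1 << 20];                    /* 1 MB of zeroes */
    LONGLONG remaining = totalSize;
    while (remaining > 0) {
        DWORD chunk = (DWORD)(remaining < (LONGLONG)sizeof(buf)
                              ? remaining : (LONGLONG)sizeof(buf));
        DWORD written = 0;
        if (!WriteFile(h, buf, chunk, &written, NULL)) {
            fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());
            break;
        }
        remaining -= written;
    }

    CloseHandle(h);
    return 0;
}
```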

References (both from Microsoft):

  • How NTFS Works - an overview of the structures in the NTFS filesystem.
  • The Four Stages of NTFS File Growth - Post by a member of Microsoft's support team that details how the allocation nodes for a file grow over time. See also the followup post that shows a partial work-around that increases the number of allocation records.
kdgregory