3

When creating a new NTFS partition one is asked to choose a cluster size. The default size is 4k, but one can choose smaller sizes, too; 512 bytes is the smallest.

A smaller cluster size reduces wasted space. Each file occupies one or more clusters, depending on file size. If the file size is evenly divisible by the cluster size then no space is wasted; otherwise only part of the last cluster stores file data and the remaining space is wasted. On average that's about half of the cluster size per file. Considering that a typical partition stores tens of thousands of files, 256 bytes vs. 2k per file sounds like a big deal to me.
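For a rough back-of-the-envelope estimate (a sketch with an assumed, made-up count of 50,000 files; actual numbers depend on your data):

    def expected_waste(file_count, cluster_size):
        # Roughly half a cluster is wasted per file on average,
        # assuming file sizes are uniformly distributed modulo the cluster size.
        return file_count * cluster_size / 2

    for cluster in (512, 4096):
        waste_mib = expected_waste(50_000, cluster) / 2**20
        print(f"{cluster} B clusters: ~{waste_mib:.0f} MiB wasted")

    # 512 B clusters:  ~12 MiB wasted
    # 4096 B clusters: ~98 MiB wasted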

I always choose 512 bytes to reduce the amount of wasted space, but I believe there might be some negative effects of using smaller clusters; otherwise 512 bytes would be used by default. What are those drawbacks?

8 Answers

2

A smaller cluster size means that a file will be distributed across more clusters (obvious). This means potentially more fragmentation and possibly more lookups to find the clusters. It is the usual speed vs. size optimisation. As hard disks are cheap, I would go for larger cluster sizes, but either way you will probably not see that much difference ...
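To put rough numbers on that cluster count (a quick sketch with example file sizes, not taken from the answer itself):

    import math

    def clusters_needed(file_size, cluster_size):
        # A file occupies ceil(size / cluster_size) clusters.
        return math.ceil(file_size / cluster_size)

    # e.g. a 1 MiB file spans 2048 clusters at 512 B but only 256 at 4 KiB;
    # a 100 MiB file spans 204,800 vs. 25,600.
    for size in (1 * 2**20, 100 * 2**20):
        print(size, clusters_needed(size, 512), clusters_needed(size, 4096))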

Guillaume
  • 1,063
  • 5
  • 12
  • 24
  • 1
    An additional point to this is that NTFS only has so much room reserved for the directory data. With 512-byte clusters versus 4k, you may wind up needing 8 times the records to track larger files that grow in small increments. When working with 1TB and larger disks and growing files (logs, DBs), you can exhaust that resource and wind up not being able to create new files! I've had to reformat a 4TB partition since regular defraggers can't fix NTFS meta-data issues. – Mark Jun 12 '15 at 18:03
2

NTFS is extent based (like xfs, ext4 and more on the *nix side), so the slowdown you get from non-extent-based filesystems (e.g. FAT, ext3) is reduced.

There's still an overhead though, and where it starts to hurt is fragmentation. Windows is HORRIBLE about fragmentation; try running Defraggler to see how even sequentially written files (e.g. from a program installation) can end up in 30+ fragments.

I'd generally suggest 4k as a good size, although if a drive is to be used for large media files 64k or larger can help.

http://www.defraggler.com/

LapTop006
  • 6,496
  • 20
  • 26
1

One reason to go for a 512-byte sector size is if you are planning to use Microsoft Windows Backup on a server. Amazingly, on Windows 2008 (I do not know whether this has been fixed in Windows 2012 or later versions) the backup will fail if you use the default 4k sector size on an off-the-shelf USB external hard drive! I recently purchased a new Seagate drive after speaking with Seagate's tech support to confirm that this was possible and that I could reformat the 4k sector size back to 512 bytes, and my backup worked. It is strange that neither Seagate's nor Microsoft's support website points this out.

The earlier external drives do not allow you to reformat the drive with a sector size of 512 bytes; the minimum is 4k.

Reuven
  • 11
  • 1
  • This is a reason why they should use 512 but not a reason why they shouldn't. It isn't an answer to the question, but just part of the discussion. It might be better to write this as a comment to the question rather than clutter the answer section with non-answers. – Mark Jun 12 '15 at 17:57
0

The following trade-offs are involved in this decision:

Space Wastage

The allocation size, or cluster size, is the smallest logical unit of space in a volume. This means that even if your file is only 1KB in size, it will still consume 1MB if the cluster size is 1MB; the remaining 1023KB of space is simply wasted.

Essentially, the smaller the unit, the lower the wastage.

This becomes quite inconsequential once total capacity goes beyond a certain level. For normal PCs that may be just above 1TB, and for servers it may vary wildly from application to application.

Speed

The speed is directly proportional to the allocation size. The bigger it is, the less indexing needs to be done, kept and traversed.

Keep in mind, though, that small increases in cluster size do not bring meaningfully discernible benefits. Going from 512B to 4KB won't really show up, but going to 64KB will net you visible boosts. Similarly, going to 512KB or higher will net you substantial benefits.

Fragmentation Susceptibility

This is inversely proportional to cluster size, for obvious reasons: smaller units mean higher fragmentation susceptibility, since more pieces of a file can be accommodated in small empty spaces. The bigger the units, the fewer fragments possible per file.

Maximum Size of Files and Volumes

The MFT has a limited address space for directory and file records. Smaller cluster sizes obviously generate more index entries and thus have a profound effect on the maximum size of files and volumes.

Bigger units allow access to more storage within the same limits. That is exactly why NTFS cluster size options went from the old 64KB maximum to the new 2MB maximum in Windows 10 version 1709 and later. 2MB clusters allow maximum file and volume sizes of roughly 8PB!
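A quick way to see where those limits come from (a sketch assuming the commonly cited 2^32-cluster limit of the NTFS implementation):

    MAX_CLUSTERS = 2 ** 32   # NTFS implementation limit on addressable clusters per volume

    for cluster_kib in (4, 64, 2048):
        max_bytes = cluster_kib * 1024 * MAX_CLUSTERS
        print(f"{cluster_kib} KiB clusters -> {max_bytes / 2**40:,.0f} TiB max volume/file size")

    # 4 KiB    ->    16 TiB
    # 64 KiB   ->   256 TiB
    # 2048 KiB -> 8,192 TiB (8 PiB)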

Umair Ahmed
  • 101
  • 2
0

There is no reason not to use a 512-byte cluster size, especially on a drive for backup data. You save space, and there is zero impact on speed, unlike what a poster before said. Drive performance is NOT directly proportional to allocation size. I have tested every cluster size; they all result in the same transfer speed, and the only difference is that higher cluster sizes waste space.

The idea behind large clusters is for files you may want to write to later on (text documents, config files). With a 512-byte cluster size, if you try to write more than 512 bytes to such a file, new space has to be allocated, which may not be near the file's current sectors on the drive (leading to fragmentation). With a large cluster size, that space is already reserved near the file's original sectors, so the data fills in there instead.

This reduces fragmentation, but in all honesty I can't see why you wouldn't just defragment the hard drive regularly; Windows already does this in the background for you. Again, there is no reason not to use 512 bytes.

Corky
  • 101
0

I wouldn't change from the default cluster size, unless you really know what you are doing.

Yes, smaller cluster sizes do mean less slack space and thus less wasted space. However, smaller cluster sizes also mean that less data is transferred from the disk in each read operation, so you may get a drop in read performance. It is also likely that there will be less fragmentation with a larger cluster size, as the data is more likely to be stored either in one cluster or contiguously.

Gavin McTaggart
  • 1,846
  • 16
  • 14
0

According to the MySQL manual, when using MySQL with the InnoDB engine and page compression on Windows, it is recommended to use a cluster size smaller than 4KB in order to benefit from page compression at all:

The default NTFS cluster size is 4K, for which the compression unit size is 64K. This means that page compression has no benefit for an out-of-the box Windows NTFS configuration, as the maximum innodb_page_size is also 64K. For page compression to work on Windows, the file system must be created with a cluster size smaller than 4K, and the innodb_page_size must be at least twice the size of the compression unit. For example, for page compression to work on Windows, you could build the file system with a cluster size of 512 Bytes (which has a compression unit of 8KB) and initialize InnoDB with an innodb_page_size value of 16K or greater.
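To make the arithmetic in that passage concrete, here is a small sketch; the 16-clusters-per-compression-unit relationship is inferred from the figures quoted above, and the sizes are just examples:

    def page_compression_helps(ntfs_cluster_size, innodb_page_size):
        # NTFS compresses in units of 16 clusters (512 B clusters -> 8 KB unit,
        # 4 KB clusters -> 64 KB unit, matching the quote above).
        compression_unit = 16 * ntfs_cluster_size
        # Per the manual, the InnoDB page must be at least twice the compression unit.
        return innodb_page_size >= 2 * compression_unit

    print(page_compression_helps(4096, 64 * 1024))   # False: 64 KB unit vs. 64 KB max page
    print(page_compression_helps(512, 16 * 1024))    # True:  8 KB unit vs. 16 KB page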

Andrew Schulman
  • 8,811
  • 21
  • 32
  • 47
-1

Probably 11 years ago, when the question was asked, this wasn't so obvious, but now it surely is: the world has turned to 4k clusters. The reasons why the cluster size must be exactly 4k or a multiple of it include:

  • x86 page size is 4k. A processor sees memory as a set of 4k blocks, so when you use 4k clusters you have a 1-to-1 mapping for memory-mapped files, including swap, which improves much more than you might think; virtually everything.

Recent Windows versions adopted clusters up to 2M. This actually reflects x86 again: the 4k page size was true for the 80386 and 80486, but around the Pentium Pro we got the option to use 2M pages.

  • New hard disks have 4k blocks, the so-called "advanced format". This was true even 11 years ago. When you write a 512-byte block, the drive must read the whole 4k block, update a part of it and write the 4k back, a very slow process; if you write 4k, you are free from this and work with maximum efficiency. If you encounter such a device, never ever use clusters which aren't a multiple of 4k! Equally important is to be sure everything is aligned on disk so file systems start on a 4k boundary (see the alignment-check sketch after this list).

  • For SSDs this is even more important. The same applies, but you can only write to a "free" page; if you want to overwrite part of it, you need to read it, update it in memory, erase a whole block and only then write. This is slow, and every manufacturer out there does all sorts of fancy tricks to hide it from us; despite this, the process is still there. In addition, block wear needs to be managed, and the smaller the block we update, the worse the efficiency of these internal management algorithms. To improve the situation a new technology even appeared, called "zoned block devices". So, in conclusion, small blocks are again bad here.

  • Well-written software that needs to use many small pieces of data knows this and usually doesn't split them into separate files; it often stores all of them together. So well-written software won't lose much space in tail blocks.

  • Some file systems have a feature called "tail packing": the file system takes these "less than a block" parts of several files and packs them together into a single block. This helps to waste much less space, but the performance is worse. ReiserFS is a well-known example of such a file system. Unfortunately, NTFS isn't capable of doing this, but

  • NTFS packs small files directly into MFT records, if they fit. This means that for small files it doesn't even allocate extents, so it doesn't waste blocks on them.

  • Storage devices are HUGE nowadays. Software seems to be catching up, so files are also getting larger. This means a smaller percentage of blocks are tails, so there is less opportunity for waste. If your files are around 200k in size, you only waste about 1% on average with 4k clusters, and no more than 2% in the worst case. Filesystem metadata also increases in size. There are even suggestions (see page 132) to increase the block size and accept the increased waste.
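Regarding the alignment point above, here is a minimal sketch of the kind of check meant there (it assumes you supply the partition's starting offset in bytes, e.g. as reported by `wmic partition get Name, StartingOffset` on Windows):

    def is_4k_aligned(starting_offset_bytes):
        # A partition is 4k-aligned if its starting byte offset is a multiple
        # of 4096; otherwise every 4k cluster straddles two physical 4k sectors
        # and forces read-modify-write cycles on an advanced-format drive.
        return starting_offset_bytes % 4096 == 0

    print(is_4k_aligned(1_048_576))   # True  - the 1 MiB alignment modern tools use
    print(is_4k_aligned(32_256))      # False - legacy 63-sector (31.5 KiB) offset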

Nikita Kipriyanov
  • 10,947
  • 2
  • 24
  • 45
  • NTFS cluster size (aka allocation unit size) does not have any correlation with read/write block size; this is a complete misunderstanding. – Alex Aug 14 '21 at 20:19