Cluster size recommendations for ReFS and NTFS
IO amplification refers to the broad set of circumstances where one IO operation triggers other, unintentional IO operations. Though it may appear that only one IO operation occurred, in reality, the file system had to perform multiple IO operations to successfully service the initial IO. This phenomenon can be especially costly when considering the various optimizations that the file system can no longer make:
- When performing a write, the file system could perform this write in memory and flush this write to physical storage when appropriate.
- Certain writes, however, could force the file system to perform additional IO operations, such as reading in data that is already written to a storage device. Reading data from a storage device significantly delays the completion of the original write, as the file system must wait until the appropriate data is retrieved from storage before making the write.
ReFS cluster sizes:
ReFS offers both 4K and 64K clusters. 4K is the default cluster size for ReFS, and we recommend using 4K cluster sizes for most ReFS deployments because it helps reduce costly IO amplification:
- In general, if the cluster size exceeds the size of the IO, certain workflows can trigger unintended IOs to occur. Consider the following scenario where a ReFS volume is formatted with 64K clusters
- By choosing 4K clusters instead of 64K clusters, one can reduce the number of IOs that occur that are smaller than the cluster size, preventing costly IO amplifications from occurring as frequently.
Additionally, 4K cluster sizes offer greater compatibility with Hyper-V IO granularity, so we strongly recommend using 4K cluster sizes with Hyper-V on ReFS. 64K clusters are applicable when working with large, sequential IO, but otherwise, 4K should be the default cluster size.
NTFS cluster sizes:
NTFS offers cluster sizes from 512 to 64K, but in general, we recommend a 4K cluster size on NTFS, as 4K clusters help minimize wasted space when storing small files. We also strongly discourage the usage of cluster sizes smaller than 4K. There are two cases, however, where 64K clusters could be appropriate:
- 4K clusters limit the maximum volume and file size to be 16TB
- 64K cluster sizes can offer increased volume and file capacity, which is relevant if you’re are hosting a large deployment on your NTFS volume, such as hosting VHDs or a SQL deployment.
- NTFS has a fragmentation limit, and larger cluster sizes can help reduce the likelihood of reaching this limit
- Because NTFS is backward compatible, it must use internal structures that weren’t optimized for modern storage demands. Thus, the metadata in NTFS prevents any file from having more than ~1.5 million extents.
- One can, however, use the “format /L” option to increase the fragmentation limit to ~6 million.
- 64K cluster deployments are less susceptible to this fragmentation limit, so 64K clusters are a better option if the NTFS fragmentation limit is an issue. (Data deduplication, sparse files, and SQL deployments can cause a high degree of fragmentation.)
- NTFS compression only works with 4K clusters, so using 64K clusters isn’t suitable when using NTFS compression. Consider increasing the fragmentation limit instead, as described in the previous bullets.
While a 4K cluster size is the default setting for NTFS, there are many scenarios where 64K cluster sizes make sense, such as: Hyper-V, SQL, deduplication, or when most of the files on a volume are large.