CSV_REFS performs properly when the diskspd test is run on the disk’s owner node. Latency increases 35x for 64k blocks when the test is run on any other node in the 4-node cluster. I can move ownership to another node and run the test on the new owner, and I continue to get good performance. When I run the test from a node that does not own the disk, the results are poor. CSV_NTFS performs well regardless of the node on which the test runs. I’m considering giving up on CSV_REFS in favor of CSV_NTFS because of this observation.

I’m running Windows Server 2019.

I have considered that RDMA may be the problem, but I can’t find any evidence that I’m having RDMA issues. The logs are clean, test-rdma.ps1 runs fine.

Does anyone have any thoughts as to why this would occur?

W Lucking
  • CSV with ReFS uses *File System* Redirected Mode for all I/O. That means all the I/O traverses the owner/coordinator node. To confirm this, you can open the same vendor support call that the following person did. The response they got from the vendor support engineer was that RDMA is implicitly required. https://community.spiceworks.com/topic/2286588-please-do-not-use-refs-for-cluster-shared-volumes-provided-by-a-san – Greg Askew May 03 '23 at 06:25
  • "The disk can be provisioned as Resilient File System (ReFS); however, the CSV drive will be in redirected mode meaning write access will be sent to the coordinator node." https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs – Greg Askew May 03 '23 at 06:41
  • Agree with what @GregAskew said above: File System Redirected mode. If you run the test on any node other than the owner, I/O will travel through the owner node, hence the performance degradation: https://techcommunity.microsoft.com/t5/failover-clustering/understanding-the-state-of-your-cluster-shared-volumes/ba-p/371889 – Strepsils May 03 '23 at 19:17
  • Thanks, Greg. Focusing me on this aspect of CSV was extremely helpful. My main conclusion is that, because of File System Redirected Mode, CSV and the potential availability of Direct I/O in an S2D deployment are not going to deliver the performance gains that might be possible without CSV, regardless of whether I pair it with ReFS or NTFS. Still, RDMA will be leveraged for whatever benefits it might offer. From my testing today, I agree with the claim that NTFS should be used with CSV. CSV with ReFS is painfully slow in S2D. – W Lucking May 03 '23 at 19:35
  • Came here just to confirm both statements: a) ReFS/CSV is always in redirected mode, unless you’re on the owner node, and b) there’s no single place in the official Microsoft docs that confirms this and calls it a “feature” rather than a “bug”. We stick with NTFS unless we can’t migrate the customer off Hyper-V. – RiGiD5 May 20 '23 at 14:13
  • The same is true of NTFS CSV. Any CSV will use file system redirection. I think file system redirection is a requirement for CSV to provide the availability and shared accessibility we seek from it. From my experience and reading, ReFS CSV has poor performance, while NTFS CSV has excellent performance on the owner node. ReFS is probably good when not using CSV. – W Lucking May 22 '23 at 13:47

1 Answer

It's by design. ReFS on a CSV is always in redirected mode. See:

https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs

"Cluster Shared Volumes (CSV) enable multiple nodes in a Windows Server failover cluster or Azure Stack HCI to simultaneously have read-write access to the same LUN (disk) that is provisioned as an NTFS volume. The disk can be provisioned as Resilient File System (ReFS); however, the CSV drive will be in redirected mode meaning write access will be sent to the coordinator node."
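
To check which nodes are actually being redirected, the per-node CSV state can be inspected; on Windows Server the `Get-ClusterSharedVolumeState` cmdlet reports a `StateInfo` value (`Direct`, `FileSystemRedirected`, or `BlockRedirected`) for each volume and node. The following is only a minimal Python sketch of how such output could be summarized. It assumes the rows have already been exported with `StateInfo` as a string, and the volume and node names in `SAMPLE_STATES` are hypothetical placeholders, not results from any real cluster.

```python
from collections import defaultdict

# Hypothetical sample rows (assumption): real data would come from
# Get-ClusterSharedVolumeState, exported with StateInfo as a string.
SAMPLE_STATES = [
    {"Name": "example-volume-1", "Node": "node1", "StateInfo": "Direct"},
    {"Name": "example-volume-1", "Node": "node2", "StateInfo": "Direct"},
    {"Name": "example-volume-2", "Node": "node1", "StateInfo": "FileSystemRedirected"},
    {"Name": "example-volume-2", "Node": "node2", "StateInfo": "BlockRedirected"},
]


def redirected_nodes(rows):
    """Map each volume name to the nodes whose state is anything but Direct."""
    result = defaultdict(list)
    for row in rows:
        if row["StateInfo"] != "Direct":
            # Keep the state next to the node so the kind of redirection is visible.
            result[row["Name"]].append((row["Node"], row["StateInfo"]))
    return dict(result)


if __name__ == "__main__":
    for volume, nodes in redirected_nodes(SAMPLE_STATES).items():
        for node, state in nodes:
            print(f"{volume}: {node} is in {state} mode")
```

Any volume that appears in the result has at least one node whose I/O is not going directly to the disk, which is where a diskspd run would be expected to show the extra latency described in the question.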

BaronSamedi1958