4

I've run into an interesting file transfer performance issue with a failover cluster I recently configured on Server 2016. The specific issue is that when I access the file share via the clustered storage path (e.g. \\store01\share01), file transfer speed (writes in particular, it seems) is far slower than when I access it via the local path on the current owner node (e.g. \\srv04\e$\Shares\Share01).

For example, I copied 499 .txt files (totaling 26.07 MB) using Robocopy:

  • \\srv04\e$\Shares\Share01: 0:00:03 - 635 MB/min

  • \\store01\share01: 0:02:20 - 11.286 MB/min
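For reference, the two copies were run roughly like this (the source folder and log paths are just placeholders, not my actual paths):

    # Copy via the local path on the current owner node
    robocopy C:\TestData \\srv04\e$\Shares\Share01 *.txt /LOG:C:\Temp\local.log

    # Copy via the clustered path
    robocopy C:\TestData \\store01\share01 *.txt /LOG:C:\Temp\cluster.log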

This is an issue regardless of which node currently owns the role or where the data is transferred from. Although I wasn't following it at the time, I installed and configured the clustered file server more or less as described in this guide. I've tried messing with a few settings, but they're all back to default (as far as I know). I've looked around and haven't found anything specifically mentioning a huge performance penalty from using a failover cluster, so my research so far hasn't turned up much.

A few things about the configuration that might be relevant:

  • The cluster currently has two nodes. Both run Server 2016, and both have two NIC teams (configured in Windows, Switch Independent), each consisting of two 1 Gbit connections.
  • The actual storage being used is a Synology that both machines are accessing via iSCSI, configured using these instructions.
  • Everything else seems to work fine: simulating a failover works, and the other node takes over within a few seconds.
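If it helps, the teaming and iSCSI state can be pulled with the standard built-in cmdlets on each node (nothing custom in my setup):

    # NIC teams as Windows sees them
    Get-NetLbfoTeam
    Get-NetLbfoTeamMember

    # iSCSI sessions and connections to the Synology target
    Get-IscsiSession
    Get-IscsiConnection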

I'm guessing this is one of those "obvious to anybody who knows more than I do" sort of situations. Or maybe I'm just hoping for that. Either way, I appreciate any guidance! I tried to keep it short, so please let me know if you need any other information.

Thanks in advance.

Perilous
  • 41
  • 1
  • 4

2 Answers

4

Your first issue is the NICs teamed for iSCSI. You never do that unless both your target and initiator support multiple connections per session (MC/S), and in your case neither of them does.

https://www.starwindsoftware.com/blog/lacp-vs-mpio-on-windows-platform-which-one-is-better-in-terms-of-redundancy-and-speed-in-this-case-2

http://scst.sourceforge.net/mc_s.html

Solution: you have to un-team your NICs and use MPIO.
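Rough sketch on each node (the portal address and target IQN below are placeholders, use your Synology's values):

    # Install MPIO and let the Microsoft DSM claim iSCSI disks (reboot required)
    Install-WindowsFeature -Name Multipath-IO
    Enable-MSDSMAutomaticClaim -BusType iSCSI

    # After un-teaming, connect the iSCSI target with multipath enabled
    New-IscsiTargetPortal -TargetPortalAddress 192.168.10.50
    Connect-IscsiTarget -NodeAddress "iqn.2000-01.com.synology:store01.target-1" `
        -IsPersistent $true -IsMultipathEnabled $true

    # Repeat Connect-IscsiTarget once per initiator NIC, pinning each session to a
    # different -InitiatorPortalAddress, so MPIO has multiple independent paths.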

Your second issue is the Synology itself. It's not what you use for primary storage, it's a backup unit at best.

Solution: copy your content to local disks and use the Synology as a backup repository.

BaronSamedi1958
  • 13,676
  • 1
  • 21
  • 53
  • I know that the Synology, at least, supports "Allow multiple sessions from one or more iSCSI Initiators:". There's a checkbox and everything. Really beside the point, though. Thank you for your suggestion. And I'm just working with the resources that I have. Can you elaborate a bit on why my current configuration would function just fine (performance wise) when being accessed via one path vs. the other? – Perilous Jun 29 '17 at 03:14
  • 3
    I didn't say you need just ONE path; I said you need to get rid of LACP because it doesn't work with your target and initiator AS IS. I gave you links to read so you can understand WHY. – BaronSamedi1958 Jun 29 '17 at 05:40
  • 3
    "Allow multiple sessions from one or more iSCSI Initiators:" has no relationship with your current issue, it's something target should do to make distributed locks being essential for Hyper-V, VMware etc functional. – BaronSamedi1958 Jun 29 '17 at 05:41
  • Sorry, I should have clarified. I meant one file path (\\store01\share01 vs. \\srv04\e$\Shares\Share01). Either way, I'll break up the NIC teaming, try MPIO, and see if there's any improvement. Thank you. – Perilous Jun 29 '17 at 17:23
0

After removing the NIC teaming and, for good measure, putting the connections between the servers and the Synology on a different subnet, I still saw no performance improvement.

I did finally come across the solution, though. It turns out that Continuous Availability being enabled on the share (the default) was the culprit. There is documentation saying it may cause a "slight" performance penalty (like here) because it bypasses the write cache, but it seems that in some cases that "slight" penalty is actually "gigantic." Here's an article with some helpful background on Continuous Availability and when you might want to use it (to summarize: consider turning it off if your share is configured for "General Use File Server" and you're concerned with performance).

So, long story short, I disabled Continuous Availability on the share used by the cluster, restarted both servers for good measure, and the performance issue was resolved. While I'd prefer to have it on to guarantee data integrity during a failover event, failovers are going to be so few and far between in my environment that the performance penalty simply isn't worth it.
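For anyone else who hits this, the change itself is a one-liner (the share and scope names are the ones from my setup above):

    # Check whether Continuous Availability is enabled on the clustered share
    Get-SmbShare -Name share01 -ScopeName store01 |
        Select-Object Name, ScopeName, ContinuouslyAvailable

    # Disable it (-Force suppresses the confirmation prompt)
    Set-SmbShare -Name share01 -ScopeName store01 -ContinuouslyAvailable $false -Force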

Perilous
  • 41
  • 1
  • 4
  • CA is non-cached I/O by design so yes, that's normal. https://blogs.technet.microsoft.com/filecab/2016/03/15/offline-files-and-continuous-availability-the-monstrous-union-you-should-not-consecrate/ "You can enable CA on non-SOFS shares in a cluster, and you can use end-user applications like Word with them, but for a variety of reasons, it’s not something we recommend. " – BaronSamedi1958 Jul 03 '17 at 21:44