I recently purchased some new-to-me R620 servers for a cluster. Mostly they will be handling heavy database transactions, but in general they will run Hyper-V VMs doing a variety of work. It was during the database work that I started to realize the servers were performing much worse than my old R610. Since then I've swapped out controllers, NICs, and drives in search of performance comparable to other DiskMark results posted online for similar systems. My random single-threaded performance in particular seems to be horrible. Changing the BIOS system profile to Performance helped a lot, but I'm still running slow. Enabling/disabling read-ahead, write-back, and disk cache changes behavior, but doesn't alter performance radically either way. Every update is applied, and the tests below use no read ahead / write back / disk cache enabled (the combination that gave the best results).

Am I missing something, could my CPU really be that much of a single-thread bottleneck, or are my results normal? Thanks for any advice!
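One way to flip those cache policies from within the OS is PercCLI; this is a minimal sketch only, assuming perccli64.exe is on the PATH and the array is controller 0, virtual disk 0 (adjust the indices for your setup):

```powershell
# Minimal sketch: query and set the PERC H710P cache policy with PercCLI.
# Assumes perccli64.exe is on the PATH and the RAID 5 array is controller 0, virtual disk 0.
perccli64.exe /c0/v0 show all          # show current read/write/pdcache policy
perccli64.exe /c0/v0 set rdcache=nora  # no read ahead
perccli64.exe /c0/v0 set wrcache=wb    # write back
perccli64.exe /c0/v0 set pdcache=on    # physical disk cache enabled
```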

System:
R620
Windows Server 2019 Core with Hyper-V - Server 2019 and Ubuntu 18.04 guests
Dual E5-2650 v2
128GB (16x8GB PC3L-12800R)
H710P Mini Mono
5x Intel D3-S4610 960GB SSDs in RAID 5
Intel X540 NIC

Using CrystalDiskMark 3 - 9 passes / 4GB:
My system
Read / Write (MB/s)
Seq: 1018 / 1637
512K: 743 / 1158
4K: 19 / 23
4K QD32: 204 / 75

Comparison system - https://www.brentozar.com/archive/2013/08/load-testing-solid-state-drives-raid/
Read / Write (MB/s)
Seq: 1855 / 1912
512K: 1480 / 1419
4K: 34 / 51
4K QD32: 651 / 88

Using CrystalDiskMark 6 - 2 passes / 100MB:
My system
Read / Write (MB/s)
Seq Q32T1: 3022 / 3461
4K Q8T8: 335 / 290
4K Q32T1: 210 / 195
4K Q1T1: 32 / 30

Comparison system - https://www.youtube.com/watch?v=i-eCmE5itzM
Read / Write (MB/s)
Seq Q32T1: 554 / 264
4K Q8T8: 314 / 259
4K Q32T1: 316 / 261
4K Q1T1: 33 / 115

Using CrystalDiskMark 6 - 5 passes / 1GB:
My system
Read / Write (MB/s)
Seq Q32T1: 2619 / 1957
4K Q8T8: 306 / 132
4K Q32T1: 212 / 116
4K Q1T1: 25 / 27

Comparison system - R610, Hyper-V Core 2012R2 with 2008R2 guests - Dual X5670, 128GB 1600MHz RAM, 4x Samsung 860 Pro 1TB RAID 5, H700
Read / Write (MB/s)
Seq Q32T1: 754 / 685
4K Q8T8: 305 / 69
4K Q32T1: 262 / 69
4K Q1T1: 32 / 38
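For anyone who wants to reproduce the low QD1 numbers without CrystalDiskMark, the rough diskspd equivalent of the 4K Q1T1 read test is below; a sketch only, with the test file path and sizes as placeholders:

```powershell
# Sketch of a 4K random read test at queue depth 1, single thread (roughly CDM's 4K Q1T1 read).
# Assumes diskspd.exe is on the PATH; D:\iotest.dat is a placeholder path on the RAID 5 volume.
diskspd.exe -c4G -b4K -r -o1 -t1 -w0 -d30 -Sh -L D:\iotest.dat
# -c4G  create a 4 GiB test file      -b4K  4 KiB blocks      -r   random I/O
# -o1   1 outstanding I/O per thread  -t1   single thread     -w0  100% reads
# -d30  run for 30 seconds            -Sh   disable caching   -L   capture latency stats
```

Changing -w0 to -w100 gives the write-side equivalent.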

Here are some real-world numbers compared to my old R610 system:

Exporting the same database table from a local MariaDB instance to a single R620 MariaDB Galera cluster node:
R610 - 1.7 million recs/min
R620 - 1.16 million recs/min

Copying a folder with thousands of small files from VM to host:
R610 - 23 seconds
R620 - 2 min 40 seconds

Large file copies, on the other hand, perform well, with the R620 beating the R610 by about 35%.
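The copy timings above are wall-clock; something like the following is how they can be reproduced (the source and destination paths are placeholders):

```powershell
# Time a recursive copy of a folder full of small files from a guest share to the host.
# \\vm-guest\share\smallfiles and D:\copytest are placeholder paths.
Measure-Command {
    Copy-Item -Path '\\vm-guest\share\smallfiles' -Destination 'D:\copytest' -Recurse -Force
} | Select-Object TotalSeconds
```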

Justin M
    CrystalMark is not a database benchmark. Edit your question to add which DBMS engine you are using, and real throughput or response time numbers for each. – John Mahowald May 22 '19 at 03:39
  • Thanks for the response, John. I use CrystalDiskMark since it's a benchmark I can run quickly to compare my R620's drive performance against similar systems I've found online. I have edited my post to show two real-world examples against my old R610. Encountering multiple issues like these is what originally caused me to question the performance of the new servers. I've tweaked every network, drive, controller, BIOS, and Hyper-V setting I can find. I've changed NICs, drives, and controllers now, so any advice is greatly appreciated! – Justin M May 22 '19 at 20:17
  • Hi, does the copy test from VM to host use different OSes? If so, it's not a good test, as we can't know whether the integration drivers work as well in your Ubuntu as in the 2008R2. Please spin up a VM with the same OS to make the benchmark comparable, thanks! – yagmoth555 May 23 '19 at 00:01
  • The R610 tests are on a Windows 2012R2 host from a 2008R2 vm, and the R620 tests are on Windows 2019 hosts from 2019 vms. I just mentioned Ubuntu to give as much info about the system I'm running as possible. I am going to try 2016 today and see if maybe there is an issue with 2019 that hasn't been uncovered yet. – Justin M May 23 '19 at 16:26
  • It was 2019 after all. If anyone has any ideas why, it would save me a ton of work downgrading and redoing all my hosts and VMs. Thanks – Justin M May 24 '19 at 17:44

3 Answers


Server 2019 was the problem after all. I tried tweaking every setting, changing every piece of hardware, and updating everything to current as of May 2019. In the end the system performed well out of the box with Server 2016.

Justin M

I just wanted to follow up, as I remembered you experienced this issue a while back and the solution was downgrading to WS2016. I'm not sure if you've seen this post: https://www.reddit.com/r/sysadmin/comments/c9a005/server_2019_vm_slow_network_performance_due_to_rsc/

Disabling RSC on the vSwitch may have been the solution to your issue. Unsure, but I just wanted to make sure you were aware.
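In case it helps, this is a minimal sketch of what the post suggests, assuming a Server 2019 host (where software RSC is on by default) and a vSwitch whose name here is a placeholder:

```powershell
# Check whether software RSC is enabled on the virtual switches (Server 2019)
Get-VMSwitch | Select-Object Name, SoftwareRscEnabled

# Disable software RSC on a specific vSwitch ("External" is a placeholder name)
Set-VMSwitch -Name 'External' -EnableSoftwareRsc $false
```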

Best regards,

bloonacho
  • So I actually jumped the gun a little on the answer to this question. Once I simply enabled Hyper-V, the slow performance suddenly returned. Disabling the hypervisor at boot using "bcdedit /set hypervisorlaunchtype off" was the only way to get performance back. Spectre/Meltdown mitigations were disabled. Still no answer why, but long story short, I ended up scrapping the R620s for R630s. Performance on everything (except transferring thousands of small files) was good out of the box with the R630s. The company I bought from said other customers reported that the R610/R630 were good, but the R620 was awful for certain workloads. – Justin M Jul 08 '19 at 19:19
  • Thanks for the RSC tip though, as it did improve some unrelated network performance. Unfortunately it still doesn't fix the small file transfer issue over the network, but that one isn't a show stopper. – Justin M Jul 08 '19 at 20:33
  • Thank you for the update and I appreciate all of the data you've dumped here. Sorry I couldn't be of more assistance but I'm glad it helped at least a bit. – bloonacho Jul 08 '19 at 21:28

I'm assuming you've attempted to manually configure your NUMA settings for SQL, since SQL is a NUMA-aware application? Just grasping at straws here, but it's a thought.
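For example, something along these lines is what I have in mind; a sketch only, with the VM name and vCPU count as placeholders:

```powershell
# Inspect the NUMA topology Hyper-V sees on the host
Get-VMHostNumaNode

# Keep VMs from spanning NUMA nodes so the guest sees real node boundaries
Set-VMHost -NumaSpanningEnabled $false

# Cap a VM's vCPUs per virtual NUMA node ("SQLVM" and the count of 8 are placeholders)
Set-VMProcessor -VMName 'SQLVM' -MaximumCountPerNumaNode 8
```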

bloonacho
  • Yep, I even balanced network ports across NUMA nodes, as well as VMs, etc. Not sure what broke with Server 2019, but even Hyper-V VMs running Server 2019 run slower than 2008-2016 VMs on a 2019 host, and I would think VMs would be a bit more hardware-agnostic than the host. Nevertheless, changing the host to Server 2016 showed a further 375% improvement in DiskMark. – Justin M May 24 '19 at 19:20