
I'm working with high-end hardware, but I'm hitting CPU bottlenecks in every scenario when attempting to move large amounts of data.

Specifically, I'm moving large virtual machine image (VHD) files of 2 TB between two Ubuntu hosts.

My latest attempt took 200 minutes to transfer 2 TB, which works out to a throughput of roughly 170 MB/s.

I've tried techniques such as netcat, and scp with the basic arcfour cipher.

The hardware on each end is 6 x enterprise-grade SSDs in RAID 10 on a hardware RAID controller, 256 GB of memory, and Xeon v4 CPUs. The network is 20 GbE (2 x 10 GbE LACP).

In all cases the network and disk I/O have plenty of capacity left; the bottleneck is one CPU core pegged at 100% constantly.

I've performed basic benchmarks using various methods, as follows (the commands are sketched after the results):

30 GB test file transfer:

scp: real 5m1.970s

nc: real 2m41.933s

nc & pigz: real 1m24.139s
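
For reference, a rough sketch of what these pipelines typically look like (hostnames, ports, and paths are placeholders, nc flags vary between netcat variants, and arcfour is disabled by default in modern OpenSSH and has to be re-enabled):

# scp with a cheap cipher and no compression
scp -c arcfour -o Compression=no test.img user@dest:/data/

# plain netcat: start the receiver first, then the sender
nc -l -p 5000 > test.img            # receiver
nc dest 5000 < test.img             # sender

# netcat with pigz doing parallel compression on the sending side
nc -l -p 5000 | pigz -d > test.img  # receiver (decompression)
pigz -c test.img | nc dest 5000     # sender (compression)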

However, because I dd'd an empty (zero-filled) file for testing, I don't believe pigz had to work very hard. When I attempted pigz on a production VHD file, pigz hit 1200% CPU load, and I believe that became the bottleneck. Therefore my fastest time was set by nc on its own.

nc hits 100% CPU on each end, I'm assuming just from processing the I/O from the disk to the network.

I did think about splitting the file into chunks and running multiple nc instances to make use of more cores (sketched below), but someone else may have a better suggestion.
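
A minimal sketch of that idea, assuming a 2 TiB image split into two 1 TiB ranges (host, ports, and offsets are made up; conv=notrunc stops the writers truncating each other, and nc flags again vary by variant):

# receiver: one listener per chunk, each writing at its own offset
nc -l -p 9001 | dd of=image.vhd bs=1M seek=0       conv=notrunc &
nc -l -p 9002 | dd of=image.vhd bs=1M seek=1048576 conv=notrunc &
wait

# sender: stream each 1 TiB range to its matching port
dd if=image.vhd bs=1M skip=0       count=1048576 | nc -q 1 dest 9001 &
dd if=image.vhd bs=1M skip=1048576 count=1048576 | nc -q 1 dest 9002 &
wait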

epea
  • Would you add details such as network infrastructure and hardware? What about the amount of RAM and its usage? Was scp run with or without compression? pigz hits all cores because it is multi-threaded, but I don't think scp is, so it has data available to send sooner and nc achieves much faster throughput. – Marco Dec 28 '18 at 11:16
  • Also test your network bandwidth with `iperf`/`iperf3` and post the results. – Thomas Dec 28 '18 at 13:23
  • Apologies, I should have had the network in from the start. It's 20 GbE (2 x 10 GbE LACP); network bandwidth isn't the issue, as I can see nc hitting 100% CPU and staying there. Just tested further with pigz: it's using about 800% CPU on the sending side, but I've since read that pigz can't offer multi-threaded decompression, so it's the receiving end holding things up on a single thread. Also tested netcat in UDP mode, no difference vs TCP, and tried udp-sender and udp-receiver; the single-threaded UDP sender maxes out one core. – epea Dec 28 '18 at 19:42

3 Answers


A few things to try (illustrative commands follow the list):

  • use a program that uses sendfile (e.g. apache)
  • tune the Linux network stack and NIC
  • enable a larger MTU
  • enable NIC offloading
  • use a better performing filesystem (xfs or zfs)
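
A rough sketch of the corresponding knobs on Ubuntu (the interface name and buffer sizes are placeholder assumptions; the ESnet guide below explains how to size them properly):

# jumbo frames (the switch and both NICs must support them)
sudo ip link set eno1 mtu 9000

# NIC offloads (TSO/GSO/GRO) so segmentation work moves off the CPU
sudo ethtool -K eno1 tso on gso on gro on

# larger TCP buffers for a fat 20 GbE path
sudo sysctl -w net.core.rmem_max=268435456
sudo sysctl -w net.core.wmem_max=268435456
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"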

The ESnet Fasterdata Knowledge Base is a great resource for optimizing moving data across fast networks.

Mark Wagner

It's been a while since I posted this and it's getting some views. In the end I used bbcp (https://github.com/eeertekin/bbcp) to saturate the network; it works extremely well.
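
For anyone landing here, an illustrative invocation (host and paths are placeholders; check bbcp's help for the exact option set in your build):

# -s: parallel TCP streams, -w: window size, -P: progress report interval in seconds
bbcp -P 2 -s 16 -w 8M /data/image.vhd user@desthost:/data/image.vhd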

epea

Are your endpoints physically near each other? Maybe consider a different network medium that is designed for moving buttloads of data around. CPU processing can be offloaded to the adapter card, and your Ethernet won't be saturated for minutes at a time.

Below is a (low-end) InfiniBand setup that cost around $500 in eBay parts (a Mellanox IS5022 switch, two CX353A QDR cards (maybe FDR, I don't remember), and new cables). I ran dd from a hypervisor with 20+ VMs running on it, so there is a fair amount of I/O delay in it. The SSD transfer (an iSCSI mount) is still noteworthy, however.

To a SATA array (RAID 10):

# time dd if=/dev/zero of=foo.bin bs=1M count=30000
30000+0 records in
30000+0 records out
31457280000 bytes (31 GB, 29 GiB) copied, 106.652 s, 295 MB/s

real    1m52.795s
user    0m0.022s
sys     0m12.531s

And to an SSD array:

# time dd if=/dev/zero of=foo.bin bs=1M count=30000
30000+0 records in
30000+0 records out
31457280000 bytes (31 GB, 29 GiB) copied, 19.1353 s, 1.6 GB/s

real    0m19.137s
user    0m0.020s
sys     0m18.782s
Server Fault
  • Yes, it's in the same rack. Network is 10 GbE (should have mentioned that). I don't have the ability to get InfiniBand into it due to a lack of PCIe space. That 1.6 GB/s is impressive, though. Perhaps an NFS mount may be faster; I'm expecting I should get close to saturating the link if I can just get around the CPU issue. – epea Dec 28 '18 at 19:01