I'm trying to transfer about 100k files totaling 90 GB. Right now I'm using the rsync daemon but it's slow, around 3.4 MB/s, and I need to do this a number of times. I'm wondering what options I have that would max out a 100 Mbit connection over the internet and be very reliable.
-
You're getting nearly a third of your connection - that's respectable, but not great. How far away as the electron flies are the files being transferred? – Shane Madden Nov 19 '11 at 01:17
-
50ms latency between the two servers. – incognito2 Nov 19 '11 at 02:52
-
I saw an alot of files once http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-you-at-everything.html – Nov 21 '11 at 11:24
-
If you are using the rsync daemon, there is no ssh involved, right? Then the explanation is probably the infrastructure in between the hosts. You could try netperf, iperf or flowgrind to test the speed between the hosts. If this test gives you a higher transfer rate, then you should look at how rsync is making things slow: slow read I/O on the server, slow write I/O on the client, many small files, the filesystem, etc. – AndreasM Nov 21 '11 at 11:51
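As a rough sketch of that test, assuming iperf is installed on both hosts and using a placeholder hostname:
On the receiving host:
iperf -s
On the sending host:
iperf -c otherhost -t 30
If iperf reports close to 100 Mbit/s, the bottleneck is rsync or disk I/O rather than the network.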
-
possible duplicate of [How to copy a large number of files quickly between two servers](http://serverfault.com/questions/18125/how-to-copy-a-large-number-of-files-quickly-between-two-servers) – Scott Pack Oct 10 '12 at 16:33
7 Answers
Have you considered Sneakernet? With large data sets, overnight shipping is often going to be faster and cheaper than transferring via the Internet.

-
11"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." - AST – voretaq7 Nov 21 '11 at 21:29
-
Well, given the affordability of gigabit LAN hardware, if it's a LAN transfer, the time spent writing via eSATA to a single spindle is not all that attractive. – memnoch_proxy Nov 17 '13 at 00:38
How? Or TL;DR
The fastest method I've found is a combination of tar, mbuffer and ssh.
E.g.:
tar zcf - bigfile.m4p | mbuffer -s 1K -m 512 | ssh otherhost "tar zxf -"
Using this I've achieved sustained local network transfers over 950 Mb/s on 1Gb links. Replace the paths in each tar command to be appropriate for what you're transferring.
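Since the question is about many small files rather than one big file, a directory-to-directory variant might look like the sketch below (the paths, the hostname and the 512 MB buffer size are placeholder assumptions):
tar zcf - /path/to/files | mbuffer -s 1K -m 512M | ssh otherhost "cd /path/to/dest && tar zxf -"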
Why? mbuffer!
The biggest bottleneck in transferring large files over a network is, by far, disk I/O. The answer to that is mbuffer or buffer. They are largely similar, but mbuffer has some advantages. The default buffer size is 2MB for mbuffer and 1MB for buffer. Larger buffers are more likely to never be empty. Choosing a block size which is the lowest common multiple of the native block sizes of the source and destination filesystems will give the best performance.
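As a sketch of how you might check that on Linux (the path is a placeholder), GNU stat can print a filesystem's preferred I/O block size; if both ends report 4096, passing -s 4k to mbuffer would match it:
stat -f -c %s /path/to/files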
Buffering is the thing that makes all the difference! Use it if you have it! If you don't have it, get it! Using (m)?buffer plus anything is better than anything by itself. It is almost literally a panacea for slow network file transfers.
If you're transferring multiple files, use tar to "lump" them together into a single data stream. If it's a single file you can use cat or I/O redirection. The overhead of tar vs. cat is statistically insignificant, so I always use tar (or zfs send where I can) unless it's already a tarball. Neither of these is guaranteed to give you metadata (and in particular cat will not). If you want metadata, I'll leave that as an exercise for you.
Finally, using ssh for a transport mechanism is both secure and carries very little overhead. Again, the overhead of ssh vs. nc is statistically insignificant.
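If you are on a trusted link and want to skip the encryption entirely, a netcat-based sketch of the same pipeline could look like this (the port, hostname and paths are assumptions; the stream is unencrypted, and some netcat builds want nc -l -p 9090 instead):
On the receiving host:
nc -l 9090 | tar zxf -
On the sending host:
tar zcf - /path/to/files | mbuffer -s 1K -m 512M | nc otherhost 9090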

-
There's encryption overhead in using SSH as a transport sometimes. See: [**Copying files between linux machines with strong authentication without encryption**](http://serverfault.com/questions/452978/copying-files-between-linux-machines-with-strong-authentication-without-encrypti) – ewwhite Feb 12 '13 at 12:36
-
You can use faster encryption mechanisms if you need to. But you don't necessarily need to pipe this through ssh. I prefer to set the -O and -I ports on mbuffer on both sides. Even though this is now two commands, you skip the encryption and maximize network bandwidth by buffering both ends. I'm sending a tar stream at 720+Mbps on my local LAN with the equivalent of `tar -cf - .|mbuffer -m128k -s 256M -I 9090 & mbuffer -m128k -s 256M -O host:9090 | tar -xf -` – memnoch_proxy Nov 17 '13 at 00:42
-
@memnoch_proxy: That's a good suggestion (which I up voted) but in this day and age where the NSA is even tapping private data lines between data centers (e.g., Google and Yahoo) using encryption, IMO, is always a good habit to make. Using `ssh` makes that simple. Using `stunnel`, `socat` or `openssl` works too, but they're more complex to set up for simple transfers. – bahamat Nov 18 '13 at 19:24
-
@bahamat thank you for making me look at the question again. My suggestion only seems appropriate if the transfer can occur over a VPN then. For an Internet transfer, I would certainly use ssh as well. – memnoch_proxy Nov 18 '13 at 21:06
You mention "rsync," so I assume you are using Linux:
Why don't you create a tar or tar.gz file? Transferring one big file over the network is faster than transferring many small ones. You could even compress it if you wish...
Tar with no compression:
On the source server:
tar -cf file.tar /path/to/files/
Then on the receiving end:
cd /path/to/files/
tar -xf /path/to/file.tar
Tar with compression:
On the source server:
tar -czf file.tar.gz /path/to/files/
Then on the receiving end:
cd /path/to/files/
tar -xzf /path/to/file.tar.gz
You would simply use rsync to do the actual transfer of the (tar|tar.gz) files.
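For instance, the transfer step could be something like the sketch below (the username, host and destination path are assumptions); --partial lets an interrupted transfer be resumed:
rsync -av --partial --progress file.tar.gz user@remotehost:/path/to/dest/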

You could try the tar and ssh trick described here:
tar cvzf - /wwwdata | ssh root@192.168.1.201 "dd of=/backup/wwwdata.tar.gz"
This should be rewritable to the following:
tar cvzf - /wwwdata | ssh root@192.168.1.201 "tar xvf -"
You'd lose the --partial features of rsync in the process, though. If the files don't change very frequently, living with a slow initial rsync could be highly worthwhile, as it will go much faster in the future.
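For those later runs, a follow-up rsync along these lines would only send the changed files (the destination path is an assumption, reusing the host from the example above):
rsync -az --partial /wwwdata/ root@192.168.1.201:/wwwdata/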

You can use various compression options of rsync.
-z, --compress compress file data during the transfer
--compress-level=NUM explicitly set compression level
--skip-compress=LIST skip compressing files with suffix in LIST
The compression ratio for binary files is very low, so you can skip those files using --skip-compress, e.g. iso images, already archived and compressed tarballs, etc.
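Put together, a sketch might look like this (the paths, host and suffix list are assumptions; --skip-compress takes slash-separated suffixes):
rsync -az --skip-compress=gz/zip/bz2/7z/iso/jpg/mp4 /path/to/files/ user@remotehost:/path/to/dest/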

My best solution, after a fair bit of testing, is using rsync (to tell me which files differ between the two directories), piped to tar, to zstd (compression level 2 works for me), to mbuffer (to max out the network on my 1 Gb LAN), and then reversing that on the receiving side.
On the receiving side:
mbuffer -s 128k -m 1G -I8080 | tar -vx -C /zos25/z --use-compress-program=zstdmt
On the sending side (run the command from /zos25/z):
rsync --info=name --out-format="%n" -ainAXEtp /zos25/z/ root@ipaddress:/zos25/z | tar -I 'zstdmt -2' -cvf - -T - | mbuffer -s 128k -m 1G -O ipaddress:8080
I ran ssh-keygen on both sides and copied keys so the ssh didn't ask for a password.
I have quad-core laptops at either end, so CPU is not an issue; the network is the bottleneck. I get 70 MB/s with data that compresses by about 80%, so it's pretty efficient. I would recommend you play with the 'zstdmt -2' value to see if greater or lesser compression affects throughput. A faster network could cope with lighter compression (zstdmt -1).

I'm a big fan of SFTP. I use SFTP to transfer media from my main computer to my server, and I get good speeds over LAN.
SFTP is reliable, I'd give that a shot, as it's easy to set up, and it could be faster in some cases.
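A minimal sketch with OpenSSH's sftp client, assuming placeholder host and paths (recursive put needs a reasonably recent OpenSSH, and the remote directory may need to exist first):
sftp user@remotehost
sftp> put -r /path/to/files /path/to/dest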

-
FTP needs to die. It's unencrypted, it does not handle interruption well, and there are at least half a dozen viable alternatives for it that don't completely suck. – MDMarra Nov 21 '11 at 11:30
-
Yes, have you? It is in no way related to the FTP protocol in anything except name and the fact that it moves files around. – MDMarra Nov 21 '11 at 21:19
-
FTP is also notoriously unreliable when traversing firewalls (it dates from a time before firewalls when having your client open a random port to accept back-connections was cool, and the hackery of Passive & Extended Passive FTP to work around that limitation is just that: Hackery) – voretaq7 Nov 21 '11 at 21:24