9

I've looked at all the previous similar questions, but the answers seemed to be all over the place and no one was moving a lot of data (100 GB is different from 10 TB).

I've got about 10 TB that I need to move from one RAID array to another over gigabit Ethernet; both are XFS file systems. My biggest concern is having the transfer die midway and not being able to resume easily. Speed would be nice, but ensuring a reliable transfer is much more important.

Normally I'd just tar & netcat, but the RAID array I'm moving from has been super flaky as of late, and I need to be able to recover and resume if it drops mid process. Should I be looking at rsync?

Looking into this a bit more, I think rsync might be too slow, and I'd like to avoid this taking 30 days or more. So now I'm looking for suggestions on how to monitor / resume the transfer with netcat.
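One way to get resumability out of a raw-stream transfer is to track how many bytes the receiver has written and restart from that offset. Below is a minimal local sketch of the idea using `dd` (all paths and sizes are made up for the demo); the same byte-offset trick applies to a dropped tar-over-netcat run.

```shell
# Sketch of resuming a raw-stream copy by byte offset. Paths and sizes
# here are illustrative stand-ins for the real archive stream.
mkdir -p /tmp/resume-demo && cd /tmp/resume-demo
head -c 100000 /dev/urandom > source.bin      # stand-in for the tar stream

# First attempt "dies" after 40000 bytes:
dd if=source.bin of=dest.bin bs=1 count=40000 status=none

# Resume: skip what the receiver already has, append the rest.
GOT=$(stat -c %s dest.bin)
dd if=source.bin of=dest.bin bs=1 skip="$GOT" seek="$GOT" conv=notrunc status=none

# Always verify before trusting the copy.
cmp source.bin dest.bin && echo "stream complete and intact"
```

For the real netcat case the receiver would append with something like `nc -l -p 7000 >> dest.tar`, and the sender would re-stream from the offset with `tail -c +$((GOT+1)) dest.tar | pv | nc receiver 7000` — piping through `pv` gives you a running byte count and transfer rate for monitoring.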

Peter Mortensen
lostincode
  • I sometimes need to do large file transfers (but not 10 TB). rsync has many flags, and some of them might impact the performance of a large transfer (--checksum and --archive might slow you down, for example, which would make a big difference when transferring 10 TB). Can anyone recommend good options to help optimize the performance of such a large file transfer? Would tuning `--block-size=SIZE` help? – Stefan Lasiewski Jun 07 '10 at 22:24
  • Is there any way to remove the SSH overhead? – lostincode Jun 07 '10 at 22:45
  • 1
    set up rsyncd on your receiving end? no need for ssh – cpbills Jun 07 '10 at 23:12
  • 3
    Run an rsync daemon on the receiving side as well as the client on the sending side. – Dennis Williamson Jun 07 '10 at 23:13
  • Is this a single file, multiple large files, countless small files, or a mixture of file sizes? rsync conserves bandwidth at the expense of time, so if the transfer is mainly large files, it may not be suitable. – Jmarki Jun 08 '10 at 00:36
  • There are several problem scenarios here. Are we looking at migration of data or periodic synchronisation/backup of data? The choice of tools will be different, I think. – Jmarki Jun 08 '10 at 00:55
  • 2
    If you can't set up an rsync daemon on one side and are stuck with SSH, you can reduce the encryption overhead with less-good encryption like: rsync -avz -e 'ssh -c arcfour' SOURCE DEST – David Jun 08 '10 at 03:14
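Following the daemon suggestion in the comments, here is a rough sketch of what the receiving side could look like. The module name `array` and the paths are placeholders, not anything from the original question.

```shell
# Minimal rsyncd.conf for the receiving box. The module name "array" and
# the paths are made-up placeholders; written to /tmp purely for illustration.
cat > /tmp/rsyncd.conf <<'EOF'
[array]
    path = /mnt/newraid
    read only = false
    use chroot = true
EOF

# On the receiver (as root):
#   rsync --daemon --config=/tmp/rsyncd.conf
# From the sender -- note the rsync:// URL, so no ssh and no encryption overhead:
#   rsync -a --partial --progress /mnt/oldraid/ rsync://receiver/array/
```

The daemon listens on TCP 873 by default, so make sure that port is open between the two arrays.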

4 Answers

14

Yep, rsync.

The only outside oddball would be the asynchronous replication features DRBD came out with recently.

cagenut
2

Never underestimate the bandwidth of a station wagon full of tapes. 10TB would be feasible with relatively cheap consumer grade NAS equipment if you can divide it into (say 2TB) chunks. If this is a one-off then a semi-manual process might be workable, and a 2TB NAS is only a few hundred dollars.

If you need an ongoing process then you could set up rsync after you've done the initial transfer.
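Carving the data into NAS-sized chunks is straightforward with `split`, and a checksum file lets you verify each chunk after it rides the station wagon. A quick sketch with tiny sizes so it runs in seconds (use something like `-b 2G` for real 2 TB-class chunks):

```shell
# Chunk a big image into pieces, checksum them, then verify and reassemble.
# Sizes are deliberately tiny for the demo.
mkdir -p /tmp/chunk-demo && cd /tmp/chunk-demo
head -c 300000 /dev/urandom > big.img

split -b 100000 -d big.img chunk.        # produces chunk.00 chunk.01 chunk.02
sha256sum chunk.* > chunks.sha256

# On the far side: verify every chunk, then reassemble in order.
sha256sum -c chunks.sha256
cat chunk.* > restored.img
cmp big.img restored.img && echo "chunks reassembled intact"
```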

2

I had to do this kind of task some months ago. I used parallel rsync to speed up the process. It split the files to be transferred / synced in chunks, and it can be resumed at any time. See link below for parallel rsync script.

https://gist.github.com/rcoup/5358786

MadHatter
0

You could try setting up an FTP server on the server with the data to be copied, and use an FTP client with resume support on the receiving end. I use FileZilla server and client, and I use the client's "resume" feature quite often; it has always worked without a hitch.

jamesbtate