2

I have a requirement to transfer a huge amount of data (say more than 10 TB) from one machine to another over a closed network (LAN). Are there any proven methods to do this?

I'm thinking of FTP. Is this the right approach?

Any suggestion will be helpful!

Thanks, Prashanth

7 Answers

2

Huge data is not a problem for any protocol; the problem is how many files you have and what you need to transfer (just a bunch of data, or also file permissions, owners, etc.). FTP is a bad and inefficient solution for this. rsync, GlusterFS and the like are a good thing (tm)... it depends.

10 TB over a LAN will take a lot of time if you have a low-speed connection.

If it's possible for you to expand bandwidth, add link aggregation and use bonding on the servers (especially LACP); you will save some time on the transfer. Also, be sure your disk drives are capable of moving data at the speed you need.
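For the bonding part, here is a minimal Linux sketch using iproute2; the interface names (eth0/eth1), the address, and an LACP-capable switch are assumptions, not something from the original answer:

# create an 802.3ad (LACP) bond from two spare NICs
ip link add bond0 type bond mode 802.3ad
ip link set eth0 down; ip link set eth0 master bond0
ip link set eth1 down; ip link set eth1 master bond0
ip link set bond0 up
ip addr add 192.168.1.10/24 dev bond0

As noted in the comments below, a single TCP stream will still ride one physical link; LACP only pays off when the copy is split across multiple flows.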

GioMac
  • +1: But the starting point for me would be to check whether I need to tune the TCP stack - i.e. check the BDP – symcbean Jun 15 '12 at 09:25
  • LACP will serve no purpose unless transmission is broken up over multiple concurrent flows to meet link selection hash criteria. – rnxrx Jun 15 '12 at 12:58
  • Sure, generally it won't, but it depends on both sides. Also, there are many other optimization issues - for example, it might be good to have compressed files or channel compression. – GioMac Jun 15 '12 at 13:21
0

I think WebDAV or rsync would be good too. Even CIFS or NFS are fine; in a Windows environment I would probably use robocopy + CIFS with multithreading to improve the copy speed.
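A minimal robocopy sketch; the source D:\data and the destination share \\target\data are assumptions:

robocopy D:\data \\target\data /E /MT:16 /R:1 /W:1

/E copies subdirectories, /MT:16 runs 16 copy threads, and /R:1 /W:1 keep retries short. Re-running the same command skips files that are already in place, so it also behaves reasonably after an interruption.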

On Linux, maybe cp + NFS and GNU parallel: https://savannah.gnu.org/projects/parallel/
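A rough sketch of the cp + NFS + GNU parallel idea, assuming the destination is already mounted at /mnt/nfs (the path and job count are assumptions):

find /data -type f | parallel -j8 cp -p --parents {} /mnt/nfs/

cp --parents recreates the source path under /mnt/nfs/, -p preserves mode, ownership and timestamps, and parallel keeps eight copies in flight to hide per-file latency.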

I hope it will be useful to you :)

POLLOX
0

FTP works, but I'd make sure the client and server are both reasonably efficient and - most critically - offer the ability to resume transfers.

I would also suggest taking a look at rsync, as it's quite efficient and offers a bunch of options for controlling bandwidth and ongoing synchronization (if that's necessary).
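A hedged sketch of those rsync options; the paths, hostname and rate cap are assumptions:

rsync -avP --bwlimit=50000 /data/ user@target:/data/

-a preserves permissions and ownership, -P (--partial --progress) lets an interrupted run resume, and --bwlimit caps the rate (here roughly 50 MB/s; the value is in units of 1024 bytes per second on current rsync versions). Re-running the same command later transfers only what has changed, which covers ongoing synchronization.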

rnxrx
  • FTP is far from appropriate - no range support (resume functionality is very flaky even on the few clients/servers where it is implemented), no transaction support, cleartext authentication, reverse connections on unpredictable ports (by default), no compression... – symcbean Jun 15 '12 at 09:22
  • His original question specified the use of an internal network - issues of authentication and port correspondences aren't a concern. As to compression - generally anyone with a data set that size to move has already addressed this at the file level. – rnxrx Jun 15 '12 at 12:56
  • Ah of course, internal users never compromise security, hence there's no point in having internal firewalls. – symcbean Jun 15 '12 at 16:23
  • Yes - and clearly it makes lots of sense to put in plenty of firewalls to facilitate high bandwidth apps, 'cause we all know that putting firewalls in is always the right answer! Oh, wait - it's a shame they haven't figured out how to get FTP through firewalls, though, isn't it? – rnxrx Jun 15 '12 at 16:48
0

If Unix-based, use rsync with the --archive option. It also allows easy stopping and starting; FTP doesn't. My advice is not to use FTP.

When using Windows, you might want to take a look at SyncToy. I'm not sure, but I believe it also allows for stopping and starting.

Also, is it gigabit? If not, do you have the ability to wire the servers back to back and copy over a second NIC? That could speed things up.
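A sketch of the back-to-back setup on Linux; eth1 as the spare NIC and the 10.0.0.0/30 subnet are assumptions:

# on the first machine
ip addr add 10.0.0.1/30 dev eth1; ip link set eth1 up
# on the second machine
ip addr add 10.0.0.2/30 dev eth1; ip link set eth1 up

Then point the copy (rsync, nc, or whatever you pick) at 10.0.0.2 so the traffic goes over the dedicated cable instead of the shared LAN.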

Halfgaar
0

On Linux, I usually do the following on the receiving side:

nc -l 43210 | tar xf -

And on sending side:

tar cf - . | dd bs=1M | nc receiving_hostname 43210

Then, in another terminal on the sending side, I run the following to get real-time transfer statistics on the sending console:

while sleep 10; do killall -USR1 dd; done

This just uses a simple raw TCP transfer; there is no need to configure an FTP/NFS/CIFS server. It would move about 10 TB in 24 hours over a 1 Gbps network (roughly 125 MB/s, which works out to about 10.8 TB per day), provided the disks on both sides are fast enough.

You may need to allow connections to the port used (43210 in my example) in the firewall on the receiving side. It should also work on other Unices such as OS X or FreeBSD. On Windows you can use my "dot_nc" and "dot_nc_l", simplistic equivalents of nc and nc -l implemented in C#, which I used to benchmark raw TCP transfers on Windows.
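On a Linux receiver using iptables, opening the port could look like this (a sketch; adapt for firewalld, ufw, or whatever firewall is actually in place):

iptables -A INPUT -p tcp --dport 43210 -j ACCEPT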

Tometzky
0

How far are the machines from each other? Is it possible (i.e. practical) to disconnect the disk array from the first machine, physically attach it to the second machine, and do a local copy?

(Or the other way around - connect the second machine's disk array to the first machine)

tomfanning
0

I recently transferred 10 TB through a 1 Gbps connection. The major problem was keeping the 1 Gbps link filled at all times. That was no problem when transferring big files, but it proved to be a problem with small files, as the sender could not seek fast enough.

The solution was to run multiple transfers in parallel: a few with the big files and a few transferring the rest. It was based on:

http://www.gnu.org/software/parallel/man.html#example__parallelizing_rsync

If your files are compressible, make sure to compress the transfer (rsync -z).

In theory you should be able to use GNU Parallel and rsync on Windows 7, but even if you cannot, you can probably still use the idea of transferring big files in parallel with the small files.
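A rough sketch of the idea, not the exact recipe from the linked manual page; the 500M cutoff, the paths and the hostname are assumptions:

# in one terminal: the big files, a few jobs at a time
cd /data && find . -type f -size +500M | parallel -j4 rsync -aRz {} target:/data/
# in another terminal: everything else, with more jobs to hide small-file seek latency
cd /data && find . -type f ! -size +500M | parallel -j8 rsync -aRz {} target:/data/

Running both sets at once keeps the link busy while the small files are being located.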

Ole Tange