
I have a requirement to create a UDP file transfer system. I know TCP is guaranteed and much more reliable, but I need to transfer huge files between locations and I think the speed advantage in this project outweighs the benefits of using TCP. I’m just starting this project, but would like some guidance if anyone has done this before. I will be writing both sides (client and server), so I don’t need to worry about feature limitations in other products.

In a nutshell I need to:

  • Take large files and send them in chunks
  • Be able to throttle bandwidth from the client
  • Create some kind of packet numbering system for errors, retransmissions and assembling files by chunk on the server (yes, all the stuff we get from TCP for free :-)
  • Configurable datagram size – I think some firewalls complain if they get too big?
  • Anything else I may be missing

I’m starting this journey using UdpClient and would like to write this app in C#. Any words of wisdom (other than to use TCP)?
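
As a concrete starting point, here is a minimal (and deliberately naive) UdpClient sketch that just reads a file in fixed-size chunks and prefixes each datagram with a sequence number. The chunk size, host and port are placeholder assumptions, and there is no throttling or reliability yet:

```csharp
using System;
using System.IO;
using System.Net.Sockets;

class ChunkSender
{
    // 1400 keeps each datagram under a typical 1500-byte MTU; the value is a guess.
    const int ChunkSize = 1400;

    static void SendFile(string path, string host, int port)
    {
        using (var udp = new UdpClient())
        using (var file = File.OpenRead(path))
        {
            udp.Connect(host, port);
            var buffer = new byte[ChunkSize];
            uint sequence = 0;
            int read;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                // No reliability yet: just prefix a sequence number so the
                // receiver can at least detect ordering and gaps.
                var datagram = new byte[4 + read];
                BitConverter.GetBytes(sequence++).CopyTo(datagram, 0);
                Array.Copy(buffer, 0, datagram, 4, read);
                udp.Send(datagram, datagram.Length);
            }
        }
    }
}
```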


It’s been done with huge success. We used to use RocketStream.com, but they sold their product to another company for internal use only. We typically get speeds that are 30X faster than FTP or raw TCP byte transfers.

Scott
  • Use TCP :) "I think the speed advantage in this project outweighs the benefits of using TCP." What? Do you really expect to get any speed advantage over TCP (why?). – ysdx Sep 29 '11 at 21:31
  • UDP usually performs better than TCP for **short** data transfers, not long ones. – Serge Wautier Sep 29 '11 at 21:36
  • At a guess, the speed advantages of UDP come precisely from the fact that it doesn't natively implement the things you say you're going to implement anyway. – millimoose Sep 29 '11 at 21:44
  • Or in other words, the larger the file, the more you care about a reliable transport. Shop for a TFTP library. – Hans Passant Sep 29 '11 at 21:44
  • That said, you could look at [uTP](http://www.utorrent.com/help/documentation/utp), the UDP-flavoured BitTorrent protocol. Of course [their testing](http://blog.bittorrent.com/2009/11/13/testing-%C2%B5tp-is-%C2%B5tp-actually-faster-than-regular-bittorrent/) shows the speed is a little faster than the TCP protocol, sometimes, maybe. – millimoose Sep 29 '11 at 21:45
  • @HansPassant: I'm not very familiar with TFTP, but isn't its design goal merely to be implementable on very anemic hardware? Wikipedia mentions that "Each data packet contains one block of data, and must be acknowledged by an acknowledgment packet before the next packet can be sent." – this sounds *really really slow* to me. – millimoose Sep 29 '11 at 21:51
  • Everything that you've defined is in TCP. Assuming that you are doing this over a gigabit LAN (otherwise why bother? I mean why bother anyway, but...), you know that _SMB_ over a gigabit LAN can transfer faster than your average disk can write (somewhere around 119 MB/s for gigabit Ethernet, about 65 MB/s for your average hard-drive write)? – Ritch Melton Sep 29 '11 at 21:52
  • Sure, you could ack a packet after several are received instead of ack-ing it right away. You've got to do something reasonable when the ack gets lost, however. This *should* sound familiar: it is how TCP works. The delayed ack is called the "window". You are fairly doomed to re-invent it, possibly imperfectly. Writing reliable protocols on top of an unreliable transport *and* making them performant is very hard to get right. It took the BBN guys a while. – Hans Passant Sep 29 '11 at 22:04
  • Unless they are counting compression, or FTP is >97% overhead, their claim of a 30x speed improvement seems outlandish given the same network conditions. Not saying there isn't room for improvement, but that is a bit much. – Guvante Sep 29 '11 at 22:14
  • We're actually seeing these speeds now with rocketstream.com, so it's not just marketing. The idea is to not ACK every packet. Start with an arbitrary number of packets, then check to make sure they got there. If they did, transmit the next set of packets. If they didn't, or there were any errors, retransmit and lower the number of packets between ACKs. The idea being big performance gains on better networks, less gain on crappy ones (see the sketch after these comments). – Scott Sep 29 '11 at 22:43
  • Just a nutty idea, but did you look at using BitTorrent? You know, distributing a number of large files over many clients sounds familiar. – H H Sep 29 '11 at 22:48
  • Have you looked at UDT? http://udt.sourceforge.net/ – Len Holgate Sep 30 '11 at 08:49
  • I've used UDP, and the speed benefits are insane. Unreal file transfer. There are companies like Aspera or FileCatalyst that do this and charge an arm and a leg (for good reason). Too bad everyone assumes TCP would be better at this. It's not... – Nuby Dec 15 '11 at 22:52
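
To make the batched-ACK idea from Scott's comment concrete, here is a rough sketch. The two delegates (how a packet is sent, and how the receiver reports which sequence numbers of a batch actually arrived) are assumptions, stand-ins for whatever transport you build on top of UdpClient:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WindowedSender
{
    public static void SendAll(
        IEnumerable<uint> sequenceNumbers,
        Action<uint> sendPacket,
        Func<List<uint>, HashSet<uint>> queryReceived)
    {
        int window = 64;                                 // starting batch size, arbitrary
        var pending = new Queue<uint>(sequenceNumbers);

        while (pending.Count > 0)
        {
            var batch = new List<uint>();
            while (batch.Count < window && pending.Count > 0)
                batch.Add(pending.Dequeue());

            foreach (var seq in batch) sendPacket(seq);   // fire-and-forget UDP sends

            var received = queryReceived(batch);          // one round trip per batch
            var missing = batch.Where(s => !received.Contains(s)).ToList();
            foreach (var seq in missing) sendPacket(seq); // single retransmit pass (simplified)

            // Grow the window on clean batches, back off when packets were lost.
            window = missing.Count == 0
                ? Math.Min(window * 2, 4096)
                : Math.Max(window / 2, 8);
        }
    }
}
```

The doubling/halving policy is arbitrary; the point is only that clean batches let you acknowledge less often, while lossy ones pull you back toward per-packet acknowledgement.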

5 Answers


In regards to

Configurable datagram size – I think some firewalls complain if they get too big?

one datagram can be up to 65,535 bytes; after the IP and UDP header overhead you end up with 65,507 bytes for the payload. But you have to consider how all the devices along your network path are configured. Typically most devices are set to an MTU of 1500 bytes, so that will usually be your limit "on the internet". If you set up a dedicated network between your locations, you can increase the MTU on all devices.
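
For example, a small helper for deriving the per-datagram payload from a configured MTU (the header sizes assume IPv4 without options):

```csharp
static class DatagramSizing
{
    const int IpHeader = 20;    // IPv4 header without options
    const int UdpHeader = 8;    // UDP header

    // Largest payload that fits in one unfragmented datagram for a given MTU:
    // 1500 (typical Ethernet) gives 1472; the absolute ceiling for a UDP
    // payload is 65,507 bytes.
    public static int MaxPayload(int mtu)
    {
        return mtu - IpHeader - UdpHeader;
    }
}
```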

Further, in regards to

Create some kind of packet numbering system for errors, retransmissions and assembling files by chunk on the server (yes, all the stuff we get from TCP for free :-)

I think the best thing in your case would be to implement an application-level protocol, like:

a sequence number (e.g. 4 bytes), a CRC32 checksum (4 bytes, since CRC32 is 32 bits), and any bytes left over used for data.
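
A minimal sketch of such a packet format, assuming a 4-byte sequence number and a 4-byte CRC32 over the payload. The layout is just an illustration, and the CRC32 routine is hand-rolled since the framework doesn't ship one (note that BitConverter uses the machine's byte order, so pick an explicit byte order if your endpoints might differ):

```csharp
using System;

static class Packet
{
    public const int HeaderSize = 8;   // 4-byte sequence number + 4-byte CRC32

    // Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit
    // for brevity; a table-driven version would be faster.
    static uint Crc32(byte[] data, int offset, int count)
    {
        uint crc = 0xFFFFFFFF;
        for (int i = offset; i < offset + count; i++)
        {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    // Layout: [sequence:4][crc32 of payload:4][payload:n]
    public static byte[] Build(uint sequence, byte[] payload, int count)
    {
        var datagram = new byte[HeaderSize + count];
        BitConverter.GetBytes(sequence).CopyTo(datagram, 0);
        BitConverter.GetBytes(Crc32(payload, 0, count)).CopyTo(datagram, 4);
        Array.Copy(payload, 0, datagram, HeaderSize, count);
        return datagram;
    }

    // Returns false if the datagram is too short or the checksum doesn't match.
    public static bool TryParse(byte[] datagram, out uint sequence, out byte[] payload)
    {
        sequence = 0;
        payload = null;
        if (datagram.Length < HeaderSize) return false;
        sequence = BitConverter.ToUInt32(datagram, 0);
        uint crc = BitConverter.ToUInt32(datagram, 4);
        payload = new byte[datagram.Length - HeaderSize];
        Array.Copy(datagram, HeaderSize, payload, 0, payload.Length);
        return Crc32(payload, 0, payload.Length) == crc;
    }
}
```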

Hope this gives you a bit of direction.

::edit::

From experience I can tell you that UDP is about 10-15% faster than TCP on dedicated, UDP-tuned networks.

Layticia

I'm not convinced the speed gain will be tremendous, but it's an interesting experiment. Such a protocol will look and behave more like one of the traditional modem-based protocols, and ZModem is probably one of the better examples to draw inspiration from (it implements an ACK window, adaptive block sizes, etc.).

There are already some people who have tried this; check out this site.

fvu

That would be cool if you succeed.

Don't go into it without Wireshark. You'll need it.

For the algorithm, I guess that you pretty much have the idea of how to start. Maybe some pointers:

  1. Start with an MTU that is common to both endpoints, and use packets of only that size, so you'll have control over packet fragmentation (if you're coming down from TCP, I hope it's for more control over the low-level stuff); see the sketch after this list.
  2. You'll probably want to look into STUN or TURN for punching holes through NATs.
  3. Look into ZModem - that also has nostalgic value :)
  4. Since you want to squeeze the maximum from your link, try to pack as much as you can into the 'control packets' so you don't waste a single byte.
  5. I wouldn't use any CRC at the packet level, because I guess the networks underneath are handling that stuff.
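
As a small illustration of point 1, a hypothetical helper that sets the IPv4 don't-fragment flag, so oversized datagrams are rejected or dropped rather than silently split along the path (the local port is a placeholder):

```csharp
using System.Net.Sockets;

static class NoFragmentClient
{
    public static UdpClient Create(int localPort)
    {
        var client = new UdpClient(localPort);
        // Keep datagrams at or below the agreed MTU; with DontFragment set,
        // anything larger fails instead of being fragmented in transit.
        client.DontFragment = true;
        return client;
    }
}
```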
Daniel Mošmondor

I just had an idea...

  1. Break up the file into 16k chunks (the length is arbitrary)
  2. Create a hash of each chunk
  3. Transmit all the chunk hashes, using any protocol
  4. At the receiving end, prepare by hashing everything you have on your hard drive, network, I mean everything, in 16k chunks
  5. Compare the received hashes to your local hashes and reconstruct the data you already have
  6. Download the rest using any protocol

I know that I'm 6 months behind schedule, but I just couldn't resist.
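
For what it's worth, a minimal sketch of steps 1-3 (the 16k chunk size matches the list above; SHA-1 is an arbitrary choice, any stable hash would do):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class ChunkHasher
{
    const int ChunkSize = 16 * 1024;   // 16k chunks, as in step 1

    // Hash a file in fixed-size chunks. Comparing two of these lists tells you
    // which chunks the receiver already has and which still need transferring.
    public static List<string> HashChunks(string path)
    {
        var hashes = new List<string>();
        using (var sha1 = SHA1.Create())
        using (var file = File.OpenRead(path))
        {
            var buffer = new byte[ChunkSize];
            int read;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
                hashes.Add(Convert.ToBase64String(sha1.ComputeHash(buffer, 0, read)));
        }
        return hashes;
    }
}
```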

Daniel Mošmondor

Others have said more interesting things, but I would like to point out that you need to make sure you use a good compression algorithm. That will make a world of difference.

Also, I would recommend validating your assumptions about the possible speed improvement: make a trivial system that just sends data (not worrying about loss, corruption, or other problems) and see what bandwidth you get. This will at least give you a realistic upper bound for what can be done.
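
For instance, a throwaway sender along these lines (host, port and payload size are placeholders) gives a rough upper bound, since it ignores loss entirely:

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;

static class ThroughputTest
{
    // Blast a fixed amount of data with no reliability at all, purely to see
    // what raw UDP throughput the link and the stack will give you.
    public static void Blast(string host, int port, long totalBytes)
    {
        var payload = new byte[1400];            // stays under a typical MTU
        var watch = Stopwatch.StartNew();
        using (var udp = new UdpClient())
        {
            udp.Connect(host, port);
            for (long sent = 0; sent < totalBytes; sent += payload.Length)
                udp.Send(payload, payload.Length);
        }
        watch.Stop();
        Console.WriteLine("{0:F1} MB/s (upper bound, ignores loss)",
            totalBytes / watch.Elapsed.TotalSeconds / (1024 * 1024));
    }
}
```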

Finally, consider why you are taking on this task: will the speed gains be worth the time spent developing it?

Guvante
  • All good points. A typical file transfer for us is 100+ gigs and is already RARed. It's not uncommon to transfer 500+ gigs at a time. This is why I said it's probably worth having to invent quality checks when I have to transfer contiguous files of this size. The technology works. I just have to find out how it was done :-) – Scott Sep 29 '11 at 22:50