UDP file transfer project - is error checking necessary?

Question

I have been given the classical task of transferring files using UDP. In different resources, I have read both that checking for errors on the packets (adding a CRC alongside the data in each packet) is necessary, and that UDP already checks for corrupted packets and discards them, so I only need to worry about resending dropped packets.

Which one of them is correct? Do I need to manually perform an integrity check on the arrived packets, or are incorrect ones already discarded?

The language for the project is Java, by the way.

EDIT: Some sources (course books, internet) say the checksum only covers the header, and therefore only ensures that the sender and receiver IPs etc. are correct. Some sources say the checksum also covers the data segment. Some sources say the checksum may cover the data segment, BUT it's optional and decided by the OS.

EDIT 2: I asked my professors and they say UDP error checking on the data segment is optional in IPv4 and the default in IPv6. But I still don't know whether it's under the programmer's control, or the OS's, or another layer's...

score 4 · Accepted Answer · answered Apr 02 '13 at 18:13

First fact:

UDP has a 16-bit checksum field starting at bit 48 of the UDP header (after the source port, destination port, and length fields). It suffers from (at least) two weaknesses:

The checksum is not mandatory in IPv4: a field with all bits set to 0 is defined as "no checksum".
It is a 16-bit checksum in the strict sense of the word (a sum, not a CRC), so it is susceptible to undetected corruption.

Together, this means that UDP's built-in checksum may or may not be reliable enough, depending on your environment.
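To make the second weakness concrete, here is a minimal sketch of the 16-bit ones' complement sum in the style of RFC 1071. It is applied to a bare payload for simplicity, whereas the real UDP checksum also covers a pseudo-header and the UDP header. The class and method names are illustrative only. Because addition is commutative, swapping two 16-bit words leaves the result unchanged, so that kind of corruption goes undetected.

```java
// Illustrative sketch; InternetChecksum is a hypothetical name, not a library class.
final class InternetChecksum {

    /** Ones' complement of the 16-bit ones' complement sum of the data. */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            // An odd trailing byte is padded with a zero byte.
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sum += (hi << 8) | lo;
        }
        // Fold any carries back into the low 16 bits.
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (int) (~sum & 0xFFFF);
    }

    public static void main(String[] args) {
        byte[] original = {0x12, 0x34, 0x56, 0x78};
        byte[] swapped  = {0x56, 0x78, 0x12, 0x34}; // same 16-bit words, different order
        // Different data, identical checksum: prints true
        System.out.println(checksum(original) == checksum(swapped));
    }
}
```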

Second fact:

An even more realistic threat than data corruption in transit is packet loss and reordering. UDP makes no guarantees that

all packets will (eventually) arrive at all
packets will arrive in the same sequence as sent

Indeed, UDP has no built-in mechanism at all to deal with payloads bigger than a single packet, because it wasn't built for that.
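As a rough illustration of what the application then has to do itself, the sketch below buffers out-of-order chunks and releases them only in sequence. It assumes the application puts its own sequence number (starting at 0) into every datagram, since UDP carries no such field; `ReorderBuffer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch; assumes an application-defined sequence number per datagram.
final class ReorderBuffer {
    private final TreeMap<Integer, byte[]> pending = new TreeMap<>();
    private int nextExpected = 0;

    /** Accepts one chunk and returns every chunk that can now be delivered in order. */
    List<byte[]> accept(int seq, byte[] payload) {
        List<byte[]> ready = new ArrayList<>();
        if (seq < nextExpected || pending.containsKey(seq)) {
            return ready; // duplicate: already delivered or already buffered
        }
        pending.put(seq, payload);
        // Release the longest run of consecutive chunks starting at nextExpected.
        while (pending.containsKey(nextExpected)) {
            ready.add(pending.remove(nextExpected));
            nextExpected++;
        }
        return ready;
    }

    /** The lowest sequence number that has not been delivered yet. */
    int nextExpected() {
        return nextExpected;
    }
}
```

A lost packet shows up here as `nextExpected()` not advancing, which is the receiver's cue to ask for a resend.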

Conclusion:

Appending packet after packet as received, without additional measures, is bound to produce a receive stream that differs from the send stream in all but the most favourable environments, making it a less than optimal protocol for direct file transfer.

If you do want or must use UDP to transfer files, you need to build the parts that are integral to TCP but not to UDP into the application. There is a saying, though, that this will most likely result in an inferior reimplementation of TCP.

Successful implementations include many peer-to-peer file sharing protocols, where protection against connection interruption and packet loss or reordering needs to be part of the application functionality anyway to defeat or mitigate filters.

Implementation recommendations:

What has worked for us is a chunked window implementation: the payload is separated into chunks of a fixed and convenient length (we used 1023 bytes), and a status array of N such chunks is kept on the sending and receiving ends.

On the sending side:

A UDP message is initiated, containing such a chunk, its sequence number in the stream (repeated more than once), and a checksum or hash.
The status array marks this chunk as "sent/pending" with a timestamp.
Sending stops if the complete status array (send window) is consumed.
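One possible wire format for such a message is sketched below. The layout is an assumption made for illustration, not the format the answer's implementation used: two copies of a 4-byte sequence number, a 2-byte payload length, the chunk, and a CRC-32 trailer over everything before it. `ChunkPacket` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative sketch; the field layout is an assumption, not a standard.
final class ChunkPacket {
    static final int HEADER_BYTES = 4 + 4 + 2; // two sequence copies + payload length
    static final int TRAILER_BYTES = 4;        // CRC-32 over header and payload

    /** Builds the datagram payload for one chunk (chunk must be shorter than 32768 bytes). */
    static byte[] encode(int seq, byte[] chunk) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + chunk.length + TRAILER_BYTES);
        buf.putInt(seq);                    // first copy of the sequence number
        buf.putInt(seq);                    // second copy, compared by the receiver
        buf.putShort((short) chunk.length); // payload length
        buf.put(chunk);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.position()); // checksum everything written so far
        buf.putInt((int) crc.getValue());
        return buf.array();
    }
}
```

The returned array can be wrapped in a `java.net.DatagramPacket` and handed to `DatagramSocket.send`; marking the chunk "sent/pending" happens separately in the status array.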

On the receiving side:

Received packets are checked against their checksum.
Corrupted packets are negatively acknowledged if all copies of the sequence number agree, and dropped otherwise.
OK packets are marked in the status array as "received/pending" with a timestamp.
Acknowledgement works by sending an ack packet if either enough chunks have been received to fill an ack packet, or the timestamp of the oldest "received/pending" chunk grows too old (a few ms to a few hundred ms).
Ack packets need checksumming, but no sequencing.
Chunks for which an ack has been sent are marked as "ack/pending" with a timestamp in the status array.
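The first two receiver steps could look roughly like this. The sketch assumes a hypothetical packet layout of two 4-byte sequence number copies, a 2-byte length, the payload, and a trailing CRC-32 over all preceding bytes; `ChunkVerifier` and `Verdict` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative sketch; the packet layout is an assumption stated in the text above.
final class ChunkVerifier {
    enum Verdict { OK, NAK, DROP }

    private static final int MIN_LENGTH = 4 + 4 + 2 + 4; // header + CRC trailer, empty payload

    static Verdict check(byte[] packet) {
        if (packet.length < MIN_LENGTH) {
            return Verdict.DROP; // too short to even contain the header and trailer
        }
        ByteBuffer buf = ByteBuffer.wrap(packet);
        int seqA = buf.getInt(0);
        int seqB = buf.getInt(4);
        CRC32 crc = new CRC32();
        crc.update(packet, 0, packet.length - 4);
        boolean crcOk = (int) crc.getValue() == buf.getInt(packet.length - 4);
        if (crcOk) {
            return Verdict.OK; // mark "received/pending" in the status array
        }
        // Corrupted: only NAK when the sequence number copies agree, so we know which chunk to name.
        return (seqA == seqB) ? Verdict.NAK : Verdict.DROP;
    }
}
```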

On the sending side:

Ack packets are received and checked; corrupted packets are dropped.
Chunks for which an ack was received are marked as "ack/done" in the status array.
If the first chunk in the status array is marked "ack/done", the status array slides up until its first chunk is again not marked done.
This possibly releases one or more unsent chunks to be sent.
For chunks in status "sent/pending", a timeout on the timestamp triggers a new send for this chunk, as the original chunk might have been lost.
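The sender-side bookkeeping described so far can be sketched as a circular status array. This is a simplified illustration with hypothetical names (`SendWindow`, `nextToSend`, `onAck`, `timedOut`); it tracks state only and leaves the actual socket I/O to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of a sliding send window; not the answer's original code.
final class SendWindow {
    enum State { UNSENT, SENT_PENDING, ACK_DONE }

    private final State[] state;  // circular status array of window size N
    private final long[] sentAt;  // send timestamp per slot
    private final int totalChunks;
    private int base = 0;         // sequence number of the first chunk in the window

    SendWindow(int windowSize, int totalChunks) {
        this.state = new State[windowSize];
        this.sentAt = new long[windowSize];
        this.totalChunks = totalChunks;
        Arrays.fill(state, State.UNSENT);
    }

    private int slot(int seq) { return seq % state.length; }

    private boolean inWindow(int seq) {
        return seq >= base && seq < base + state.length && seq < totalChunks;
    }

    /** Next unsent sequence number in the window, or -1 if the window is consumed. */
    int nextToSend() {
        for (int seq = base; inWindow(seq); seq++) {
            if (state[slot(seq)] == State.UNSENT) return seq;
        }
        return -1;
    }

    /** Marks a chunk "sent/pending"; also used when a timed-out chunk is resent. */
    void markSent(int seq, long now) {
        if (!inWindow(seq) || state[slot(seq)] == State.ACK_DONE) return;
        state[slot(seq)] = State.SENT_PENDING;
        sentAt[slot(seq)] = now;
    }

    /** Records an ack, then slides the window past every leading "ack/done" chunk. */
    void onAck(int seq) {
        if (!inWindow(seq)) return; // stale or bogus ack
        state[slot(seq)] = State.ACK_DONE;
        while (base < totalChunks && state[slot(base)] == State.ACK_DONE) {
            state[slot(base)] = State.UNSENT; // slot now stands for chunk base + N
            base++;
        }
    }

    /** "sent/pending" chunks whose timestamp is at least timeoutMillis old. */
    List<Integer> timedOut(long now, long timeoutMillis) {
        List<Integer> result = new ArrayList<>();
        for (int seq = base; inWindow(seq); seq++) {
            if (state[slot(seq)] == State.SENT_PENDING && now - sentAt[slot(seq)] >= timeoutMillis) {
                result.add(seq);
            }
        }
        return result;
    }

    int base() { return base; }
}
```

A send loop would repeatedly call `nextToSend()` until it returns -1, then wait for acks or for `timedOut(...)` to report chunks that need resending.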

On the receiving side:

Reception of chunk i+N (N being the window width) marks chunk i as "ack/done", sliding up the receive window. If not all chunks sliding out of the receive window are marked as "ack/pending", this constitutes an unrecoverable error.
For chunks in status "ack/pending", a timeout on the timestamp triggers a new ack for this chunk, as the original ack message might have been lost.
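A simplified sketch of the receive window slide follows. It assumes sequence numbers start at 0 and uses hypothetical names (`ReceiveWindow`, `onChunk`, `markAckSent`). Timestamps and the re-ack timeout are left out to keep it short.

```java
import java.util.Arrays;

// Illustrative sketch of the receive-side window; state names mirror the text.
final class ReceiveWindow {
    enum State { EMPTY, RECEIVED_PENDING, ACK_PENDING }

    private final State[] state; // circular status array of window size N
    private int base = 0;        // sequence number of the first chunk in the window

    ReceiveWindow(int windowSize) {
        state = new State[windowSize];
        Arrays.fill(state, State.EMPTY);
    }

    private int slot(int seq) { return seq % state.length; }

    /**
     * Registers an intact chunk. Returns false on the unrecoverable case:
     * a chunk would slide out of the window without being "ack/pending".
     */
    boolean onChunk(int seq) {
        if (seq < base) return true; // duplicate of a chunk that already slid out
        // Receiving chunk i+N implies the sender has seen the ack for chunk i.
        while (seq >= base + state.length) {
            if (state[slot(base)] != State.ACK_PENDING) return false;
            state[slot(base)] = State.EMPTY; // chunk is "ack/done"; reuse the slot
            base++;
        }
        if (state[slot(seq)] == State.EMPTY) {
            state[slot(seq)] = State.RECEIVED_PENDING;
        }
        return true;
    }

    /** Call after an ack covering this chunk has been sent. */
    void markAckSent(int seq) {
        boolean inWindow = seq >= base && seq < base + state.length;
        if (inWindow && state[slot(seq)] == State.RECEIVED_PENDING) {
            state[slot(seq)] = State.ACK_PENDING;
        }
    }

    int base() { return base; }
}
```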

Obviously, a special message type from the sending side is needed once the send window slides past the end of the file, to signal reception of an ack without sending chunk N+i. We implemented it by simply sending N chunks more than exist, but without a payload.
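The resulting message count is simple arithmetic. A small sketch with hypothetical helper names, assuming a fixed chunk size and window width N:

```java
// Illustrative sketch; ChunkPlan is a hypothetical helper.
final class ChunkPlan {

    /** Number of chunks that actually carry file data (ceiling division). */
    static long dataChunks(long fileLength, int chunkSize) {
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    /** Data chunks plus N empty trailing chunks that flush the receive window. */
    static long totalMessages(long fileLength, int chunkSize, int windowSize) {
        return dataChunks(fileLength, chunkSize) + windowSize;
    }
}
```

For example, a 2047-byte file with 1023-byte chunks needs 3 data chunks, and with a window of 8 the sender emits 11 messages in total, the last 8 without a payload.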

This is exactly what I said: *... making it a less than optimal protocol for direct file transfer* and *this will most likely result in an inferior reimplementation of TCP*. My implementation recommendations start with *If you do want or must use UDP*, so I am totally with you. 1. You should use TCP, 2. only if you must use UDP, then ... — Eugen Rieck, Apr 02 '13 at 20:13
Thanks for the thorough answer. Is this approach the one called "go back N ARQ" ? — uylmz, Apr 09 '13 at 21:57

par · Answer 2 · 2013-04-02T18:08:03.563

You can be sure the packets you receive are the same as what was sent (i.e. if you send packet A and receive packet A you can be sure they are identical). The transport layer checksum on the packets ensures this. Since UDP does not have guaranteed delivery, however, you need to be sure you received everything that was sent, and you need to make sure you order it correctly.

In other words, if packets A, B, and C were sent in that order you might actually receive only A and B (or none). You might get them out of order, C, B, A. So your checking needs to take care of the guaranteed delivery aspect that TCP provides (verify ordering, ensure all the data is there, and notify the server to resend whatever you didn't receive) to whatever degree you require.
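The "notify the server to resend whatever you didn't receive" step can be sketched with a bit set of received sequence numbers. This assumes the receiver knows the total packet count up front (for example from an initial control message, which is an assumption here); `ReceivedTracker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative sketch; assumes application-level sequence numbers 0..total-1.
final class ReceivedTracker {
    private final BitSet received = new BitSet();
    private final int total;

    ReceivedTracker(int total) {
        this.total = total;
    }

    /** Idempotent, so duplicate packets are harmless. */
    void markReceived(int seq) {
        if (seq >= 0 && seq < total) {
            received.set(seq);
        }
    }

    /** Sequence numbers that still need to be requested from the sender. */
    List<Integer> missing() {
        List<Integer> out = new ArrayList<>();
        for (int i = received.nextClearBit(0); i < total; i = received.nextClearBit(i + 1)) {
            out.add(i);
        }
        return out;
    }

    boolean complete() {
        return received.cardinality() == total;
    }
}
```

With packets A, B, C numbered 0, 1, 2, receiving C and then A leaves `missing()` reporting only 1, which is what the receiver would ask the sender to resend.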

The reason to prefer UDP over TCP is that for some applications neither data ordering nor data completeness matters. For example, when streaming AAC audio packets, the individual audio frames are so small that a small number of them can be safely discarded or played out of order without disrupting the listening experience to any significant degree. If 99.9% of the packets are received and ordered correctly, you can play the stream just fine and no one will notice. This works well for some cellular/mobile applications, and you don't even have to worry about resending missed frames (note that Shoutcast and some other servers do use TCP for streaming in some cases [to facilitate in-band metadata], but they don't have to).

If you need to be sure all the data is there and ordered correctly, then you should use TCP, which will take care of verifying that data is all there, ordering it correctly, and resending if necessary.

No problem, I'm glad I was able to help. Thank you for the bounty! — par, Apr 03 '13 at 21:50
I think this answer is just a general information on TCP and UDP (that can be found on every network book or on internet) and not the exact answer to my question. I asked whether UDP provides data error checking or not. — uylmz, Apr 06 '13 at 21:39
What do you mean by "data?" My answer clarifies that the UDP protocol provides integrity checks on a per-packet basis, but that is all. Payload integrity checking, meaning the integrity of the aggregate data of more than one packet, is not provided by UDP. Aggregate data integrity involves correctly sequencing received packets and ensuring that all have arrived for a given payload. TCP does this, UDP does not. — par, Apr 07 '13 at 22:39
By the way, your question states you have been given the "classical task of transferring files via UDP." UDP is *not* suitable for file transfer. For that you should use TCP, otherwise you will simply be attempting to reimplement TCP. — par, Apr 07 '13 at 22:42
By the phrase "classical task of transferring files via UDP" I meant it is a very common school project for CS students. The purpose is to reinvent the TCP wheel, maybe a little differently. — uylmz, Apr 09 '13 at 21:48

score 2 · Answer 3 · answered Apr 02 '13 at 18:21

The UDP protocol uses the same strategy for detecting packets with errors that the TCP protocol uses: a 16-bit checksum in the packet header.

The UDP packet structure is well known (as is TCP's), so a packet can easily be tampered with if it is not encrypted; adding another checksum (for instance CRC-32) would also make it more robust. If the data is going to be encrypted anyway (manually or over an SSL channel), I wouldn't bother adding another checksum.

Please also take into consideration that a packet can be delivered twice. Make sure you deal with that accordingly.

You can check both packet structures on Wikipedia; both have checksums.

You can check the TCP packet structure in more detail to get tips on how to deal with dropped packets. TCP uses a "Sequence Number" and an "Acknowledgment Number" for that purpose.

I hope this helps, and good luck.

score 1 · Answer 4 · answered Apr 02 '13 at 17:34


UDP will drop packets that fail the internal per-packet checksum. CRC checking is useful at the application layer to determine, once a payload appears to be complete, that what was received is actually complete (no dropped packets) and matches what was sent (no man-in-the-middle or other attacks).
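A whole-payload check of this kind might be sketched as follows, using `java.util.zip.CRC32` over the reassembled stream; `PayloadCrc` is a hypothetical name. Note that a CRC detects accidental corruption; to guard against deliberate tampering, a cryptographic hash (for example SHA-256 via `java.security.MessageDigest`) would be the stronger choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Illustrative sketch; the sender computes the same value and transmits it alongside the file.
final class PayloadCrc {

    /** CRC-32 of everything readable from the stream, e.g. the reassembled file. */
    static long crc32(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
        }
        return crc.getValue();
    }
}
```

The receiver would compare its result with the value the sender reported and request a retransfer on mismatch.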

Adrian

To clarify, CRC checking at the application layer should be done on the payload to ensure that the data is all there and correctly ordered, not on individual packets. You don't need to duplicate what the transport layer does. – par Apr 02 '13 at 17:53
