Error Correction with Python and Reed Solomon for large inputs

Question

I am currently implementing a messaging system. I want to send an error-protected message to a receiver, but I am failing at the basics, i.e. calculating the error-correcting codes. I use the reedsolo library for error correction.
Consider the following MWE:

from reedsolo import RSCodec

with open("imageToSend.png", "rb") as pic:
  picContent = pic.read()

correctionLength = int((len(picContent)/100)*20)
rs = RSCodec(correctionLength)

rs.encode(picContent)

As you can see I want to protect the image from 20% errors that might occur. The problem here? The encoded bytearray is empty. And my question: Is it possible to protect large files from errors, without chunking them into smaller pieces and then calculating the error correcting codes?

If you don't mind the overhead of having to copy / separate data into a matrix format, a two layer detection / correction scheme could be used. The rows would have CRC or RS ECC bytes added for error detection, and the matrix would have RS ECC rows appended for correction of erasures detected by the row oriented CRC / RS ECC, requiring just one parity row per erroneous row to be corrected. Interleaving can be used to deal with large dropouts. — rcgldr, Feb 16 '17 at 04:05

deviantfan · Accepted Answer · 2017-01-30T05:29:53.560

Is it possible to protect large files from errors, without chunking them into smaller pieces

Depends on the code. With bytewise RS, chunks are necessary (but this lib does the work for you).

As you can see I want to protect the image from 20% errors that might occur. The problem here?

Yes. The number isn't meant to be a percentage in the first place. You should really read the library's examples and get to know a bit about how RS works.
The number is how many bytes out of each 255-byte block are used for error correction. E.g. 40 means that for every 215 bytes of data there are 40 bytes of RS code (roughly 19% on top of the data), and within those 255 bytes it can correct up to 20 byte errors (about 8% of the block).
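A minimal sketch of that arithmetic, for illustration only. The helper names `block_budget` and `encoded_length` are hypothetical and not part of reedsolo, and `encoded_length` assumes the library splits the input into chunks of 255 - nsym data bytes and appends nsym parity bytes to each chunk, including a shorter final one.

```python
import math

BLOCK = 255  # codeword length in bytes for byte-wise RS over GF(2^8)


def block_budget(nsym, block=BLOCK):
    """Return (data_bytes, parity_bytes, max_correctable_errors) for one block."""
    if not 0 < nsym < block:
        raise ValueError("nsym must be between 1 and block - 1")
    # Errors at unknown positions cost two parity bytes each.
    return block - nsym, nsym, nsym // 2


def encoded_length(n_bytes, nsym, block=BLOCK):
    """Total size after encoding, under the chunking assumption stated above."""
    chunks = math.ceil(n_bytes / (block - nsym))
    return n_bytes + chunks * nsym


for nsym in (20, 40, 104):
    data, parity, errors = block_budget(nsym)
    print(f"nsym={nsym}: {data} data + {parity} parity bytes, "
          f"corrects up to {errors} byte errors ({errors / BLOCK:.1%} of a block)")
```

Under these assumptions the parameter has to stay below 255 with the default block size, so a value derived from the file size, as in the question, leaves no room for data in a block.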

Finally, the LDPC principle might be something you want to look into. It is a bit worse than RS at correcting errors, but not by much, and it's much faster.

Addition from the comments:
Whether it can be corrected depends on the locations of the errors, yes. If whole 255-byte blocks are gone, it can't correct them. To make the span larger, higher-order RS codes could be used (e.g. one independent block could have 65536 bytes instead of 255), but a) that's again much slower than the (already slow) 255-RS, and b) the RS libs I know can't do it (yours included). You would have to write it yourself.
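One way to soften the "whole blocks gone" case is the interleaving mentioned in the comment under the question. This is a swapped-in technique, not something reedsolo does for you. The sketch below uses hypothetical helper names and assumes equal-length codewords and bytes that are corrupted in place (overwritten, not removed, so positions are preserved).

```python
def interleave(blocks):
    """Send byte 0 of every block, then byte 1 of every block, and so on."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have equal length")
    return bytes(b[i] for i in range(length) for b in blocks)


def deinterleave(stream, n_blocks):
    """Inverse of interleave: rebuild the n_blocks original blocks."""
    return [bytes(stream[k::n_blocks]) for k in range(n_blocks)]


def hits_per_block(burst_start, burst_len, n_blocks):
    """Count how many bytes of each block a contiguous burst in the stream touches."""
    counts = [0] * n_blocks
    for pos in range(burst_start, burst_start + burst_len):
        counts[pos % n_blocks] += 1
    return counts


# Hypothetical example: 17 codewords of 255 bytes and a burst of 510 damaged bytes.
blocks = [bytes([k]) * 255 for k in range(17)]
stream = interleave(blocks)
assert deinterleave(stream, 17) == blocks

# Without interleaving the burst would wipe out two whole codewords.
# With it, each codeword only sees a small share of the damage.
print(max(hits_per_block(1000, 510, 17)))
```

Each codeword would then be decoded separately after deinterleaving, so the burst only matters through the per-block error count.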

Again, LDPC could help, if it doesn't bother you that it's a completely different thing. E.g. it has no clear limit on how many errors are too many to correct or detect; that depends on the error pattern too. And since it's newer than RS, there are fewer codes/libraries online, maybe none for your case.

(Well, it's old too, but for decades nobody was interested in it, until someone realized that it's useful.)

edited Jan 30 '17 at 05:29

answered Jan 26 '17 at 22:00

deviantfan

Thanks for the explanation. I indeed got some background information wrong or didn't get enough of it, respectively. I still have two questions that you are maybe kind enough to answer: 1) Assume I have a 1000-byte file, and I set the correcting code to 40 (about 20% for each 215-byte chunk). I then send this bigger message to the receiver, but 20% of the characters are changed on the way ("A" switched to "Z", or "B" to "K" or something like that). Should it then be recoverable? – JJ Abrams Jan 27 '17 at 14:15
2) I ask because when I try this (create a 100-character random string, encode it with RSCodec(20) (which would mean 20%, right?), then randomly switch 20 characters so the overall length remains the same), the decode function gives me `reedsolo.ReedSolomonError: Could not locate error`. This is odd, right? Or do I still misunderstand how it works? – JJ Abrams Jan 27 '17 at 14:18
@JJAbrams `which would mean 20%, right?` No. That's the whole point of my answer. It's how many bytes in a 255-byte block come from the RS code. Passing 20 means that for every 235 bytes of data there are 20 additional bytes of RS code, and within these 235+20 bytes up to 10 byte errors can be corrected. That's about 4% of 255. If you really want to correct up to 20% errors, you need to pass 104. – deviantfan Jan 27 '17 at 23:05
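As a rough check of those numbers, the sketch below counts corrupted bytes per block and compares them with the nsym // 2 limit for errors at unknown positions. The function name `recoverable` is hypothetical, and the layout (full-size blocks back to back, bytes changed in place) is an assumption for illustration.

```python
from collections import Counter


def recoverable(error_positions, nsym, block=255):
    """True if every block has at most nsym // 2 corrupted bytes.

    Assumes blocks of `block` bytes laid out back to back and errors whose
    positions are unknown to the decoder.
    """
    per_block = Counter(pos // block for pos in set(error_positions))
    return all(count <= nsym // 2 for count in per_block.values())


# The experiment from the comments: 100 data bytes plus 20 parity bytes
# form a single 120-byte codeword.
print(recoverable(range(20), nsym=20))   # 20 errors in one codeword -> False
print(recoverable(range(10), nsym=20))   # 10 errors in one codeword -> True

# With nsym=104 a block tolerates 52 errors, but not 53.
print(recoverable(range(52), nsym=104))  # True
print(recoverable(range(53), nsym=104))  # False
```

The same 20% of errors can therefore pass or fail depending on how they are spread across the blocks.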
Thanks a lot for explaining it again! This explains why I was not able to correct the errors I introduced into the string. However, I have one last question: say I have a 2550-character string. I encode it (`RSCodec(104)`), and the resulting string is 4318 bytes long. Deleting 510 characters from the string should actually be recoverable, right? Because the 104 are per 255-byte block and I am deleting only 20%. Or is a random deletion not recoverable (because in the worst case I could delete almost two whole 255-byte blocks)? If so: is there any method I could apply to protect my string? – JJ Abrams Jan 29 '17 at 15:07
Thanks for your input and for editing your answer. I would give you 10 upvotes if I could ;) However, I don't care what I use (I came up with Reed Solomon because it's one of the first Google hits when looking for error correction). I also don't care about speed/performance. The only thing I care about is being able to recover from a certain amount of random errors in the (byte) string. I guess it's better to open a separate question on SO on this topic before spamming the comment section here too much ;) – JJ Abrams Jan 30 '17 at 14:19
