
Here is a brainteaser!

Let us say we have 64 bytes; each byte is eight bits plus one parity bit.

Let us say there is a further LRC check byte (formed by bitwise XORing all 64 bytes).

So we can visualise this as a 8+1 by 64*1 grid

If one bit is damaged, the parity checks will flag it: one row will fail the parity check, and also one column.

So it will be trivial to locate the offending bit and reverse it.

However, four damaged bits in a square arrangement will fool the parity checking, as each offending row and column will contain two wronguns and hence give a correct parity reading.

But this is a very unlikely scenario.

My question is: how would I go about repairing a dataset in this way? How much repair is possible?

My gut feeling is there must be some sensible way to repair a slightly damaged data set...
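For what it's worth, the single-bit case the question describes can be sketched in a few lines of Python (the function and variable names here are my own, not from any library): encode the 64 data bytes with one even-parity bit per byte plus an XOR LRC byte, then locate a single damaged bit at the intersection of the failing row and the failing column and flip it back.

```python
from functools import reduce
from operator import xor
import random

def parity(x: int) -> int:
    # even-parity bit: 1 iff x has an odd number of set bits
    return bin(x).count("1") & 1

def encode(data):
    row_par = [parity(b) for b in data]   # one parity bit per byte (the rows)
    lrc = reduce(xor, data, 0)            # XOR of all bytes (the columns)
    return row_par, lrc

def correct_single_bit(data, row_par, lrc):
    # intersect the failing row with the failing column and flip that bit
    bad_rows = [i for i, b in enumerate(data) if parity(b) != row_par[i]]
    diff = reduce(xor, data, 0) ^ lrc
    bad_cols = [j for j in range(8) if (diff >> j) & 1]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        data[bad_rows[0]] ^= 1 << bad_cols[0]
        return True
    return False                          # zero or multiple errors: cannot fix

# demo: damage one data bit, then repair it
data = [random.randrange(256) for _ in range(64)]
row_par, lrc = encode(data)
pristine = list(data)
data[17] ^= 1 << 5                        # flip bit 5 of byte 17
assert correct_single_bit(data, row_par, lrc) and data == pristine
```

As the question notes, this only works when exactly one bit is damaged; the four-bits-in-a-square case produces no failing row or column at all.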

Johannes Pille
P i

3 Answers


Quoting the question: "64 bytes, each byte is eight bits plus one parity bit" plus an "LRC check byte (formed by bitwise XORing all 64 bytes)". The 64 per-byte parity bits amount to 8 bytes, so each block would carry 64 data bytes plus 9 parity bytes in total.

The LRC will tell you the block is bad, as long as the damaged bit is not one of the parity bits (a big assumption). You could then attempt a repair: since the per-byte parity tells you which byte is bad, you can brute-force different bit flips until both the LRC and the byte parity match again. But a single LRC only gives you a 1-in-256 filter, so you can easily land on a mathematically consistent combination that passes the checks and counts as "fixed" without being the original data.

If you know what type of file it is, this improves your data recovery chances. For instance, if you know the file is text, you know a given byte should be an upper- or lower-case letter, and any candidate value outside the realm of alphanumeric and common characters can be thrown out.
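That filtering idea can be sketched as follows (`plausible_text_byte` and `repair_candidates` are illustrative helper names of my own, not from any library): enumerate the 8 single-bit repairs of a flagged byte and keep only those that look like text.

```python
def plausible_text_byte(b: int) -> bool:
    # printable ASCII plus tab / newline / carriage return
    return 32 <= b <= 126 or b in (9, 10, 13)

def repair_candidates(bad_byte: int) -> list:
    # each of the 8 single-bit flips is a possible repair
    return [bad_byte ^ (1 << j) for j in range(8)]

damaged = ord("e") ^ (1 << 7)   # 'e' with its top bit flipped
survivors = [c for c in repair_candidates(damaged) if plausible_text_byte(c)]
# the text constraint leaves a single plausible repair: 'e' itself
```

In this toy case the constraint cuts the candidate set from 8 down to 1; real files are messier, but the principle is the same.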

So yes, it's technically possible to brute-force data back into shape with a scheme like this.

If you decide to do this, I'd recommend you start by blocking it as 50 bytes of data per 64-byte block. That leaves you 14 bytes of parity to implement a mathematical model on; 14/64 is close to 16/64, i.e. about 1/4 of the bytes in the block are there purely for rescue purposes.

A quarter of your capacity would be lost, but some stability gained. In your example, 9 of 73 bytes (about 12% of the block) are used for parity.

I've brute-forced data sets like this in data recovery before, but in most cases it's impossible to determine whether you have been successful.

For example, say you have a 1-megabyte file and the parity indicates damage in 4 areas. If it's an MP4 or zip file, it is critically impaired.

If the damage is limited to 1 bit per block, you can most likely repair the file. If you have, say, 2 damaged bits, you would know that 2 areas were damaged, yielding 8 squared possible files; you could generate all 64 candidates and try to unzip each. But if the damage is 3 bits, now you might be looking at 8 cubed.
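The combinatorics described above can be sketched like this (the damaged byte values are invented for illustration): with two bytes each hiding one unlocatable bad bit, every pairing of single-bit repairs is a candidate file.

```python
from itertools import product

def single_bit_flips(b: int) -> list:
    # the 8 candidate repairs for a byte known to hold one bad bit
    return [b ^ (1 << j) for j in range(8)]

# two blocks each flag one damaged byte (values invented for the example)
damaged_bytes = [0x41, 0x42]
candidates = list(product(*(single_bit_flips(b) for b in damaged_bytes)))
# 8 ** 2 == 64 candidate repairs; with a zip file you could try
# unzipping each one until a candidate checks out
```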

It's possible to repair things like this with brute force, and I have done so in emergencies with limited success.

CoRe

Two-dimensional parity with a single parity bit per column or row is not repairable, just detectable.

As for how to build repairable redundancy into 2 dimensions of a data set for a set amount of damage... I once calculated that a single-bit corruption in each direction could be restorable if you had 1 parity byte for every 6 data bytes in each direction of the array.

Look into RAID 5 and RAID 6 architecture (the Wikipedia articles are a good start); they employ similar methods with sectors spread between drives.
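As a rough illustration of the RAID 5 principle mentioned above (the three-"drive" layout and contents here are invented for the example): one XOR parity stripe is enough to rebuild any single lost member.

```python
from functools import reduce
from operator import xor

# three data "drives" (contents invented for the example)
stripes = [b"\x10\x20", b"\x0f\x0e", b"\xaa\xbb"]
par = bytes(reduce(xor, t) for t in zip(*stripes))    # parity stripe

lost = stripes[1]                                     # pretend drive 1 dies
rebuilt = bytes(reduce(xor, t) for t in zip(stripes[0], stripes[2], par))
assert rebuilt == lost                # XOR of survivors plus parity restores it
```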

CoRe

I assume you meant 8+1 by 64+1 grid.

Assume one of the non-parity bits is damaged. The parity bit on the data byte will flag the byte. The parity on all the bytes will flag the column(bit) that was damaged. All good.

Assume one of the parity bits of the "ordinary" data bytes is damaged. The last bit of the extra parity byte would flag it, no?

Assume one of the parity bits of the parity byte is damaged. The last bit of the extra parity byte would flag it, no?

So how is the value of that bit defined?
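One way to picture the issue (a small three-byte example of my own, with an extra "9th column" parity bit computed over the per-byte parity bits): a damaged parity bit shows up as one failed row check with no failed data column, and the 9th-column check then points at the parity bits themselves; but if that 9th-column bit is the one that flips, nothing else corroborates it.

```python
from functools import reduce
from operator import xor

def parity(x: int) -> int:
    # even-parity bit: 1 iff x has an odd number of set bits
    return bin(x).count("1") & 1

data = [0x12, 0x34, 0x56]              # tiny stand-in for the 64 data bytes
row_par = [parity(b) for b in data]    # one parity bit per byte (the rows)
lrc = reduce(xor, data, 0)             # LRC byte over the data columns
par_col = reduce(xor, row_par, 0)      # extra "9th column" over the parity bits

row_par[1] ^= 1                        # damage a parity bit, not a data bit

bad_rows = [i for i, b in enumerate(data) if parity(b) != row_par[i]]
data_cols_ok = (reduce(xor, data, 0) == lrc)          # no data column flagged
par_col_ok = (reduce(xor, row_par, 0) == par_col)     # parity column flagged

# one row flagged, all data columns clean, parity column failing:
# the damage must be in that row's parity bit, so the data needs no
# repair -- unless par_col's own bit is the one that flipped
```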

chrono
  • You seem to put a lot of thought into your answers and make an effort to be thorough. I (and many SO users) appreciate that a lot! However, please note that most people asking a question are looking for a specific answer to a specific question, and not merely a general (however thorough) discussion of the topic. You don't seem to have actually addressed the question here. – Andrew Barber Dec 02 '11 at 10:31
  • I AM addressing the specific problem here. He was asking about this specific matrix being protected by parities on both rows and columns. On its face, that might work for errors involving up to one bit (since the parities would flag BOTH the row AND column where the error occurred). I was pointing out that errors can also affect the parity data itself, which has insufficient protection, so that a single-bit error in the parity data could NOT be located. – chrono Dec 02 '11 at 14:25
  • I think the down vote is wrong... while not answering the question, this clearly elucidates a pertinent issue. Such 'answers' should be cherished, as they increase the question's value as a resource. I guess it could have been put as a comment... – P i Dec 02 '11 at 21:10
  • @Pi if it doesn't answer the question, as you say it doesn't, then it certainly is a candidate for down voting. It is interesting... but as you note, it really should be a comment, if anything at all. – Andrew Barber Dec 02 '11 at 21:16