
Story

While conducting an experiment, I was saving a stream of random bytes generated by a hardware RNG device. After the experiment finished, I realized that the saving method was incorrect. I hope to find a way to repair the corrupted file so that I can recover the original stream of random numbers.

Example

The story of the problem can be explained in the following simple example.

Let's say I have a stream of random numbers in an input file randomInput.bin. I simulate the stream coming from the hardware RNG device by sending the input file to stdout via cat. I found two ways to save this stream to a file:

A) Harmless saving method

This method gives me exactly the original stream of random Bytes.

import scala.sys.process._
import java.io.File

// outputFile is the path of the destination file (a String)
val res = ("cat randomInput.bin" #> new File(outputFile)).!

B) Saving method leading to corruption

Unfortunately, this is the original saving method I chose.

import scala.sys.process._
import java.io.PrintWriter

val randomBits = "cat randomInput.bin".!!

val out = new PrintWriter(outputFile)
out.println(randomBits)
if (out != null) {
  out.close()
  Seq("chmod", "600", outputFile).!
}

The file saved using method B) is still binary; however, it is approximately 2x larger than the file saved by method A). Further analysis shows that the stream of random bits is significantly less random.

Summary

I suspect that saving method B) adds something to almost every byte; however, understanding this is beyond my expertise in Java/Scala I/O.

I would very much appreciate it if somebody explained to me the low-level difference between methods A) and B). The goal is to revert the changes introduced by saving method B) and recover the original stream of random bytes.

Thank you very much in advance!

Luinorn

1 Answer


The problem is probably that println is meant for text, and this text is being written in a Unicode encoding, which uses multiple bytes for some or all characters, depending on which encoding is in use (for example UTF-8 or UTF-16).
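To make the text round trip concrete, here is a minimal sketch of what can happen when raw bytes are decoded into a String (as `.!!` does, since it returns a String) and then encoded again by a text writer. It assumes the JVM default charset was UTF-8, which is not stated in the question; the object name `RoundTripDemo` and the sample bytes are made up for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object RoundTripDemo {
  // Decode bytes as UTF-8 text, then encode the text back to bytes,
  // mimicking a save that goes through a String and a text writer.
  def roundTrip(raw: Array[Byte]): Array[Byte] =
    new String(raw, UTF_8).getBytes(UTF_8)

  def main(args: Array[String]): Unit = {
    // 0x41 is ASCII 'A'; 0xFF and a lone 0x80 are not valid UTF-8.
    val raw = Array[Byte](0x41, 0xFF.toByte, 0x80.toByte)
    val out = roundTrip(raw)
    // Each invalid byte is decoded to U+FFFD, which encodes as EF BF BD.
    println(out.map(b => f"${b & 0xFF}%02X").mkString(" "))
    // prints: 41 EF BF BD EF BF BD
  }
}
```

Under this assumption, bytes below 0x80 survive unchanged while most bytes at or above 0x80 turn into the same three-byte replacement sequence, which would be consistent with a file roughly twice as large and less random. In that case the original values of the replaced bytes are not stored anywhere in the output. Note also that println appends a line separator at the end.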

If the file is exactly 2x larger than it should be, then you've probably got a null byte every other byte, which could be easy to fix. Otherwise, it may be harder to figure out what you would need to do to recover the binary data. Viewing the corrupted file in a hex editor may help you see what happened. Either way, I think it may be easier to just generate new random data and save it correctly.
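As an alternative to a hex editor, the "null byte every other byte" hypothesis can be checked programmatically. The following is only a sketch: the names `NullByteCheck`, `everyOtherByteIsNull`, and `dropOddBytes` are hypothetical, and it assumes the null bytes sit at the odd positions (as they would in little-endian UTF-16); if they sit at the even positions, the index tests would need to be swapped.

```scala
object NullByteCheck {
  // True if every byte at an odd index is 0x00, the pattern hypothesized above.
  def everyOtherByteIsNull(data: Array[Byte]): Boolean =
    data.length >= 2 && data.indices.filter(_ % 2 == 1).forall(i => data(i) == 0)

  // Keep only the bytes at even indices, discarding the interleaved bytes.
  def dropOddBytes(data: Array[Byte]): Array[Byte] =
    data.indices.filter(_ % 2 == 0).map(i => data(i)).toArray

  def main(args: Array[String]): Unit = {
    val sample = Array[Byte](0x12, 0x00, 0x7F, 0x00)
    if (everyOtherByteIsNull(sample))
      println(dropOddBytes(sample).length) // number of bytes kept
    else
      println("pattern not found; stripping would not be valid")
  }
}
```

To apply this to the real file, the bytes could be loaded with `java.nio.file.Files.readAllBytes`. If the check returns false, the corruption has a different shape and stripping bytes would only damage the data further.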

Especially if this is for an experiment, if your random data has been corrupted and then fixed, it may be harder to justify that the data is truly random compared to just generating it properly in the first place.
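For saving new data correctly, one option is to keep everything byte-oriented and never pass the stream through a String or a Writer. Below is a minimal sketch using `java.nio.file.Files.copy` on the process's output stream; the object name `ByteSave` and the output file name `randomOutput.bin` are placeholders, and `cat randomInput.bin` stands in for the hardware RNG as in the question.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, Paths, StandardCopyOption}

object ByteSave {
  // Copy a byte stream to a file with no charset decoding or encoding.
  // Returns the number of bytes written.
  def save(in: InputStream, target: Path): Long =
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()

  def main(args: Array[String]): Unit = {
    // Placeholder command standing in for the RNG device.
    val proc = new ProcessBuilder("cat", "randomInput.bin").start()
    val written = save(proc.getInputStream, Paths.get("randomOutput.bin"))
    proc.waitFor()
    println(s"wrote $written bytes")
  }
}
```

Because only InputStream and file-copy calls are involved, the bytes on disk are the bytes the process emitted, which is the same property method A) in the question has.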

kaya3
  • It is not just `println` which is meant for text, but `Writer` and its subclasses in general: "Abstract class for writing to **character** streams" https://docs.oracle.com/javase/7/docs/api/java/io/Writer.html – Alexey Romanov Nov 01 '19 at 16:54
  • Unfortunately, running a new experiment is "expensive" for me. I need the stream of random numbers to evaluate the experiment. – Luinorn Nov 01 '19 at 17:13
  • In that case, good luck recovering the data! Hopefully it's as simple as a null byte every other byte. – kaya3 Nov 01 '19 at 17:15