
I recently found out that bits in DRAM can be flipped at random by the decay of radioactive particles inside the chip or by cosmic rays, and I wondered how often these errors occur.

Unfortunately, the most recent statistic I found is from 1990 (source); it states that one error should be expected every month per 128MB of memory.

Since I couldn't find any recent statistics on soft error rates in modern RAM, I tried to write a Java program that measures the soft error frequency on 4GB of my RAM. I expect the program to detect every soft error in the allocated 4GB, as long as it isn't optimized in any way.

The problem is, I have no idea how to check if the program works (I assume it doesn't because of optimization), and I don't know how to change it to work as expected.

Based on the 1990 statistic, I should expect to detect an error about every 22 hours (4GB is 32 × 128MB, so roughly 32 errors per month). I would therefore need to run the program for almost a week to state with 99% confidence that it works, assuming that modern hardware doesn't have a better soft error rate than hardware from the 90s.
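The arithmetic above can be checked with a short sketch. It assumes a 30-day month and models soft errors as a Poisson process (independent errors at a constant rate); both are assumptions of this sketch, not part of the 1990 source, and the class and method names are hypothetical.

```java
// Hypothetical helper that reproduces the back-of-the-envelope estimate above.
final class SoftErrorEstimate {

    /** Expected errors per hour for memoryMB of RAM, given a rate quoted per blockMB per month. */
    static double errorsPerHour(double memoryMB, double blockMB,
                                double errorsPerMonthPerBlock, double hoursPerMonth) {
        return errorsPerMonthPerBlock * (memoryMB / blockMB) / hoursPerMonth;
    }

    /**
     * Hours needed so that, if the rate is right, at least one error is observed
     * with the given probability. Poisson model (an assumption):
     * P(no error in t hours) = exp(-rate * t), so t = -ln(1 - confidence) / rate.
     */
    static double hoursForConfidence(double errorsPerHour, double confidence) {
        return -Math.log(1.0 - confidence) / errorsPerHour;
    }

    public static void main(String[] args) {
        // 4 GB = 4096 MB, 1 error per month per 128 MB, assuming a 30-day month (720 h).
        double rate = errorsPerHour(4096, 128, 1.0, 30 * 24);
        System.out.printf("mean time between errors: %.1f h%n", 1.0 / rate);
        double t = hoursForConfidence(rate, 0.99);
        System.out.printf("time for 99%% confidence: %.1f h (%.1f days)%n", t, t / 24);
    }
}
```

Under this model the mean interval comes out at 22.5 hours and the 99% figure at about 104 hours, a little over four days, so "almost a week" errs on the safe side. Both numbers inherit whatever is wrong with the 1990 rate itself.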

The following loop is the most important part of my program:

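int[] memory = new int[1_073_741_824]; // 4GB array (2^30 ints), each element initialized to 0
while (true) {
    for (int i = 0; i < memory.length; i++) {
        if (memory[i] != 0) {
            // soft error found: log it (stand-in for a log file) before resetting,
            // so the logged value is the flipped one and not the restored 0
            System.err.println(System.currentTimeMillis() + ": index " + i + " held " + memory[i]);
            memory[i] = 0;
        }
    }
    Thread.sleep(60_000); // sleep for a minute; the enclosing method must handle InterruptedException
}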

What can I do to prevent optimization from breaking the intended use of the program?

P.S. If you think that my code wouldn't even work without optimization, please explain why.

Sven
    yeah this will **never** do what you expect it to do, not in any language and not in Java especially –  Apr 28 '17 at 19:00
  • @JarrodRoberson What would you change in the code to effectively measure the soft error frequency? – Sven Apr 28 '17 at 19:07
    Before everyone jumps on the OP's back, [there has been a Java bug](http://stackoverflow.com/questions/12317668/java-int-array-initializes-with-nonzero-elements) in which the default value of an array was not set to zero, potentially exposing heap memory information. Now, what the OP is suggesting is a bit crazy in that no, this probably won't find any memory errors, but it *could* turn up a JVM bug if they're unlucky. – Makoto Apr 28 '17 at 19:11
    Since it is not clear from my first comment: **you cannot detect or measure what you want with the code you posted, in any language.** It takes **special hardware** to detect errors in nominally functioning modern memory, and they are so rare that it would take thousands of wall-clock hours to *maybe* produce just one 1-bit error that escapes the hardware correction. Memory that is simply broken or has gone bad is a different story, but that is not what you are asking about. **Why** you are so far off base is **off-topic: too broad**. –  Apr 28 '17 at 19:29
  • @JarrodRoberson But if we already know which value should be in each position of the array (in this case 0), we can detect an error simply by comparing each entry against that value (0). Error detection is only necessary when we don't know what value to expect. – Sven Apr 28 '17 at 19:41
  • @JarrodRoberson Before you call someone misinformed, please try to understand what he is trying to do. I didn't try to write error-detecting software, and even if I wanted to, I assure you it would be possible. I have already done it once (although it was highly inefficient). – Sven Apr 28 '17 at 19:54
    *(in this case 0), we can detect an error simply by comparing each entry against that value (0).* No, you cannot; this statement just demonstrates how much you do not know about how the tools (Java) you are using work (or do not work, in this case). Have fun with the D/K Effect! –  Apr 28 '17 at 19:56
  • @JarrodRoberson Then please explain to me why this wouldn't work, instead of just saying how misinformed and biased I am. I posted this question because I don't have enough insight on the problem, and want to better understand what could go wrong and why. – Sven Apr 28 '17 at 20:03
  • @JarrodRoberson Why does this not do what OP wants? – Thorbjørn Ravn Andersen Apr 28 '17 at 21:43
    You *have* to do this sort of thing in a low-level language like C. And even then, you have to worry about paging (unless you reboot in real mode, or else write a kernel that maps virtual memory straight to physical memory) ... – o11c Apr 29 '17 at 02:14
    Nothing in the edit changes the fact that you could **never** detect this using Java (or any other modern language runtime). The explanation is way **too-broad** to cover on StackOverflow. –  Apr 29 '17 at 07:33
  • As o11c pointed out, it's unlikely that the OS will keep 4GiB of data all paged in at the same time (unless you have a huge amount of RAM). So the OS is writing 0s into the array constantly. Furthermore, you have to consider the cache hierarchy: you don't always read the actual values in memory. On top of that, DRAM is refreshed and may be error-corrected or error-tolerant. Keep in mind that if 4GiB had a bit flip every 22h, then most computers wouldn't work properly for long: flipping a bit is very destructive (see: Row Hammer). – Margaret Bloom May 03 '17 at 11:32
  • Finally, the source you linked is down; I'm sure they performed the measurement in a very specific environment. Being inside a building on Earth may not be that environment. – Margaret Bloom May 03 '17 at 11:33
  • @MargaretBloom Now the link to the source should work, thank you for your answer :) – Sven May 04 '17 at 18:16
  • @Sven, A more authoritative document is [this one](http://www.pld.ttu.ee/IAF0030/curtis.pdf). I believe it's not the document mentioned in your link but it shows the full picture. Including real error rates. – Margaret Bloom May 05 '17 at 12:48

1 Answer


Simple: as soon as your loop body does something observable (like just printing the index it failed on), optimizing the loop body away would be invalid!

And if anything is optimized away at all, it would be the JIT doing it; javac does little more than constant folding in terms of optimization.
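A minimal sketch of that point, using a hypothetical class `SoftErrorScanner` that is not from the question: each pass logs every non-zero element and returns a count, so the scan's outcome is observable and the loop is not dead code whose result nobody uses. It only addresses the compiler side; the hardware concerns raised in the comments (caches, paging, error correction) are untouched.

```java
// Hypothetical sketch, not the asker's exact program.
final class SoftErrorScanner {

    /**
     * Scans the array once, logs every element that is not 0, resets it,
     * and returns how many such elements were found. Logging and returning
     * the count make the outcome of the scan observable to the caller.
     */
    static int scanAndReset(int[] memory) {
        int found = 0;
        for (int i = 0; i < memory.length; i++) {
            int value = memory[i];
            if (value != 0) {
                System.err.printf("%d: index %d held 0x%08x%n",
                        System.currentTimeMillis(), i, value);
                memory[i] = 0; // reset only after logging the unexpected value
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        // Small default size so the sketch runs anywhere; pass a length such as
        // 1073741824 (4 GiB of ints, needs a large -Xmx) to scan more memory.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 1 << 20;
        int[] memory = new int[length]; // Java initializes every element to 0
        long total = 0;
        while (true) {
            total += scanAndReset(memory);
            System.out.println("pass done, anomalies so far: " + total);
            Thread.sleep(60_000); // wait a minute between passes
        }
    }
}
```

To scan the full array from the question you could run something like `java -Xmx5g SoftErrorScanner 1073741824`; the 5g heap is an assumption about what your machine can spare.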

But of course, you would be on the safe side using languages where you have full control over such things.

Beyond that: I rather doubt that you will ever hit an error with such code.

GhostCat
  • 4GB arrays are possible with the -Xmx argument on the command line – Sven Apr 28 '17 at 19:09
  • You are right. I keep mixing up max array size with max number of entries. The latter is limited to Integer.MAX_VALUE - x (x being a small number depending on the JVM version). – GhostCat Apr 28 '17 at 19:19
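To put numbers on this exchange: the question's length of 1_073_741_824 is 2^30, about half of Integer.MAX_VALUE, so the element-count limit is not the obstacle; the 4 GiB of heap is, which is why the -Xmx option matters. A small check of that arithmetic, with a hypothetical class name:

```java
// Hypothetical helper class, only for checking the arithmetic discussed above.
final class ArraySizeCheck {
    // The array length used in the question.
    static final long REQUESTED_ELEMENTS = 1_073_741_824L;

    // Payload size of an int[] with the given length (4 bytes per int);
    // the JVM adds a small object header on top of this.
    static long bytesForIntArray(long elements) {
        return elements * Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("length is 2^30: " + (REQUESTED_ELEMENTS == 1L << 30));
        System.out.println("length <= Integer.MAX_VALUE: " + (REQUESTED_ELEMENTS <= Integer.MAX_VALUE));
        System.out.println("payload in GiB: " + bytesForIntArray(REQUESTED_ELEMENTS) / (1L << 30));
        // Maximum heap this JVM will attempt to use, controlled by -Xmx.
        System.out.println("max heap in bytes: " + Runtime.getRuntime().maxMemory());
    }
}
```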