
So this morning I decided to play around with benchmarking for the first time.

I was curious about the speed difference between code with "do-end" block formatting vs. "{ }" formatting.

So I stored the Benchmark code in a Proc so I could call it multiple times consecutively:

require 'benchmark'

n = 100_000_000
bmp = Proc.new do
  Benchmark.bm do |x|
    x.report { n.times { a = "1" } }
    x.report { n.times do; a = "1"; end }
  end
end

My results were as expected when I ran it once.

>> bmp.call
  user     system      total        real
1.840000   0.030000   1.870000 (  1.874507)
1.860000   0.050000   1.910000 (  1.926101)
=> true

But then I ran it again.

>> bmp.call
  user     system      total        real
1.870000   0.050000   1.920000 (  1.922810)
1.840000   0.000000   1.840000 (  1.850615)

To me this looks like the exact opposite of what I was expecting. I am familiar with the concept of Branch Prediction. Is this a classic example of Branch Prediction? If not, what is it? Is there any way to prevent inaccuracies like this (if this is even considered one)?

EDIT: Following some suggestions, I did run this code over 30 times. Frequently it would alternate between the two results. A sample of the data can be found here:

gist.github.com/TheLarkInn/5599676

Sean Larkin
  • I copied your code and ran it myself, 30 times, and do not get strict alternation between results. As a binary string (where 1 confirms your hypothesis on `{}` somehow being faster, and 0 refutes it), I got `00 11 10 00 10 10 01 00 11 01 00 11 11 00 11` which looks pretty much random to me – Neil Slater May 17 '13 at 14:46

2 Answers


First of all, your benchmark is utterly pointless. The difference between the do / end syntax and the { / } syntax is just that: syntax. There is no semantic difference. Ergo, there cannot possibly be any runtime performance difference whatsoever between the two. It's just not logically possible. You don't need to benchmark it.

The only performance difference that could exist is that one takes longer to parse than the other. However, neither of the two is harder to parse than the other. The only difference is precedence. Therefore, there very likely isn't any performance difference in parsing, either.
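
To illustrate the precedence difference (a minimal sketch; the array and `puts` are just stand-ins):

# The { } block binds to the nearest method call (map); the do / end
# block binds to the outermost one (puts), so map receives no block
# and returns an Enumerator instead.
puts [1, 2, 3].map { |x| x * 2 }     # prints 2, 4, 6
puts [1, 2, 3].map do |x| x * 2 end  # prints #<Enumerator: [1, 2, 3]:map>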

And even if there were a performance difference in parsing, your benchmark wouldn't show it. You are using a benchmark written in Ruby, but in order to run Ruby code, the Ruby execution engine has to parse it first, which means that parsing will already have happened before your benchmark even starts. So, even if your benchmark weren't pointless, it would still be useless, since it cannot possibly measure the performance difference in parsing.
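
(For what it's worth, if you really wanted to time parsing, you would have to hand the source to the parser as a string at runtime. A rough sketch, assuming MRI, whose `RubyVM::InstructionSequence.compile` parses and compiles a string; the labels and iteration count are arbitrary:)

require 'benchmark'

curly  = 'n.times { a = "1" }'
do_end = 'n.times do; a = "1"; end'

# Compiling from a string forces a fresh parse on every iteration.
Benchmark.bm(7) do |x|
  x.report('{}')     { 50_000.times { RubyVM::InstructionSequence.compile(curly) } }
  x.report('do/end') { 50_000.times { RubyVM::InstructionSequence.compile(do_end) } }
end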

As to your question about Branch Prediction: there are no branches in your code, so there is nothing to predict.

BTW: even if your benchmark were intended for a different purpose, it still wouldn't be measuring anything, since at least the more advanced Ruby implementations would recognize that your blocks are essentially no-ops and simply optimize them away. And even if they aren't optimized away, all the benchmark is measuring is memory allocator performance (allocating a couple hundred megabytes of tiny String objects), not the performance of blocks.
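
(If you wanted the blocks to measure anything beyond the allocator, you would at least have to give them work whose result is used afterwards, along these lines; a sketch, not a recommendation:)

require 'benchmark'

n = 10_000_000
sum1 = sum2 = 0

Benchmark.bm(7) do |x|
  # The sums are used after the benchmark, so the block bodies
  # cannot be treated as no-ops and optimized away.
  x.report('{}')     { n.times { |i| sum1 += i } }
  x.report('do/end') { n.times do |i| sum2 += i end }
end

puts sum1 == sum2  # => true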

Jörg W Mittag
  • Wonderfully put. The purpose of the benchmark was to see if there was any time difference between running a code block with { } vs. do / end. Thank you for the explanation. In regards to Branch Prediction, I was under the impression that it had to do not only with branches in code, but was more generally the idea that processors will attempt to "guess" what will run next to increase performance. – Sean Larkin May 17 '13 at 15:14

Just a quick primer on stats:

I'm not sure two runs are enough to spot a trend. What if there was a difference in system load between the two test blocks the second time you ran it?

A rule of thumb for determining a statistical difference between two samples is that 30 or more data points will give you a statistically relevant result.

I'd run your tests at least that many times, store the results for the two versions independently, and then compare each set internally to ensure it's consistent, before comparing the two sets to one another.
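
For instance, something along these lines (a sketch; the use of `Benchmark.realtime` and a plain mean comparison is just one way to do it):

require 'benchmark'

n = 1_000_000
curly, do_end = [], []

# Collect 30 independent timings per syntax, stored separately.
30.times do
  curly  << Benchmark.realtime { n.times { a = "1" } }
  do_end << Benchmark.realtime { n.times do; a = "1"; end }
end

mean = lambda { |xs| xs.inject(:+) / xs.size }
puts "{}     mean: #{mean.call(curly)} s"
puts "do/end mean: #{mean.call(do_end)} s"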

It could be that your initial premise is incorrect :)

mcfinnigan
  • System load should not be a major factor; I think `Benchmark` is collecting process ticks (so it only registers things the code is doing, and is pretty good at ignoring other processes, although not everything - contention could still occur on disk, RAM etc.). However, that means that despite the 100 million repeats, there are only ~190 boolean ("CPU is running my process") sample points in each test. So the main part of your argument about stats is spot on. – Neil Slater May 17 '13 at 13:52
  • I did run this over 30 times with the same results. Each time the proc was called, the stats flip flopped. – Sean Larkin May 17 '13 at 13:58
  • @Sean Larkin: A strict flip, with the exact same values each time, would not be a statistical issue, but something underlying as a difference. But I am sceptical: are you just eyeballing these and spotting a pattern which isn't really there? Could you please publish your run of 30 (maybe a gist is suitable), so we could take a look? – Neil Slater May 17 '13 at 14:31
  • "It could be that your initial premise is incorrect" – Of course, it is. The difference is only syntactical. There *cannot possibly* be any performance difference whatsoever, except during parsing, which this benchmark cannot measure, because it is written in Ruby, and Ruby will have to have already parsed the benchmark before running it in order to be able to run it. – Jörg W Mittag May 17 '13 at 14:40
  • https://gist.github.com/TheLarkInn/5599676 is the data I just ran! I'll post it in my original question as well. There appears to be somewhat of a pattern where the speeds do "flip-flop", but I will say that it is not consistent. Still noteworthy, though. – Sean Larkin May 17 '13 at 15:07
  • @Sean Larkin: Thank you for sharing that. If I convert it to binary where 1 matches your hypothesis, I get the binary string `01 01 10 10 01 10 11 11 00 00 11 01 11 00 11`, which is pretty much random. That's not a full statistical analysis, but your results are completely consistent with there being no difference between the two block syntaxes at runtime. These are just random fluctuations that you didn't expect. Usually I wouldn't call something faster if I got less than 1% difference on a short (10-second) test. On the plus side, your machine is much faster than mine :-) – Neil Slater May 17 '13 at 15:20