
Why does the reverse() method in the StringBuffer/StringBuilder classes use a bitwise operator?

I would like to know its advantages.

public AbstractStringBuilder reverse() {
    boolean hasSurrogate = false;
    int n = count - 1;
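    // the bitwise shift in question: (n-1) >> 1 rather than (n-1) / 2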
    for (int j = (n-1) >> 1; j >= 0; --j) {
        char temp = value[j];
        char temp2 = value[n - j];
        if (!hasSurrogate) {
            hasSurrogate = (temp >= Character.MIN_SURROGATE && temp <= Character.MAX_SURROGATE)
                || (temp2 >= Character.MIN_SURROGATE && temp2 <= Character.MAX_SURROGATE);
        }
        value[j] = temp2;
        value[n - j] = temp;
    }
    if (hasSurrogate) {
        // Reverse back all valid surrogate pairs
        for (int i = 0; i < count - 1; i++) {
            char c2 = value[i];
            if (Character.isLowSurrogate(c2)) {
                char c1 = value[i + 1];
                if (Character.isHighSurrogate(c1)) {
                    value[i++] = c1;
                    value[i] = c2;
                }
            }
        }
    }
    return this;
}    
Eran
Sudhanshu Sharma
  • similar question here: http://stackoverflow.com/questions/6385792/what-does-a-bitwise-shift-left-or-right-do-and-what-is-it-used-for – Salman A Feb 03 '15 at 17:02
  • I just realized: in this case it is a clever shortcut. Since you don't have to reverse anything if the String is only 1 or 0 characters long, you don't want to enter the loop at all if count < 2. If you use `(n-1) / 2` with count = 1, then (n-1) = -1 and the result is 0, so you would enter the loop for one iteration. If you use the bit shift, `-1 >> 1` = -1, so you skip the loop altogether and you don't need an extra compare. So it's clever and faster (see the sketch below). – Falco Feb 04 '15 at 10:59
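
A minimal sketch of that edge case (the values mirror the reverse() code above: for count = 1, n = count - 1 = 0, so the loop bound (n-1) is -1):

    int bound = -1;                  // (n - 1) when count == 1
    System.out.println(bound >> 1);  // -1 -> j starts at -1, the loop body never runs
    System.out.println(bound / 2);   //  0 -> j starts at 0, one wasted iteration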

4 Answers


Right shifting by one means dividing by two. I don't think you'll notice any performance difference; the compiler will perform these optimizations at compile time.

Many programmers are used to right shifting by one when dividing by two instead of writing / 2. It's a matter of style, or maybe one day it really was more efficient to right shift than to actually divide by writing / 2 (prior to optimizations). Compilers know how to optimize things like that, and I wouldn't waste my time writing things that might be unclear to other programmers (unless they really make a difference). Anyway, the loop is equivalent to:

int n = count - 1;
for (int j = (n-1) / 2; j >= 0; --j)

As @MarkoTopolnik mentioned in his comment, the JDK was written with the least possible assumptions about runtime optimization, which might explain why they explicitly right shifted the number by one instead of explicitly dividing it; if they had counted on the full power of the optimizer, they would probably have written / 2.


Just in case you're wondering why the two are equivalent, the best explanation is by example. Consider the number 32; assuming 8 bits, its binary representation is:

00100000

right shift it by one:

00010000

which has the value 16 (1 * 2^4)
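
A minimal sketch verifying this in Java:

    int x = 32;
    System.out.println(Integer.toBinaryString(x));       // 100000
    System.out.println(Integer.toBinaryString(x >> 1));  // 10000
    System.out.println((x >> 1) == (x / 2));             // true (x is non-negative)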

Maroun
  • JDK code is generally written with the least possible assumptions on the power of the runtime optimizer. This is a fully justified choice because the JDK, unlike any client code we write, targets all kinds of platforms, including the most minimalist ones. Performance of interpreted code is also of concern there. – Marko Topolnik Feb 03 '15 at 09:09
  • @MarounMaroun the question was WHY the bitwise operators are faster, not why they are equivalent to mathematical operations. – Malt Feb 03 '15 at 09:11
  • @MarkoTopolnik this might explain right shifting, if they considered the maximum power of the optimization, they would probably have wrote `/ 2`. – Maroun Feb 03 '15 at 09:11
  • @MarounMaroun Yes, just look at Doug Lea's code in `java.util.concurrent`---it's full of seemingly unnecessary optimizations, when considered against HotSpot at full power. – Marko Topolnik Feb 03 '15 at 09:13
  • Thanks Maroun, that is exactly what I wanted to know. Thanks for such a prompt response. :) – Sudhanshu Sharma Feb 03 '15 at 09:21
  • It’s worth noting that right-shifting is only appropriate for *non-negative* numbers, so an optimizer has to prove that the value can’t be negative, or insert conditional code, before it can replace a division with a right-shift. Hence, having a right-shift rather than a division in the source code *can* make a difference, even on today’s systems. While it’s clear that application writers should prefer readability over performance in non-critical code, the JRE developers can’t predict whether a method will become critical for an application. – Holger Feb 03 '15 at 13:28
  • If the compiler has no way to know that n is indeed positive at compile time, it cannot optimize this into a simple right shift! It would have to perform an arithmetic shift... And even then I don't know if Java has to use division to guarantee identical behaviour on 32/64-bit VMs... But: ( -7 >> 1 ) and ( -7 / 2 ) produce different results in Java. So the compiler cannot optimize, and shift is faster than full division! – Falco Feb 03 '15 at 15:31
  • @Maroun Irrelevant. (n-1) is a signed non-const integer and therefore can be negative. As such, divide by 2 and right shift are different operations that cannot be substituted for each other. – Peter Feb 03 '15 at 17:37
  • @Peter It *can* be negative according to rules of Java, but according to code logic, it can **never** be. – Maroun Feb 04 '15 at 08:37
  • @MarounMaroun Again, for you: what you divide is not count. If you look at the code carefully, the shifted value is (n-1) with n = count-1, so you divide (count - 2). And since count can be 1 or zero, (count - 2) can be -1 or -2 – Falco Feb 04 '15 at 10:14

In summary:

  • The >> operator in Java is known as the Sign Extended Right Bit Shift operator.
  • X >> 1 is mathematically equivalent to X / 2, for all strictly positive values of X.
  • X >> 1 is always faster than X / 2, in a ratio of roughly 1:16, though the difference might turn out to be much less significant in actual benchmarks due to modern processor architectures.
  • All mainstream JVMs can correctly perform such optimizations, but the non-optimized byte code will be executed in interpreted mode thousands of times before these optimizations actually occur.
  • The JRE source code uses a lot of optimization idioms, because they make an important difference for code executed in interpreted mode (and most importantly, at JVM launch time).
  • The systematic use of proven-to-be-effective code optimization idioms that are accepted by a whole development team is not premature optimization.

Long answer

The following discussion tries to correctly address all the questions and doubts that have been raised in other comments on this page. It is long because I felt it necessary to put the emphasis on why some approaches are better, rather than to show off personal benchmark results, beliefs and practices, where mileage might vary significantly from one person to the next.

So let's take questions one at a time.

1. What does X >> 1 (or X << 1, or X >>> 1) mean in Java?

The >>, << and >>> operators are collectively known as the Bit Shift operators. >> is commonly known as the Sign Extended Right Bit Shift, or Arithmetic Right Bit Shift; >>> is the Non-Sign Extended Right Bit Shift (also known as the Logical Right Bit Shift); and << is simply the Left Bit Shift (sign extension does not apply in that direction, so there is no need for logical and arithmetic variants).

Bit Shift operators are available (though with varying notation) in many programming languages (actually, from a quick survey, I would say almost every language that is more or less a descendant of the C language, plus a few others). Bit Shifts are fundamental binary operations, and consequently, almost every CPU ever created offers assembly instructions for them. Bit Shifters are also a classic building block in electronic design which, given a reasonable number of transistors, provides its final result in a single step, with a constant and predictable stabilization time.

Concretely, a bit shift operator transforms a number by moving all of its bits n positions, either left or right. Bits that fall out are forgotten; bits that "come in" are forced to 0, except in the case of the sign extended right bit shift, in which the left-most bit preserves its value (and therefore its sign). See Wikipedia for a graphic illustration of this.
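
A minimal sketch of the three operators in action, using -8 so that the effect of sign extension is visible:

    int x = -8;                   // two's complement: 11111111 11111111 11111111 11111000
    System.out.println(x >> 1);   // -4          (arithmetic: the sign bit is copied in)
    System.out.println(x >>> 1);  // 2147483644  (logical: a 0 comes in from the left)
    System.out.println(x << 1);   // -16         (a 0 comes in from the right)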

2. Is X >> 1 equal to X / 2?

Yes, as long as the dividend is guaranteed to be positive.

More generally:

  • a left shift by N is equivalent to a multiplication by 2^N;
  • a logical right shift by N is equivalent to an unsigned integer division by 2^N;
  • an arithmetic right shift by N is equivalent to a non-integer division by 2^N, rounded to an integer toward negative infinity (which is also equivalent to a signed integer division by 2^N for any strictly positive integer); a short sketch follows this list.
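
A minimal Java sketch of these equivalences (Math.floorDiv, available since Java 8, is the round-toward-negative-infinity division):

    int x = -7;
    System.out.println(x << 2);               // -28, same as x * 4 (2^2)
    System.out.println(x >> 1);               // -4, same as Math.floorDiv(x, 2)
    System.out.println(Math.floorDiv(x, 2));  // -4
    System.out.println(x / 2);                // -3: plain / rounds toward zero instead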

3. Is bit shifting faster than the equivalent arithmetic operation, at the CPU level?

Yes, it is.

First of all, we can easily assert that, at the CPU level, bit shifting requires less work than the equivalent arithmetic operation. This is true both for multiplications and divisions, and the reason is simple: both integer multiplication and integer division circuits themselves contain several bit shifters. Put otherwise: a bit shift unit represents a mere fraction of the complexity of a multiplication or division unit. It is therefore guaranteed that less energy is required to perform a simple bit shift than a full arithmetic operation. Yet, in the end, unless you monitor your CPU's electric consumption or heat dissipation, I doubt you would notice that your CPU is using more energy.

Now, let's talk about speed. On processors with reasonably simple architectures (roughly, any processor designed before the Pentium or the PowerPC, plus recent processors that do not feature any form of execution pipeline), integer division (and multiplication, to a lesser degree) is generally implemented by iterating over bits (actually groups of bits, whose size is known as the radix) of one of the operands. Each iteration requires one CPU cycle, which means that integer division on a 32-bit processor would require (at most) 16 cycles (assuming a Radix 2 SRT division unit, on a hypothetical processor). Multiplication units usually handle more bits at once, so a 32-bit processor might complete an integer multiplication in 4 to 8 cycles. These units might use some form of variable bit shifter to quickly jump over sequences of consecutive zeros, and therefore might terminate early when multiplying or dividing by simple operands (such as a positive power of two); in that case, the arithmetic operation completes in fewer cycles, but still requires more than a simple bit shift operation.

Obviously, instruction timings vary between processor designs, but the preceding ratio (bit shift = 1, multiplication = 4, division = 16) is a reasonable approximation of the actual performance of these instructions. For reference, on the Intel 486, the SHR, IMUL and IDIV instructions (for 32 bits, register by a constant) required respectively 2, 13-42 and 43 cycles (see here for a list of 486 instructions with their timing).

What about CPUs found in modern computers? These processors are designed around pipeline architectures that allow the simultaneous execution of several instructions; the result is that most instructions nowadays require only one cycle of dedicated time. But this is misleading, since instructions actually remain in the pipeline for several cycles before being released, during which they might prevent other instructions from completing. The integer multiplication or division unit remains "reserved" during that time, and any further division is therefore held back. That is particularly a problem in short loops, where a single multiplication or division ends up being stalled by the previous invocation of itself that hasn't yet completed. Bit shift instructions do not suffer from such a risk: most "complex" processors have access to several bit shift units and don't need to reserve them for very long (though generally at least 2 cycles, for reasons intrinsic to the pipeline architecture). Actually, to put this into numbers, a quick look at the Intel Optimization Reference Manual for the Atom seems to indicate that SHR, IMUL and IDIV (same parameters as above) have latencies of 2, 5 and 57 cycles respectively; for 64-bit operands, it is 8, 14 and 197 cycles. Similar latencies apply to the most recent Intel processors.

So, yes, bit shifting is faster than the equivalent arithmetic operations, even though in some situations, on modern processors, it might actually make absolutely no difference. But in most cases, it is very significant.

4. Will the Java Virtual Machine perform such optimizations for me?

Sure, it will. Well... most certainly, and... eventually.

Unlike most language compilers, regular Java compilers perform no optimization. It is considered that the Java Virtual Machine is in the best position to decide how to optimize a program for a specific execution context. And this indeed provides good results in practice. The JIT compiler acquires a very deep understanding of the code's dynamics, and exploits this knowledge to select and apply tons of minor code transforms, in order to produce very efficient native code.

But compiling byte code into optimized native methods requires a lot of time and memory. That is why the JVM will not even consider optimizing a code block before it has been executed thousands of times. Then, even when the code block has been scheduled for optimization, it might be a long time before the compiler thread actually processes that method. And later, various conditions might cause that optimized code block to be discarded, reverting back to byte code interpretation.
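
A minimal sketch for observing that life cycle yourself (JitWarmup is a hypothetical class name; -XX:+PrintCompilation is a standard HotSpot flag that logs methods as they get JIT-compiled):

    // Run with: java -XX:+PrintCompilation JitWarmup
    public class JitWarmup {
        public static void main(String[] args) {
            StringBuilder sb = new StringBuilder("hello, world");
            // reverse() starts out interpreted; only after thousands of
            // calls should a compilation log entry for it appear.
            for (int i = 0; i < 100_000; i++) {
                sb.reverse();
            }
            System.out.println(sb);
        }
    }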

Though the JSE API is designed with the objective of being implementable by various vendors, it is incorrect to claim that the same is true of the JRE. The Oracle JRE is provided to everyone as the reference implementation, but its usage with another JVM is discouraged (actually, it was forbidden not so long ago, before Oracle open sourced the JRE's source code).

Optimizations in the JRE source code are the result of conventions adopted and optimization efforts made by the JRE developers in order to provide reasonable performance even in situations where JIT optimizations haven't kicked in yet or simply can't help. For example, hundreds of classes are loaded before your main method is invoked. That early, the JIT compiler has not yet acquired sufficient information to properly optimize code. At such times, hand-made optimizations make an important difference.

5. Ain't this premature optimization?

It is, unless there is a reason why it is not.

It is a fact of modern life that whenever a programmer demonstrates a code optimization somewhere, another programmer will oppose Donald Knuth's quote on optimization (well, was it his? who knows...). It is even perceived by many as a clear assertion by Knuth that we should never try to optimize code. Unfortunately, that is a major misunderstanding of Knuth's important contributions to computer science over the last decades: Knuth has actually authored thousands of pages of literature on practical code optimization.

As Knuth put it:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

— Donald E. Knuth, "Structured Programming with Goto Statements"

What Knuth qualifies as premature optimization are optimizations that require a lot of thinking, apply only to non-critical parts of a program, and have a strong negative impact on debugging and maintenance. Now, all of this could be debated for a long time, but let's not.

It should however be understood that small local optimizations that have been proven to be effective (that is, at least on average, overall), that do not negatively affect the overall construction of a program, that do not reduce the code's maintainability, and that do not require extraneous thinking are not a bad thing at all. Such optimizations are actually good, since they cost you nothing, and we should not pass up such opportunities.

Yet, and this is the most important thing to remember, an optimization that is trivial to programmers in one context might turn out to be incomprehensible to programmers in another context. Bit shifting and masking idioms are particularly problematic for that reason. Programmers who know the idiom can read it and use it without much thinking, and the effectiveness of these optimizations is proven, though generally insignificant unless the code contains hundreds of occurrences. These idioms are rarely an actual source of bugs. Still, programmers unfamiliar with a specific idiom will lose time understanding what that specific code snippet does, why, and how.

In the end, whether to favor such optimizations or not, and exactly which idioms should be used, is really a matter of team decision and code context. I personally consider a certain number of idioms to be best practice in all situations, and any new programmer joining my team quickly acquires them. Many more idioms are reserved for critical code paths. All code put into internal shared code libraries is treated as a critical code path, since it might turn out to be invoked from such critical code paths. Anyway, that is my personal practice, and your mileage may vary.

James
  • Thank you for an in-depth, reasoned answer which addresses all the problems from the discussion on the other answers. I think if you add a TL;DR with the essence, this is perfect! – Falco Feb 04 '15 at 10:44

It uses (n-1) >> 1 instead of (n-1)/2 to find the middle index of the internal array to be reversed. Bitwise shift operators are usually more efficient than the division operator.
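
A short worked example of that index arithmetic (hypothetical values): for a 5-character array, n = 4 and (n-1) >> 1 = 1, so the loop runs for j = 1 and j = 0, swapping the pairs (1,3) and (0,4) while leaving the middle index 2 alone:

    char[] value = {'a', 'b', 'c', 'd', 'e'};
    int n = value.length - 1;                  // 4
    for (int j = (n - 1) >> 1; j >= 0; --j) {  // j = 1, then j = 0
        char temp = value[j];
        value[j] = value[n - j];
        value[n - j] = temp;
    }
    System.out.println(new String(value));     // edcba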

Eran
  • I would like to see some benchmarks supporting your claim – Mateusz Dymczyk Feb 03 '15 at 08:59
  • That's not really a random claim, it's fairly common. – Evan Knowles Feb 03 '15 at 09:00
  • @EvanKnowles then giving me pointers to some benchmarks should be fairly straightforward. – Mateusz Dymczyk Feb 03 '15 at 09:01
  • it comes from "the execution time" of CPU instructions: division (tens of ticks) versus bit shifting (1 tick) – sodik Feb 03 '15 at 09:01
  • @sodik first of all that's platform dependent, second of all the VM has enough information to optimize division into a shift – Mateusz Dymczyk Feb 03 '15 at 09:02
  • I don't think there will be any difference, except for making the code less readable. Even if `>>1` is faster, the JIT will most probably optimize `/2` into a right shift. – Svetlin Zarev Feb 03 '15 at 09:03
  • it can depend on the platform but we can agree that `>>` will be faster :) but here is a simple benchmark I made https://gist.github.com/sodik82/d7525a8a20e8855461b3 and the result on my machine is that shift is twice as fast. "Times: 9775464135 and 4883906283" – sodik Feb 03 '15 at 09:16
  • well, it's not a `jmh` microbenchmark, so it's invalid :) http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Svetlin Zarev Feb 03 '15 at 09:22
  • why are bitwise shift operators more efficient than the division operator? – Mateen Feb 03 '15 at 09:28
  • @SvetlinZarev Here it is on JMH: https://gist.github.com/mtopolnik/465ad156c68816ea5c91 The results are identical. – Marko Topolnik Feb 03 '15 at 09:38
  • @mateen There is hardly a simpler task for the CPU than just shifting bits of the number's binary representation. General integer division, on the other hand, requires running a complete multistep algorithm. As the CPUs gain transistor count, the actual hardware dedicated to these operations gets ever more complex and optimized, but division has still not reached the 1-cycle-per-operation mark (AFAIK). Shifting, on the other hand, is probably a sub-1-cycle operation. – Marko Topolnik Feb 03 '15 at 10:02
  • I'll add to the conversation that in embedded systems there is often no hardware for floating-point and integer division calculations, which means these calculations are done in software, i.e. extremely slowly. And it's not a bygone era, either. Read this http://community.arm.com/docs/DOC-8059 and you'll realize a very large number of quad-core 2.x GHz Android devices do not have a hardware integer division unit. – Kai Feb 03 '15 at 14:36
  • @sodik when I change the argument in both methods from 1000000 to 10 then in most cases the division method is faster on my machine. Your benchmark furthermore doesn't allow full optimization as the JVM would have to make some adjustments when changing division into shift on negative numbers. If you instead perform the shift/division always on a positive number then the time difference will be much smaller. – Mateusz Dymczyk Feb 03 '15 at 14:46
  • @MateuszDymczyk Can you provide a benchmark to support your claims? The compiler will not know whether a dynamic value can be negative. And indeed in StringBuilder the length can be zero, so n can be -1 and you divide a negative number... Automatic optimization would change the behaviour, so it would be wrong to replace / with >> – Falco Feb 03 '15 at 15:36
  • @Mateusz There is no way for bitshift to be slower than division on any device that is used in the real world (except division by 1). The bitshift operation is so trivial to implement that it is impossible to have a faster division algorithm. The same is not true the other way around. Also, bitshift by 1 and division by 2 are not equivalent for signed integers, they return different results when applied to negative numbers. – Peter Feb 03 '15 at 17:47
  • @Peter I just took sodik's "benchmark" and changed the input value to 10. Not saying division is faster than shift, just pointing out that the benchmark isn't too reliable – Mateusz Dymczyk Feb 04 '15 at 01:00
  • @MateuszDymczyk Yes, the JVM will most certainly perform such optimizations automatically, but the unoptimized code will have been run thousands of times before that actually happens. That is why hand-made optimizations in the JRE are important. And this is also why *correctly designed* microbenchmarks will not correctly demonstrate this benefit: once the JIT kicks in, the executed code will actually be the same in both cases. It is wrong to assume that microbenchmarks should *always* test post-JIT performance. Refer to my answer for a more detailed explanation of *why* bit shift is *always* faster, *why* – James Feb 04 '15 at 17:06

In this method, there's just this one such expression: (n-1) >> 1. I assume this is the expression you're referring to. It is called a right shift. It is equivalent to (n-1)/2, but it's generally considered faster and more efficient. It's often used in many other languages too (e.g. in C/C++).

Note though that modern compilers will optimize your code anyway, even if you use division like (n-1)/2, so there are no obvious benefits to using a right shift. It's more a question of coding preference, style, and habit.

See also:

Is shifting bits faster than multiplying and dividing in Java? .NET?

peter.petrov