
Let's say I have two variables:

x = 1  
y = 2  

The end result should be:

x = 2  
y = 1  

I thought of the following ways to do this:

temp = x // clone x
x = y
y = temp
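For reference, here is a minimal C sketch of the temp-variable version. The function name `swap_temp` is just illustrative, and the question itself is language-agnostic. C has no pass-by-reference, so the function takes pointers:

```c
/* Swap two ints through a temporary. With optimization enabled, compilers
   typically keep `temp` in a register or eliminate it entirely. */
static void swap_temp(int *x, int *y)
{
    int temp = *x; /* copy x */
    *x = *y;
    *y = temp;
}
```

Calling `swap_temp(&x, &y)` with `x = 1` and `y = 2` leaves `x == 2` and `y == 1`.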

or (XOR swap):

x = x XOR y
y = x XOR y
x = y XOR x

I'd like an answer that covers the low-level details (memory, registers, etc.).
What is the fastest way to do this?

Bonus question: hypothetically, assuming no side effects (from the surrounding code or the CPU), which of these is fastest, and are there any faster alternatives?

funerr
  • This is dependent on so many things that I doubt there is a definitive answer to this. – hatchet - done with SOverflow Nov 08 '13 at 20:01
  • Any answer to this without measurement is wrong. The answer also depends on surrounding code and on architecture. – usr Nov 08 '13 at 20:05
  • The fastest way to swap is by not swapping. Instead, swap your assignment of registers to the variables. So for example, if `x` is in `r0` and `y` is in `r1`, "after" the swap you say that `x` is in `r1` and `y` is in `r0`. That's not always an option, but when it is, it's unbeatable since it literally takes zero time, nothing actually happens, it's just the interpretation. – harold Nov 08 '13 at 20:14
  • It sounds like an XY question where he's trying to optimize. – ninMonkey Nov 08 '13 at 21:07
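At the source level, a loose analogue of harold's "don't swap, rename" idea is to let two variables trade roles instead of shuffling values between them, for example by unrolling a loop by two. The Fibonacci example and both function names below are hypothetical illustrations, not code from this thread:

```c
/* Textbook form: values are shuffled through a temporary every step. */
static unsigned long long fib_rotate(unsigned n)
{
    unsigned long long a = 0, b = 1; /* a = F(i), b = F(i+1) */
    for (unsigned i = 0; i < n; i++) {
        unsigned long long t = a + b;
        a = b;
        b = t;
    }
    return a; /* F(n) */
}

/* Unrolled by two: a and b alternate roles, so no value is moved around. */
static unsigned long long fib_renamed(unsigned n)
{
    unsigned long long a = 0, b = 1; /* a = F(i), b = F(i+1) */
    unsigned i = 0;
    for (; i + 2 <= n; i += 2) {
        a += b; /* a = F(i+2) */
        b += a; /* b = F(i+3) */
    }
    return (i == n) ? a : b; /* if one step is left over, F(n) is in b */
}
```

Both return 55 for `n = 10`. An optimizing compiler will often do this kind of renaming on its own, so treat this as an illustration of the idea rather than a recommended micro-optimization.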

3 Answers


The problem is that modern CPU architectures won't let you get a simple answer to this: they hide many effects and expose many very subtle ones.

If the values are in CPU registers and you have a spare register, then the temp-variable method is either the fastest one or the one that consumes the least power.

The XOR method and the +/- method (very neat, by the way!) are for situations where you cannot afford an extra location (an extra memory variable or an extra register). This may seem strange, but inside a C preprocessor macro, for example, you cannot (easily) declare new variables.
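As a sketch of that macro case, assuming integer operands (the name `XOR_SWAP` is hypothetical), an XOR swap needs no temporary and so never has to name a type. The `do { ... } while (0)` wrapper makes it behave like a single statement:

```c
/* XOR swap as a macro: no temporary, so no type has to be named.
   Only valid for integer lvalues that are distinct objects (if a and b
   are the same object, it is zeroed). Each argument is evaluated several
   times, so do not pass expressions with side effects. */
#define XOR_SWAP(a, b) \
    do {               \
        (a) ^= (b);    \
        (b) ^= (a);    \
        (a) ^= (b);    \
    } while (0)
```

After `unsigned a = 1, b = 2; XOR_SWAP(a, b);` you get `a == 2` and `b == 1`.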

When the variables are in memory, all variants are very likely to perform the same on any high-performance CPU. Even if the compiler does not optimize the code, the CPU will avoid virtually all of the memory accesses and make them about as fast as register accesses.

Overall, I am inclined to say: don't worry about the speed of this; optimizing at this level is unimportant. Try to avoid the swap altogether. That will be the fastest!

Johannes Overmann

http://en.wikipedia.org/wiki/XOR_swap_algorithm

Most modern compilers can optimize away the temporary variable in the naive swap, in which case the naive swap uses the same amount of memory and the same number of registers as the XOR swap and is at least as fast, and often faster. The XOR swap is also much less readable and completely opaque to anyone unfamiliar with the technique. On modern CPU architectures, the XOR technique is considerably slower than using a temporary variable to do swapping. One reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order.

Also see this question:

How fast is std::swap for integer types?

It's important to note that the XOR swap requires you to first check that the two variables do not refer to the same memory location. If they did, the first XOR would set that location to zero and the original value would be lost.
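A minimal C sketch of that pitfall, using unsigned values and two hypothetical function names. The guard compares addresses, not values: two distinct variables that happen to hold equal values still swap correctly.

```c
/* Naive XOR swap: if x and y point to the same object, the first
   statement computes v ^ v == 0 and the value is lost. */
static void xor_swap_naive(unsigned *x, unsigned *y)
{
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}

/* Guarded version: do nothing when both pointers alias the same object. */
static void xor_swap_checked(unsigned *x, unsigned *y)
{
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}
```

With `unsigned v = 42;`, calling `xor_swap_naive(&v, &v)` leaves `v == 0`, while `xor_swap_checked(&v, &v)` leaves it at 42. That extra branch is the comparison the comment below objects to.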

Community
  • The comparison operation for the XOR-swap will destroy all perf benefits in all situations I can think of. That's the death knell for this technique. – usr Nov 08 '13 at 20:28
  • Modern x86 CPUs perform register renaming, and I guess all the `mov` instructions will be removed at this moment without going to execution ports. So swap with temporary variable should be almost free for two variables in CPU registers. – stgatilov Sep 08 '15 at 12:18

The XOR swap isn't always the most efficient. Most modern CPU architectures try to execute instructions in parallel, but in the XOR swap each line depends on the result of the previous one, so the steps cannot run in parallel. For the temp-variable swap, most compilers optimize the temporary variable away, so the naive version ends up running as fast or faster while using the same amount of memory.

Another swap alternative is:

x = x + y
y = x - y
x = x - y

The same efficiency and speed arguments as for the XOR swap apply here too.

EDIT: As hatchet pointed out, the +/- approach can also overflow if not done carefully.
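One way to sidestep the overflow problem in C, sketched here under the assumption that unsigned operands are acceptable (the function name is hypothetical), is to do the arithmetic on unsigned integers. Unsigned arithmetic wraps modulo 2^N, so an intermediate "overflow" in `x + y` is harmless and the values are still restored exactly. With signed `int`, overflow is undefined behaviour, which is the risk mentioned above.

```c
/* Add/subtract swap on unsigned integers. All arithmetic is modulo 2^N,
   so wraparound in the first step cancels out in the next two.
   Like the XOR swap, x and y must not point to the same object. */
static void swap_add_sub(unsigned *x, unsigned *y)
{
    *x = *x + *y; /* x = x0 + y0 (mod 2^N) */
    *y = *x - *y; /* y = x0 */
    *x = *x - *y; /* x = y0 */
}
```

For example, swapping `UINT_MAX` and `3` wraps in the first step but still ends with the two values exchanged.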

Jamil Seaidoun