I recently posted an answer where I suggested using bitshift instead of multiplication as a performance boost. It was pointed out to me that this isn't the case with the following example:

from timeit import repeat
for e in ['x*2 ', 'x<<1'] * 3:
    print(e, min(repeat(e, 'x=5')))

x*2  0.015567475988063961
x<<1 0.024531989998649806
x*2  0.01551242297864519
x<<1 0.024578287004260346
x*2  0.015560572996037081
x<<1 0.02448918900336139

Multiplication stays faster for values of x up to about 1,000,000,000. That threshold gets lower as the multiplier (or, equivalently, the shift amount) increases.
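To see where the crossover falls, both expressions can be timed over a range of magnitudes. The following is a minimal sketch rather than the exact test I ran: the helper name `time_ops`, the chosen powers of ten, and the `number` and `repeat` values are all assumptions made for illustration.

```python
from timeit import repeat

def time_ops(x, number=100_000):
    """Return the best-of-3 time for x*2 and x<<1 at a given starting x.

    `time_ops` is a hypothetical helper, not part of timeit.
    """
    setup = f"x={x}"
    return {
        expr: min(repeat(expr, setup, repeat=3, number=number))
        for expr in ("x*2", "x<<1")
    }

# Illustrative magnitudes only; adjust to bracket the crossover on your machine.
for exp in (0, 3, 6, 9, 12, 18):
    timings = time_ops(10 ** exp)
    print(f"x=10**{exp:<2}  x*2 {timings['x*2']:.4f}  x<<1 {timings['x<<1']:.4f}")
```

The absolute numbers depend on the Python version and the hardware, so only the relative ordering within each row is meaningful.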

This doesn't make sense to me, since a bit shift is a simpler operation than a multiplication and should be at least as fast. The timings bear this out for large x, where the shift comes out ahead. So why is it slower for smaller values of x?

Moreover, changing my code up a bit yielded some interesting results:

for e in ['x*2 ', 'x<<1'] * 3:
    print(e, max(repeat(e, 'x=5')))
 
x*2  0.054458492988487706
x<<1 0.02453691599657759
x*2  0.015550968993920833
x<<1 0.0246038619952742
x*2  0.015542584005743265
x<<1 0.024583352991612628

Here `max` reports the slowest run in each batch instead of the fastest. The slowest multiplication run in the first batch (about 0.054 s) took far longer than any bitshift run, although every later batch had timings comparable to the `min` results above. It looks like there's some sort of caching operation going on here, but I can't fathom why that would result in different runtimes.

Woody1193
  • @SilvioMayolo repeat from the [timeit](https://docs.python.org/3/library/timeit.html#python-interface) package. – Woody1193 Dec 15 '22 at 04:59
  • both `mul` and `shl` are single cpu cycle instructions, so it would surprise me if one was faster than the other, unless we're dealing with floats that don't fit in the registers? – Mike 'Pomax' Kamermans Dec 15 '22 at 05:01
  • @Mike'Pomax'Kamermans That surprises me, as every implementation of multiplication I've seen requires more complicated logic. To be fair, I learned this stuff on the precursor to Intel's first 8086, so my knowledge might be out of date here, but I remember multiplication being a 3-cycle instruction whereas bitshift was a single-cycle instruction. Is it just that the cycle times for individual transistors have decreased fast enough that they can fit a more complex operation into a single cycle? – Woody1193 Dec 15 '22 at 05:05
  • Not to be flip, but it would be surprising to me if facts about bitwise operations on raw binary data translated into actionable insights about Python objects. You may be overlooking the overhead inherent in the language's decisions about representation of integers. – Jon Kiparsky Dec 15 '22 at 05:06
  • @JonKiparsky I assume I am, ergo my question. What strikes me as strange is that multiplication is faster for smaller input values but bitshift is faster for larger values. – Woody1193 Dec 15 '22 at 05:06
  • It looks like Python's bytecode interpreter has a special optimization to handle multiplication of two small integers, which is undoubtedly a more common operation than bit shifting. – jasonharper Dec 15 '22 at 05:08
  • @jasonharper That appears to be the case, vis a vis my update to the question. – Woody1193 Dec 15 '22 at 05:09
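
One way to start probing the interpreter-level explanation raised in the comments is to disassemble both expressions and compare the binary operation each compiles to. This is only a sketch: the helper name `binary_ops` is made up for illustration, and the exact opcode names printed vary between CPython versions.

```python
import dis

def binary_ops(expr):
    """List (opname, argrepr) for each binary-operation instruction in expr.

    `binary_ops` is a hypothetical helper used only for this comparison.
    """
    code = compile(expr, "<expr>", "eval")
    return [
        (ins.opname, ins.argrepr)
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("BINARY_")
    ]

for expr in ("x*2", "x<<1"):
    print(expr, binary_ops(expr))
```

Each expression compiles to a single binary-operation instruction, so any timing gap has to come from how the interpreter executes that instruction on Python integer objects, not from the amount of bytecode.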
