
Suppose you want to calculate 5^65537. Instead of multiplying 5 by itself 65536 times, it is recommended to compute (5^(2^16)) * 5, i.e. square repeatedly: ((((5^2)^2)^2)...)^2 * 5. This results in 16 squarings and one multiplication.
But my question is: aren't the savings in the number of multiplications offset by the fact that you are now squaring very large numbers? How is this faster when you go down to the basic bit multiplication in computers?
After reading the comments, I have this doubt:

     How is the cost of each multiplication not dependent on the size? When
     multiplying, the number of bits of the multiplier will increase, and this
     will increase the number of additions and the number of left shifts.
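
To make that doubt concrete, here is a minimal sketch (my own illustration, not part of the original question; the function name and step counter are assumptions) of the shift-and-add multiplication it describes. The step count grows with the bit-length of the multiplier, which is exactly the cost the question worries about:

    def shift_add_multiply(a, b):
        """Schoolbook binary multiplication: add a shifted copy of `a`
        for every set bit of the multiplier `b`.
        Returns (product, loop_steps)."""
        product = 0
        steps = 0
        while b > 0:
            if b & 1:
                product += a       # add the current shifted multiplicand
            a <<= 1                # shift left for the next bit of b
            b >>= 1
            steps += 1             # one shift/add step per bit of b
        return product, steps

    print(shift_add_multiply(5, 65537))   # (327685, 17): 17 bits, 17 steps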
user207421
Ashwin

1 Answer


Count the multiplication operations:

5^65537 multiplied out naively = 65536 multiplications
(5^(2^16)) * 5 = (16 + 1) = 17 multiplications.

From this, you can see that this is much less work, despite the multiplications working on larger numbers. The algorithm is called Square and Multiply.
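
As an illustration (my sketch, not part of the original answer; the names and the counter are assumptions), here is square-and-multiply in Python, instrumented so the 17 multiplications are visible:

    def square_and_multiply(base, exponent):
        """Binary (square-and-multiply) exponentiation.
        Returns (base**exponent, number_of_multiplications)."""
        result = 1
        mults = 0
        while exponent > 0:
            if exponent & 1:
                if result == 1:
                    result = base      # first set bit: a copy, not a real multiply
                else:
                    result *= base
                    mults += 1
            exponent >>= 1
            if exponent:               # square for the next bit (skip after the last)
                base *= base
                mults += 1
        return result, mults

    value, count = square_and_multiply(5, 65537)
    assert value == 5 ** 65537
    print(count)                       # 17: sixteen squarings plus one multiplication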

In practice, cryptosystems that need to exponentiate large numbers like this use a technique called Modular Exponentiation to avoid massive intermediate numbers.
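
A hedged sketch of that technique (my code, not the answer's; in real Python you would just call the built-in three-argument pow): reducing mod the modulus after every squaring and multiplication keeps every intermediate value small:

    def modexp(base, exponent, modulus):
        """Square-and-multiply with a reduction mod `modulus` at every
        step, so no intermediate grows beyond roughly modulus squared."""
        result = 1
        base %= modulus
        while exponent > 0:
            if exponent & 1:
                result = (result * base) % modulus
            base = (base * base) % modulus
            exponent >>= 1
        return result

    m = 2**61 - 1                      # an arbitrary prime modulus, chosen for the demo
    assert modexp(5, 65537, m) == pow(5, 65537, m)   # matches Python's built-in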

Oleksi
  • @Oleksi: So how are the multiplications on larger numbers not a setback? – Ashwin Apr 12 '12 at 05:22
  • Keep in mind that the many-multiplication version still deals with huge intermediate results. I've updated the answer to talk about how systems really deal with large numbers in practice, though – Oleksi Apr 12 '12 at 05:29
  • Thank you for that link. But can you explain, in a non-cryptographic scenario, how a small number of multiplications on large numbers is better than multiplying small numbers many times? Does it come down to the basic bit multiplications involving the registers and accumulators? Please explain if possible. – Ashwin Apr 12 '12 at 05:34
  • @Ashwin: _Count_ the number of times you have to multiply things, and remember that the cost of each multiplication is about the same, regardless of what is being multiplied. – Donal Fellows Apr 12 '12 at 05:36
  • While it's true that multiplying _huge_ numbers is slower than multiplying small numbers, this is still much faster than performing many smaller multiplications. I don't really know why this is the case from a hardware point of view, but I suspect that CPUs are designed so that they can multiply larger numbers without using extra clock cycles, whereas multiple multiplications will always use multiple clock cycles. – Oleksi Apr 12 '12 at 05:41
  • @DonalFellows: That is what my question is. How is the cost of each multiplication not dependent on the size? When multiplying, the number of bits of the multiplier will increase, and this will increase the number of additions and the number of left shifts. – Ashwin Apr 12 '12 at 05:51
  • @Oleksi: Are you sure about the multiple clock cycles concept? – Ashwin Apr 12 '12 at 05:58
  • @Ashwin, No, it's just my theory. Like I said, I'm not sure about the hardware reasons for the performance. – Oleksi Apr 12 '12 at 06:01
  • @Oleksi: If the numbers fit within the word length of the CPU, then the delay is independent of the values. But we are talking about cryptography with large numbers, so you store the numbers as arrays of words. The smaller a number (in the sense of words), the fewer operations have to be performed. Example: multiplying two 16-bit numbers on an 8-bit machine requires four 8-bit multiplications (see the sketch after these comments). – mrks Apr 12 '12 at 08:16
  • The whole concept of a clock cycle is rather dodgy; the issue is that real execution speeds depend massively on cache behavior. L1 caches are now nearly as fast as registers, and things that appear expensive are often not. – Donal Fellows Apr 12 '12 at 19:19
  • That said, doing things repeatedly is a good way to make them cost more, and reducing loop counts is a good idea provided you can still keep everything from overflowing out to the next level of cache. With bignum multiplication, the costly bit is increasing the space allocated to the numbers (because that is unlikely to be cached); with sane amortization that is a pretty rare thing, and so the cost is simply in the number of times round the loop. – Donal Fellows Apr 12 '12 at 19:37
  • Determining the optimal multiplication pattern is hard though (NP-complete). There's rather a lot more on this at http://rosettacode.org/wiki/Addition-chain_exponentiation and the pages it links to… – Donal Fellows Apr 12 '12 at 19:39
  • @DonalFellows: You have said, "With bignum multiplication, the costly bit is increasing the space allocated to them (because that's unlikely to be cached)". So that means big numbers will be stored somewhere else, right? Like in a higher level of cache, which is slower? – Ashwin Apr 13 '12 at 02:22
  • @mrks: Sorry for my previous comment. I had not read your comment properly, so I have removed it. In cryptography, when dealing with large numbers (take the case where modular exponentiation cannot be applied because the modulus itself is too large), the number, as you said, will be stored in an array of words. So the overall time will increase considerably, right? – Ashwin Apr 13 '12 at 02:28
  • In cryptography, fixed size arrays can be used. _Depending on the key size_, that might mean that everything can be done in L1 cache and the calculations will be quick. (How big would that be? No idea! Depends on loads of factors; modern hardware is very complex, as are modern operating systems.) – Donal Fellows Apr 14 '12 at 17:57
  • The slowest operations on a modern computer tend to be I/O (networking and disks). Then things get faster for general memory (SSDs are between disks and general memory in speed). Then faster again for the various types of cache (L1 is fastest, but smallest). Fastest of the lot are the CPU registers, but there aren't many of those at all. – Donal Fellows Apr 14 '12 at 18:01
  • That said, with experience I think of performance like this: kiss it goodbye if you go to disk (including swapping/paging), be aware of it when dealing with general memory, don't worry much if everything's nicely in L1 cache. Oh, and beware of nested loops of course; they can turn something cheap into something expensive. The other rule is to not speculate about performance when you can measure it instead; facts trump opinion. – Donal Fellows Apr 14 '12 at 18:06
  • @DonalFellows: Thanks for your answer :) Please just leave a comment if you come to know whether, during exponentiation, all the data is stored in the L1 cache, or the actual reason why a few large multiplications are better than many small multiplications. – Ashwin Apr 19 '12 at 01:38
  • @Oleksi: Thanks for your answer :) Please just leave a comment if you come to know anything about the clock cycles concept, or the actual reason why a few large multiplications are better than many small multiplications. I will do the same :) – Ashwin Apr 19 '12 at 01:39
  • @Oleksi: Your example is wrong. The formula says x^n = x * (x^2)^((n-1)/2) if n is odd, and in this case n = 65537, which is odd, so your example should say (5^2)^32768 * 5 = 32771 multiplications, which is fewer than 65537. That is why exponentiation by squaring is faster. – Droid Teahouse Apr 01 '16 at 05:54
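
To illustrate mrks's word-array point from the comments (my sketch; the split into 8-bit "words" mirrors how bignum libraries work but is not taken from any particular library): one 16-bit product costs four 8-bit partial products plus shifts and adds, and schoolbook multiplication of n-word numbers costs n^2 word multiplies in general:

    def mul16_via_8bit(x, y):
        """Multiply two 16-bit values using only 8-bit partial products,
        the way a bignum library multiplies multi-word numbers."""
        assert 0 <= x < 1 << 16 and 0 <= y < 1 << 16
        x_lo, x_hi = x & 0xFF, x >> 8      # split into 8-bit "machine words"
        y_lo, y_hi = y & 0xFF, y >> 8
        # four 8-bit multiplications, recombined with shifts and adds
        return (
            ((x_hi * y_hi) << 16)
            + ((x_hi * y_lo) << 8)
            + ((x_lo * y_hi) << 8)
            + (x_lo * y_lo)
        )

    assert mul16_via_8bit(51234, 60001) == 51234 * 60001

This is the sense in which a single multiplication's cost does grow with operand size; square-and-multiply still wins because it shrinks the number of multiplications far more than each one grows.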