How are Boost's cpp_int arithmetic operations implemented?

Question

I have looked at GMP's source code, and it seems to use hand-written assembly to achieve fast performance in its mpz_*/mpn_* arithmetic implementations. But when I look at Boost's source code, I don't see any hand-written assembly (I did see some intrinsics, but so few that I doubt Boost relies on them entirely).

I also know that GMP can be used as a backend for Boost, but as far as I know this is not the default.

So my questions are:

How does Boost implement arithmetic operations for its cpp_int and other big number types?
How does Boost make its big number implementations fast without using hand-written assembly or relying on intrinsics? (All I can see are templates everywhere.)
What techniques is it using?

[EDIT]

I asked this question because I have come across many community implementations of big integers written in pure C++, and the fastest ones I have encountered still take 3x to 4x as long as GMP's implementation, especially in the four basic arithmetic operations. Boost, on the other hand, takes only about 1.1x to 1.3x as long as GMP (statically linked), so I'm puzzled about how Boost achieves this without hand-written assembly.

I tried to understand Boost's source code, but it is too complicated and convoluted, at least for me.

[EDIT]

In most of my benchmarks I used templated, loop-based Fibonacci and factorial functions that I implemented myself, and I mainly passed 4-digit numbers as their arguments. For the factorials, I assume that only the naive multiplication algorithm is used, since the multiplier is always a one-limb big integer. For Fibonacci, I don't know of an algorithm that is faster than naive addition. So how does Boost achieve such fast naive implementations of addition and multiplication without hand-written assembly?
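To make the factorial case concrete, here is a minimal sketch of a one-limb schoolbook multiplication driving a looped factorial. It is not taken from Boost, GMP, or any benchmarked library: the names `limb_t`, `double_limb_t`, `bigint`, `mul_by_limb`, and `factorial` are hypothetical, and it assumes 32-bit limbs stored least-significant first, with a 64-bit integer holding each intermediate product.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical representation (an assumption for this sketch):
// little-endian vector of 32-bit limbs, i.e. base 2^32.
using limb_t = std::uint32_t;
using double_limb_t = std::uint64_t;
using bigint = std::vector<limb_t>;

// Multiply `a` in place by a single limb `m` in one pass.
// Each step computes limb * m + carry in a 64-bit value, which cannot
// overflow because (2^32 - 1)^2 + (2^32 - 1) < 2^64.
void mul_by_limb(bigint& a, limb_t m) {
    double_limb_t carry = 0;
    for (limb_t& x : a) {
        double_limb_t t = static_cast<double_limb_t>(x) * m + carry;
        x = static_cast<limb_t>(t);  // low 32 bits stay in this limb
        carry = t >> 32;             // high 32 bits move to the next limb
    }
    if (carry != 0) {
        a.push_back(static_cast<limb_t>(carry));
    }
}

// Looped factorial: every multiplier fits in one limb, so only
// mul_by_limb is ever needed.
bigint factorial(limb_t n) {
    bigint result{1};
    for (limb_t i = 2; i <= n; ++i) {
        mul_by_limb(result, i);
    }
    return result;
}
```

The whole computation is a single carry-propagating loop per multiplication, so its speed depends mostly on how well the compiler handles the widening multiply and on memory allocation behavior.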

Also, the open-source community implementations of big integers written in pure C++ that I tested against Boost and GMP also use limb bases of 2⁶⁴ or 2³² (at least the fastest ones), so I don't think it is a matter of big integer representation.
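For the Fibonacci case with a limb base of 2⁶⁴, the naive addition can be sketched as below. Again this is only an illustration under stated assumptions, not code from Boost or GMP: `limb64`, `add_in_place`, and `fibonacci` are hypothetical names, limbs are stored least-significant first, and the carry is detected through unsigned wraparound rather than through a wider integer type or an intrinsic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using limb64 = std::uint64_t;  // hypothetical limb type, base 2^64

// a += b for little-endian limb vectors.
// Unsigned addition wraps modulo 2^64, so a carry occurred exactly
// when the wrapped sum is smaller than one of its operands.
void add_in_place(std::vector<limb64>& a, const std::vector<limb64>& b) {
    if (a.size() < b.size()) {
        a.resize(b.size(), 0);
    }
    limb64 carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        limb64 bi = (i < b.size()) ? b[i] : 0;
        limb64 s = a[i] + bi;              // may wrap
        limb64 c1 = (s < bi) ? 1 : 0;      // carry out of a[i] + b[i]
        limb64 r = s + carry;              // may wrap
        limb64 c2 = (r < s) ? 1 : 0;       // carry out of adding old carry
        a[i] = r;
        carry = c1 | c2;                   // both cannot be set at once
    }
    if (carry != 0) {
        a.push_back(1);
    }
}

// Looped Fibonacci using only naive addition; returns F(n).
std::vector<limb64> fibonacci(unsigned n) {
    std::vector<limb64> a{0};  // F(0)
    std::vector<limb64> b{1};  // F(1)
    for (unsigned i = 0; i < n; ++i) {
        add_in_place(a, b);  // a becomes F(i) + F(i+1) = F(i+2)
        a.swap(b);           // now a = F(i+1), b = F(i+2)
    }
    return a;
}
```

Whether a compiler turns the two comparisons into a hardware add-with-carry sequence depends on the compiler and its optimization flags, which is one place where pure C++ implementations using the same limb base can still differ in speed.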

They are probably using the same or similar algorithms. Just that one library decided to implement the algorithms in assembly and the other in C++. So I am not sure what you looking for in an answer. It is non-obvious whether letting the compiler optimize the C++ code or hand-writing it would result in faster code in any given instance, depending on many factors. It seems to me that you are somehow making the assumption that C++ must be (significantly?) slower than hand-written assembly? — user17732522, Aug 23 '22 at 17:13
Is there a particular part of the source code you don't understand? Have you studied it and the documentation? — Alan Birtles, Aug 23 '22 at 17:37
The x86 assembly language now has about 2000 instructions, with timing varying widely between processor generations. So to produce optimal code for a particular model, you really *have* to be a computer. One technique is to write C++, compile it, and check the result. If it doesn't look good, improve the templates. — BoP, Aug 23 '22 at 17:44
The question could be improved by giving a concrete [mre] of a benchmark in which you see this small difference between GMP/boost with an example of these other implementations you mention. As far as I can tell boost is not doing anything unusual that other C++ implementations couldn't also easily do. A 1.1x to 1.3x time difference between the hand-written assembly and compiled code seems more reasonable to me than 3x to 4x difference. — user17732522, Aug 23 '22 at 19:47
@user17732522 It would improve the question, but not make it on-topic. Boost is open-source, and the developers are usually reachable on the mailing list/github. — sehe, Aug 23 '22 at 20:17
