I have looked at GMP's source code, and it seems to me that it uses hand-written assembly to achieve fast performance in its mpz_*/mpn_* arithmetic implementations. When I look at Boost's source code, however, I don't see any hand-written assembly (I did see some intrinsics, but so few that I doubt Boost relies on them entirely).
I also know that GMP can be used as a backend for Boost's big number types, but as far as I know that is not the default.
So my questions are:
1. How does Boost implement arithmetic operations for its cpp_int and other big number types?
2. How does Boost make its big number implementations fast without using hand-written assembly or (mostly) intrinsics? All I can see are templates everywhere.
3. What technique are they using?
[EDIT]
I asked this question because I have come across many community implementations of big integers written in pure C++, and the fastest ones I have encountered still take about 3x to 4x as long as GMP, especially in the four basic arithmetic operations. Boost, on the other hand, takes only about 1.1x to 1.3x as long as GMP (statically linked), so I'm puzzled about how Boost achieves this without hand-written assembly.
I tried to understand Boost's source code, but it is too complicated and convoluted, at least for me.
[EDIT]
In most of my benchmarks I used templated, loop-based Fibonacci and factorial functions that I implemented myself, and I mainly passed 4-digit numbers as their arguments. For the factorials, I assume only the naive multiplication algorithm is used, since the multiplier is always a one-limb big integer. For Fibonacci, I don't know of an algorithm that is faster than naive addition. So how does Boost achieve such fast naive addition and multiplication without hand-written assembly?
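To make the benchmark shape concrete, here is a minimal sketch of what such templated, loop-based functions might look like. The names `fibonacci` and `factorial` and their signatures are hypothetical (the actual benchmark code is not shown here); `Int` is assumed to be any type that can be constructed from an unsigned value and supports `+` and `*=` with an unsigned operand.

```cpp
#include <cstdint>

// Hypothetical benchmark helpers; names and signatures are assumptions.
// Int can be a built-in integer or a big integer type with the same operators.
template <typename Int>
Int fibonacci(unsigned n) {
    Int a(0u), b(1u);            // F(0) and F(1)
    for (unsigned i = 0; i < n; ++i) {
        Int next = a + b;        // one big-integer addition per iteration
        a = b;
        b = next;
    }
    return a;                    // F(n)
}

template <typename Int>
Int factorial(unsigned n) {
    Int result(1u);
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;             // multiplier is small, so it fits in one limb
    }
    return result;
}
```

With a built-in type, `fibonacci<std::uint64_t>(20)` and `factorial<std::uint64_t>(20)` exercise the same loops; for 4-digit arguments the template parameter would have to be a big integer type such as `boost::multiprecision::cpp_int`.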
Also, the open-source pure-C++ big integer implementations that I compared against Boost and GMP also use limb bases of 2^64 or 2^32 (at least the fastest ones do), so I don't think it is a matter of big integer representation.
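For reference, this is roughly what I mean by the naive algorithms on a 2^32 limb base, written in portable C++ with no assembly or intrinsics. It is only a sketch under my own assumptions (little-endian limb order, a `std::vector` of limbs, and the hypothetical names `Limbs`, `add_limbs`, and `mul_by_limb`); it is not taken from Boost or GMP.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for this sketch: little-endian limbs in base 2^32.
using Limbs = std::vector<std::uint32_t>;

// Schoolbook addition: add limb by limb, carrying through a 64-bit temporary.
Limbs add_limbs(const Limbs& a, const Limbs& b) {
    const Limbs& longer  = a.size() >= b.size() ? a : b;
    const Limbs& shorter = a.size() >= b.size() ? b : a;
    Limbs out;
    out.reserve(longer.size() + 1);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < longer.size(); ++i) {
        std::uint64_t sum = carry + longer[i];
        if (i < shorter.size()) sum += shorter[i];
        out.push_back(static_cast<std::uint32_t>(sum));  // low 32 bits
        carry = sum >> 32;                               // 0 or 1
    }
    if (carry) out.push_back(static_cast<std::uint32_t>(carry));
    return out;
}

// Multiplication by a single limb: the case a factorial loop with small n hits.
Limbs mul_by_limb(const Limbs& a, std::uint32_t m) {
    Limbs out;
    out.reserve(a.size() + 1);
    std::uint64_t carry = 0;
    for (std::uint32_t limb : a) {
        // 32x32 -> 64-bit product plus carry cannot overflow 64 bits.
        std::uint64_t prod = static_cast<std::uint64_t>(limb) * m + carry;
        out.push_back(static_cast<std::uint32_t>(prod)); // low 32 bits
        carry = prod >> 32;                              // high 32 bits
    }
    if (carry) out.push_back(static_cast<std::uint32_t>(carry));
    return out;
}
```

Both loops are a single pass over the limbs, so any speed difference between implementations of this kind has to come from how the carry chain and the memory handling are compiled, not from the algorithm itself.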