
I know, I know, people are probably going to say "just switch to floating point", but currently that is not an option due to the nature of the project that I am working on. I am helping write a programming language in C++, and I am currently having difficulty getting a very accurate algorithm for multiplication. I have a VM with operations for mod/smod, div/sdiv (signed numbers are not a concern here), and mul, plus a halving number for fully fractional numbers and a pushed shift number that I multiply and divide by to create my shifting. For simplicity, let's say I'm working with a 32 byte space.

My algorithms work fine for pretty much anything involving integers; it's only when my fractional portion gets over 16 bytes that I run into problems with precision. If I were to round it, the number would be fairly accurate, but I want it as accurate as possible, and I'm even willing to sacrifice a bit of performance for it, so long as it stays fixed point and doesn't go into floating point land. The algorithms I'm concerned with are mapped out below in a sort of pseudocode. I would love any insight into how I could make this better, or any reasoning as to why, by the laws of computational science, what I'm asking for is a fruitless endeavor.

For fully fractional numbers (all bytes are fractional):

 A = num1 / halfShift //truncates the number down to 16 so that when multiplied, we get a full 32 byte num
 B = num2 / halfShift
 finalNum = A * B
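
To make the halving step concrete, here is a minimal C++ sketch of the same idea at 64-bit width, purely as an illustrative stand-in for the 256-bit type (the function name and the Q0.64 layout are assumptions for the example, not the VM's actual code):

```cpp
#include <cstdint>

// Illustrative 64-bit analogue: the raw value represents value = raw / 2^64
// (fully fractional), and halfShift is 2^32, i.e. half the full width.
// Dividing each operand by halfShift first keeps the product within 64 bits,
// at the cost of discarding the low 32 bits of each input before the multiply.
uint64_t mulFullyFractional(uint64_t num1, uint64_t num2)
{
    const uint64_t halfShift = uint64_t(1) << 32;
    uint64_t a = num1 / halfShift;   // keep only the top half of num1
    uint64_t b = num2 / halfShift;   // keep only the top half of num2
    return a * b;                    // again a 64-bit fraction of the product
}
```

The precision loss described above comes exactly from those two divisions: the low half of each operand never takes part in the multiplication at all.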

For the rest of my numbers that are larger than 16 bytes I use this algorithm:

 This algorithm can essentially be broken down into int.frac form:
 A.B * C.D, which expands mathematically to
 D*B/shift + C*A*shift + D*A + C*B
 If the fractional parts are larger than the integer parts, I halve them and then multiply them together for the D*B/shift term,
 just like in the fully fractional example above.
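
For reference, here is the same expansion written out as a C++ sketch at Q32.32 (again an illustrative width, not the 32 byte type; the variable names follow the A.B * C.D notation above):

```cpp
#include <cstdint>

// Illustrative Q32.32 analogue of the int.frac scheme: value = raw / 2^32,
// so the top 32 bits are the integer part and the low 32 bits the fraction.
uint64_t mulIntFrac(uint64_t num1, uint64_t num2)
{
    const unsigned fracBits = 32;
    const uint64_t fracMask = (uint64_t(1) << fracBits) - 1;

    uint64_t a = num1 >> fracBits;  // A: integer part of num1
    uint64_t b = num1 & fracMask;   // B: fractional part of num1
    uint64_t c = num2 >> fracBits;  // C: integer part of num2
    uint64_t d = num2 & fracMask;   // D: fractional part of num2

    // D*B/shift + C*A*shift + D*A + C*B -- the expansion given above.
    // The D*B term is the one that loses its low bits when shifted down,
    // and C*A*shift overflows if the true product does not fit the format.
    return ((d * b) >> fracBits)
         + ((c * a) << fracBits)
         + d * a
         + c * b;
}
```

At this width d*b still fits in the 64-bit intermediate; in the 256-bit case it does not, which is where the halving from the fully fractional example comes in.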

Is there some kind of "magic" rounding method that I should be aware of? Please let me know.

  • Algorithm. The word is 'algorithm'. It's somebody's name. – user207421 Jul 08 '16 at 04:21
  • `A = num1 / halfShift //truncates the number down to 16` - once you have reduced the "precision" (resolution, really) of the input (to 16 here, and the unit (bytes/digits/bits/words…) matters even less), _no_ amount of accurate arithmetic can restore/increase it. Instead, choose how many "guard places" you are going to require (say, 2) and compute to the chosen precision (32+2=34 in this case). In the case of multiplication, this allows you to drop almost half of the partial products. Round to the final precision. – greybeard Jul 08 '16 at 05:22
  • @greybeard what do you mean by "guard places"? – Richard John Catalano Jul 08 '16 at 18:12
  • You could have been bothered to use [wikipedia on guard digits](https://en.wikipedia.org/wiki/Guard_digit) or a search engine to find [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). (I avoid _guard bits_ or _digits_ so as not to imply a base. The naming isn't standardised, anyway: one author's _rounding bit_ is another author's first _guard bit_.) (If you expect to use floating point representations to produce results for real problems, by all means digest the "Goldberg paper" - FP _is_ evil magic.) – greybeard Jul 08 '16 at 20:43
  • appreciate the resources. Really I do. – Richard John Catalano Jul 08 '16 at 20:52
  • (You're welcome (to actually take hints:-). If you want someone that didn't post the question (or, in case of answers, the answer) notified, mention her name introduced by `@` (you will get suggestions).) – greybeard Jul 09 '16 at 07:37

2 Answers


The number of fractional digits of a product equals the sum of the numbers of fractional digits in the operands. You have to carry out the multiplication to that precision and then round or truncate according to the desired target precision.
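
As a minimal sketch of that principle, assuming a Q16.16 format and a 64-bit intermediate (the width and the function name are illustrative, not the question's 256-bit type): the exact product of two Q16.16 values is a Q32.32 value, so compute it in full, round, and only then drop the extra fractional bits.

```cpp
#include <cstdint>

// Sketch: two Q16.16 operands give an exact Q32.32 product in 64 bits.
// Round that product to nearest, then drop the extra 16 fractional bits.
uint32_t mulQ16_16(uint32_t num1, uint32_t num2)
{
    uint64_t full = uint64_t(num1) * uint64_t(num2);  // exact Q32.32 product
    full += uint64_t(1) << 15;                        // half an ulp of the target: round to nearest
    return uint32_t(full >> 16);                      // back to 16 fractional bits
}
```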

user207421
  • But what is a good method for rounding I guess? I'm very new to all of this and I'm still trying to understand what makes a good rounding algo. If you could just answer that, I'll give you the answer. – Richard John Catalano Jul 08 '16 at 20:53
  • I would recommend rounding to nearest. Should work by just adding the highest discarded bit as the lowest undiscarded one. – Aconcagua Jul 09 '16 at 05:19

You get the most accurate result if you do the multiplication first and scale afterwards. Of course, that means you need to store the result of the multiplication in a 64-bit int type. If that is not an option, your approach of shifting in advance makes sense. But you certainly lose precision.

Either way, you can increase accuracy a little if you round instead of truncate.

I support Aconcagua's recommendation to round to nearest. For that you need to add the highest bit that is going to be truncated, before you apply the division.

In your case that would look like this:

A = (num1 + (1 << (halfshift - 1))) >> halfshift
B = (num2 + (1 << (halfshift - 1))) >> halfshift
finalNum = A * B

EDIT:

An example of how to dynamically scale the factors and the result depending on the values of the factors (this improves the resolution and therefore the accuracy of the result):

shiftA and shiftB need to be set such that A and B are 16 byte fractionals each, so that the 32 byte result cannot overflow. If shiftA and shiftB are not known in advance, they can be determined by counting the leading zeros of num1 and num2.

A = (num1 + (1 << (shiftA - 1))) >> shiftA
B = (num2 + (1 << (shiftB - 1))) >> shiftB
finalNum = (A * B) >> (fullshift - (shiftA + shiftB))
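
Determining shiftA and shiftB from the leading zeros could look roughly like the following sketch at 64-bit width (an illustrative stand-in for the 256-bit type; the function name is made up, std::countl_zero is C++20, and rounding is left out here to keep the sketch short):

```cpp
#include <algorithm>  // std::max
#include <bit>        // std::countl_zero (C++20)
#include <cstdint>

// Illustrative 64-bit analogue of the dynamic scaling above, for fully
// fractional values (value = raw / 2^64). Each factor is shifted down only
// as far as needed for the product to fit in 64 bits; the scaling that is
// still missing is then applied to the result.
uint64_t mulDynamicScale(uint64_t num1, uint64_t num2)
{
    const int fullShift = 64;

    // Significant bits of each operand, taken from its leading zeros.
    int bits1 = fullShift - std::countl_zero(num1);
    int bits2 = fullShift - std::countl_zero(num2);

    // Shift each factor just enough that the product cannot exceed 64 bits.
    int shiftA = std::max(0, bits1 - 32);
    int shiftB = std::max(0, bits2 - 32);

    uint64_t prod = (num1 >> shiftA) * (num2 >> shiftB);

    int resultShift = fullShift - (shiftA + shiftB);  // remaining scaling, 0..64
    if (resultShift >= 64)    // a shift by 64 is undefined in C++,
        return 0;             // and the product underflows to 0 here anyway
    return prod >> resultShift;
}
```

In the 256-bit case the 32 in the shift calculation would become 128 (16 bytes), matching the description above.
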
maniacmic
  • Well, I do have the ability to multiply first and store it somewhere...although not really in a 64-bit int type...much larger actually and I'm working with a default big endian 256-bit int type. However, I cannot store anything beyond those 256 bits, so if that's what you're recommending, I'm not sure that will work. – Richard John Catalano Jul 15 '16 at 01:25
  • I must have mistaken bits for bytes when I was reading your question. If you cannot store results larger than 32 bytes, you need to stick to 16 byte values as factors. – maniacmic Jul 18 '16 at 07:17
  • By the way, I forgot to mention that you could also adapt the scaling factor for every multiplication such that the result does not overflow, and then apply the remaining required scaling to the result. Depending on the numbers this can increase the resolution quite a lot - probably even more than the one bit you get with proper rounding. – maniacmic Jul 18 '16 at 07:31
  • what does that look like when you apply the remaining required scaling on the result? – Richard John Catalano Jul 19 '16 at 06:36
  • @RichardJohnCatalano: I have updated my answer to include an example on dynamically scaled factors and product. – maniacmic Jul 19 '16 at 07:43