
Assume this much:
I'm using a 16.16 fixed point system.
System is 32 bit.
CPU has no floating point processor.
Overflow is practically guaranteed for a naive multiply of anything larger than 1.0 * 0.4999.

To make one last assumption... let's say the values I'm working with will not be so high as to cause overflow in this operation...

//assume that in practical application these assignments wouldn't be here,
//as the 2 fixed values would already exist...
int fixed1 = (int)(1.2341 * 65536);
int fixed2 = (int)(0.7854 * 65536);

int mask1 = fixed1 & 0xFF; //keep only the lower 8 bits

fixed1 >>= 8; //keep upper 24 bits... assume value here isn't too large...

int answer = ((fixed2 * fixed1) >> 8) + ((fixed2 * mask1) >> 16);

So the question is... is this a stroke of genius (not to say it hasn't already been thought of or anything) or a complete waste of time?

Maximus
  • Well I for one don't understand what you're trying to do here. Are you trying to multiply 1.2341 by 0.7854 on a 32-bit machine that has no floating point processor, but all numbers are fixed so that the top 16 bits are pre-decimal point and the bottom 16 bits are post-decimal point? – ChrisBD May 26 '10 at 15:26

1 Answer


Re-edit - because I was wrong :)

Looks like you are trying to get higher precision by using an extra var?

If you are indeed trying to increase precision, then this would work, but why not use the whole int instead of just 8 bits?

Ok, from your comments, you wanted to know how to do 64-bit-precision muls on a 32-bit processor. The easiest way is if the processor underneath you has a long multiply op. If it's an ARM, you are in luck and can use long long to do your mul, then shift away your out-of-bounds low bits and be done.

If it does not, you can still do a long long multiply and let the compiler writer do the heavy lifting of handling overflow for you. These are the easiest methods.
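
For the long long route, a minimal sketch in C (assuming signed 16.16 values, a compiler with 64-bit long long support, and the usual arithmetic right shift for signed values; fixed_mul is just an illustrative name, not anything from your code):

#include <stdint.h>

typedef int32_t fixed16_16;   /* signed 16.16 fixed point */

fixed16_16 fixed_mul(fixed16_16 a, fixed16_16 b)
{
    /* Widen to 64 bits so the full 32.32 product fits,
       then shift right 16 to get back to 16.16 format. */
    int64_t product = (int64_t)a * (int64_t)b;
    return (fixed16_16)(product >> 16);
}

On ARM this typically compiles down to a single long-multiply instruction plus the shift, so it's about as cheap as it gets.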

Failing that, you get to do 4 16-bit multiplies and a bunch of adds and shifts:


// The idea is to break the 32-bit multiply into 4 16-bit
// parts to prevent any overflow.  You can break any
// multiply into factors and additions (all math here is unsigned):
//
//       (ahi16)(alo16)
// X     (bhi16)(blo16)
// --------------------
//       (blo16)(alo16)       - First  32-bit product var
//   (blo16)(ahi16)<<16       - Second 32-bit product var (Don't shift here)
//   (bhi16)(alo16)<<16       - Third  32-bit product var (Don't shift here)
// + (bhi16)(ahi16)<<32       - Fourth 32-bit product var (Don't shift here)
// --------------------
// Final Value.  Here we add using add and add-with-carry
// techniques to handle the overflow.

Basically, we have a low product and a high product. The low product gets assigned the first partial product. You then add in the 2 middle products shifted up 16; for each overflow (carry), you add 1 to the high product and continue. Then add the upper 16 bits of each middle product into the high product. Finally, add the last product as-is into the high product.
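
In C, that might look roughly like this (just a sketch, assuming unsigned operands; mul32x32 and the variable names are made up for illustration):

#include <stdint.h>

/* Multiply two unsigned 32-bit values into a 64-bit result (hi:lo)
   using only 32-bit arithmetic, following the steps above. */
void mul32x32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t alo = a & 0xFFFF, ahi = a >> 16;
    uint32_t blo = b & 0xFFFF, bhi = b >> 16;

    uint32_t p0 = blo * alo;   /* first partial product            */
    uint32_t p1 = blo * ahi;   /* middle partials, not yet shifted */
    uint32_t p2 = bhi * alo;
    uint32_t p3 = bhi * ahi;   /* goes entirely into the high word */

    *lo = p0;
    *hi = p3;

    /* Add each middle partial shifted up 16; on a carry out of the low
       word, add 1 to the high word, then add the partial's upper 16 bits. */
    uint32_t mid = p1 << 16;
    *lo += mid;
    if (*lo < mid) (*hi)++;
    *hi += p1 >> 16;

    mid = p2 << 16;
    *lo += mid;
    if (*lo < mid) (*hi)++;
    *hi += p2 >> 16;
}

For a 16.16 result you would then take the middle 32 bits of hi:lo, i.e. (hi << 16) | (lo >> 16), assuming the answer still fits in 16.16.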

A big pain in the butt, but it works for any arbitrary precision of values.

Michael Dorgan
  • This compiles and gives the expected answer, so no... the answer does not come out wrong. Yes, if the values are too large this would absolutely overflow. You might have missed fixed1 >>= 8... but this does produce a correct answer. – Maximus May 26 '10 at 15:45
  • Ah, you are correct, I missed that initial shift. That changes things completely and yes - this would increase your precision. But if you are going to spend time with an extra var, might as well take it further and use the whole var instead of just 8 bits. – Michael Dorgan May 26 '10 at 15:48
  • Thanks Michael. You're absolutely right that this could easily overflow... I'm pretty sketchy about using something like it. The overall question, to be honest... is assuming I DON'T overflow... would this be faster than float * float? – Maximus May 26 '10 at 15:52
  • I guess what I don't know then is... how to store the initial multiplication of two 32-bit values on a 32-bit OS without overflow? – Maximus May 26 '10 at 15:56
  • On a system without an FPU, way faster. I've not coded with an FPU for 10+ years now and as such have become very used to fixed point math. How to do the initial mul without overflow? I'll edit my above post for that so I have room. – Michael Dorgan May 27 '10 at 17:24
  • Ah. I've read here and there that there would be faster ways to do it in ARM Assembly... I'm programming on Android (Native Java JNI) so I can't assume what the processor will be. On top of that... as much as I know about bitwise operations... I know nothing of Assembler... What I'm realizing for the bottom line is that a 16.16 32-bit fixed point integer times a 16.16 32-bit integer could result in a 48-bit answer (after shifting back the 64-bit answer), and there must be some kind of controlled upper bound (no more than 24 bits each) for the initial values. Would that be correct to say? – Maximus Jun 03 '10 at 16:49
  • A 16.16 multiply results in a 32.32 answer. You can shift away the lowest 16 bits, leaving the upper 16 fraction bits, if you wish. Most lower-precision fixed-point routines I've seen partially pre-shift the terms before the multiply and finish shifting afterwards to balance over/underflow (a rough sketch of that appears after these comments). But it comes down to fitting the domain of your numbers to your terms. If you need all 16.16 bits of precision, then you need to do the full fixed-point multiply I describe above - perhaps minus the lowest 16-bit accumulation. – Michael Dorgan Jun 03 '10 at 17:00
  • BTW, what I mean by fitting your number to the domain is moving your fixed point position around to match your data. We usually use 24.8 numbers here for our work. That gives an upper bound in the millions while still allowing 2.75 decimal digits of precision. If we needed to do some really high precision trig / unit vector stuff, we could switch it around to 8.24 or even 1.31. Fixed point is very flexible in this way - controlling where you choose to lose precision, as opposed to floating point, which is out of your control. – Michael Dorgan Jun 03 '10 at 17:04
  • I guess I never mentioned but now is as good a time as any... I'm working with OpenGL ES in this case trying to stick to fixed point and doing some manual editing of vertex coordinates. I've wondered, could I just scale down perhaps even though it's expecting 16.16 format? – Maximus Jun 03 '10 at 19:34
  • I have no idea on that. I'm looking at OpenGL ES for my next project, but haven't used it much at all up to this point. That said, if the numbers at their lowest levels represent pixel positions, losing most of your sub-pixel accuracy won't hurt. The real answer is to try shifting down to preserve your top and see how bad it looks - if bad at all. – Michael Dorgan Jun 03 '10 at 20:30
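
A rough sketch of the pre-shift balancing mentioned a couple of comments up, assuming unsigned 24.8 values (mul24_8 is an illustrative name, not something from the thread):

#include <stdint.h>

typedef uint32_t fixed24_8;   /* unsigned 24.8 fixed point */

fixed24_8 mul24_8(fixed24_8 a, fixed24_8 b)
{
    /* Shift one term down 4 bits before the multiply and the result down
       the remaining 4 bits afterwards (4 + 4 = the 8 fraction bits),
       trading a little low-bit precision for extra headroom. */
    return ((a >> 4) * b) >> 4;
}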