2

I am working within constraints of hardware that has 64bit integer limit. Does not support floating point. I am dealing with very large integers that I need to multiply and divide. When multiplying I encounter an overflow of the 64bits. I am prototyping a solution in python. This is what I have in my function:

upper = x >> 32 #x is cast as int64 before being passed to this function
lower = x & 0x00000000FFFFFFFF

temp_upper = upper * y // z #Dividing first is not an option, as this is not the actual equation I am working with. This is just to make sure in my testing I overflow unless I do the splitting.
temp_lower = lower * y // z

return temp_upper << 32 | lower

This works, somewhat, but I end up losing a lot of precision (my result is off by sometimes a few million). From looking at it, it appears that this is happening because of the division. If sufficient enough it shifts the upper to the right. Then when I shift it back into place I have a gap of zeroes.

Unfortunately this topic is very hard to google, since anything with upper/lower brings up results about rounding up/down. And anything about splitting ints returns results about splitting them into a char array. Anything about int arithmetic bring up basic algebra with integer math. Maybe I am just not good at googling. But can you guys give me some pointers on how to do this?

Splitting like this is just a thing I am trying, it doesnt have to be the solution. All I need to be able to do is to temporarily go over 64bit integer limit. The final result will be under 64bit (After the division part). I remember learning in college about splitting it up like this and then doing the math and re-combining. But unfortunately as I said I am having trouble finding anything online on how to do the actual math on it.

Lastly, my numbers are sometimes small. So I cant chop off the right bits. I need the results to basically be equivalent to if I used something like int128 or something.

I suppose a different way to look at this problem is this. Since I have no problem with splitting the int64, we can forget about that part. So then we can pretend that two int64's are being fed to me, one is upper and one is lower. I cant combine them, because they wont fit into a single int64. So I need to divide them first by Z. Combining step is easy. How do I do the division?

Thanks.

Duxa
  • 966
  • 2
  • 14
  • 27
  • I'm not really following, maybe I'm dumb, but if If you have Python, you have arbitrary sized ints, there is not 64bit limit. So you can just do the calculation to get your result and pack it into any binary struct you want (what exactly are you interfacing with?) – juanpa.arrivillaga Aug 07 '20 at 08:25
  • 1
    It's still unclear to me what you want to calculate. Is it (x * y)/z for a 128 bit x where the machine can do only 64 bit math? How many bits do y and z have? Are the variables signed or unsigned? – Falk Hüffner Aug 07 '20 at 09:05
  • @juanpa.arrivillaga python seems to be used only for prototyping, OP cannot use large integer prebuilt arithmetic. It could as well have been language agnostic. – aka.nice Aug 07 '20 at 12:28
  • Hey guys, to clarify. python is only for prototype, the code is running on hardware as machine code, 64bit is size of the registers. I need to be able to do math on larger numbers than 64bit. One way of doing this is (As I see it), splitting into upper and lower, then recombining after doing math. – Duxa Aug 07 '20 at 17:23

2 Answers2

0

As I understand it, you want to perform (x*y)//z.

Your numbers x,y,z all fit on 64bits, except that you need 128 bits for intermediate x*y.

The problem you have is indeed related to division: you have

h * y = qh * z + rh
l * y = ql * z + rl

h * y << 32 + l*y = (qh<<32 + ql) * z + (rh<<32 + rl)

but nothing says that (rh<<32 + rl) < z, and in your case high bits of l*y overlap low bits of h * y, so you get the wrong quotient, off by potentially many units.

What you should do as second operation is rather:

rh<<32 + l * y = ql' * z + rl'

Then get the total quotient qh<<32 + ql'

But of course, you must care to avoid overflow when evaluating left operand...

Since you are splitting only one of the operands of x*y, I'll assume that the intermediate result always fits on 96 bits.

If that is correct, then your problem is to divide a 3 32bits limbs x*y by a 2 32bits limbs z.
It is thus like Burnigel - Ziegler divide and conquer algorithm for division.
The algorithm can be decomposed like this:

  • obtain the 3 limbs a2,a1,a0 of multiplication x*y by using karatsuba for example
  • split z into 2 limbs z1,z0
  • perform the div32( (a2,a1,a0) , (z1,z0) )

here is some pseudo code, only dealing with positive operands, and with no guaranty to be correct, but you get an idea of implementation:

p = 1<<32;

function (a1,a0) = split(a)
    a1 = a >> 32;
    a0 = a - (a1 * p);

function (a2,a1,a0) = mul22(x,y)
    (x1,x0) = split(x) ;
    (y1,y0) = split(y) ;
    (h1,h0) = split(x1 * y1);
    assert(h1 == 0); -- assume that results fits on 96 bits
    (l1,l0) = split(x0 * y0);
    (m1,m0) = split((x1 - x0) * (y0 - y1));  -- karatsuba trick
    a0 = l0;
    (carry,a1) = split( l1 + l0 + h0 + m0 );
    a2 = l1 + m1 + h0 + carry;
    
function (q,r) = quorem(a,b)
    q = a // b;
    r = a - (b * q);

function (q1,q0,r0) = div21(a1,a0,b0)
   (q1,r1) = quorem(a1,b0);
   (q0,r0) = quorem( r1 * p + a0 , b0 );
   (q1,q0) = split( q1 * p + q0 );
   
function q = div32(a2,a1,a0,b1,b0)
    (q,r) = quorem(a2*p+a1,b1*p+b0);
    q = q * p;
    (a2,a1)=split(r);
    if a2<b1
        (q1,q0,r)=div21(a2,a1,b1);
        assert(q1==0); -- since a2<b1...
    else
        q0=p-1;
        r=(a2-b1)*p+a1+b1;
    (d1,d0) = split(q0*b0);
    r = (r-d1)*p + a0 - d0;
    while(r < 0)
        q = q - 1;
        r = r + b1*p + b0;
   
function t=muldiv(x,y,z)
    (a2,a1,a0) = mul22(x,y);
    (z1,z0) = split(z);
    if z1 == 0
        (q2,q1,r1)=div21(a2,a1,z0);
        assert(q2==0); -- otherwise result will not fit on 64 bits
        t = q1*p + ( ( r1*p + a0 )//z0);
    else
        t = div32(a2,a1,a0,z1,z0);
aka.nice
  • 9,100
  • 1
  • 28
  • 40
0

using struct and split it.

import struct
num64 = 13461113992590154878
struct.unpack("ii", num64.to_bytes(8, "little", signed=False))