1

I am trying to implement multi-precision arithmetic for 256-bit operands based on radix-2^32 representation. In order to do that I defined operands as:

typedef union UN_256fe{

uint32_t uint32[8];

}UN_256fe;

and here is my MP addition function:

void add256(UN_256fe* A, UN_256fe* B, UN_256fe* result){

uint64_t t0, t1;
t0 = (uint64_t) A->uint32[7] + B->uint32[7];
result->uint32[7] = (uint32_t)t0;
t1 = (uint64_t) A->uint32[6] + B->uint32[6] + (t0 >> 32);
result->uint32[6] = (uint32_t)t1;
t0 = (uint64_t) A->uint32[5] + B->uint32[5] + (t1 >> 32);
result->uint32[5] = (uint32_t)t0;
t1 = (uint64_t) A->uint32[4] + B->uint32[4] + (t0 >> 32);
result->uint32[4] = (uint32_t)t1;
t0 = (uint64_t) A->uint32[3] + B->uint32[3] + (t1 >> 32);
result->uint32[3] = (uint32_t)t0;
t1 = (uint64_t) A->uint32[2] + B->uint32[2] + (t0 >> 32);
result->uint32[2] = (uint32_t)t1;
t0 = (uint64_t) A->uint32[1] + B->uint32[1] + (t1 >> 32);
result->uint32[1] = (uint32_t)t0;
t1 = (uint64_t) A->uint32[0] + B->uint32[0] + (t0 >> 32);
result->uint32[0] = (uint32_t)t1;

}

I implemented it without using loop for simplicity. Now when I test my function inside main:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
#include "mmulv3.0.h"

int main(){

UN_256fe result;
uint32_t c;

UN_256fe a = {0x00000f00,0xff00ff00,0xffff0000,0xf0f0f0f0,0x00000000,0xffffffff,0xf0fff000,0xfff0fff0};
UN_256fe b = {0x0000f000,0xff00ff00,0xffff0000,0xf0f0f0f0,0x00000000,0xffffffff,0xf0fff000,0xfff0ffff};
c = 2147483577;
printf("a:\n");
for(int i = 0; i < 8; i +=1){
    printf("%"PRIu32, a.uint32[i]);
}
printf("\nb:\n");
for(int i = 0; i < 8; i +=1){
    printf("%"PRIu32, b.uint32[i]);
}

add256(&a, &b, &result);
printf("\nResult for add256(a,b) = a + b:\n");
for(int i = 0; i < 8; i +=1){
    printf("%"PRIu32, result.uint32[i]);
}

return 0;
}

I've got:

a:
38404278255360429490176040423221600429496729540433049604293984240
b:
614404278255360429490176040423221600429496729540433049604293984255
Result for add256(a,b) = a + b:
652814261543425429483622537896770241429496729537916426254293001199

However, when I verified my result with sage, I've got:

sage: a=38404278255360429490176040423221600429496729540433049604293984240
sage: b=614404278255360429490176040423221600429496729540433049604293984255
sage: a+b
652808556510720858980352080846443200858993459080866099208587968495

Would you please help me out here?

A23149577
  • 2,045
  • 2
  • 40
  • 74
  • 4
    You seem to be concatenating the decimal representations of the individual words while printing. That won't work with a 32-bit binary "digits". Either perform base conversion by means of repeated short division by ten to extract the decimal digits or use decimal words modulo 10^9 for the representation. – doynax Dec 01 '15 at 20:51
  • Why use a `union` instead of a `struct`? – chqrlie Dec 01 '15 at 20:55
  • What does it matter whether a `union` or a `struct` is used when there is only one member? – John Bollinger Dec 01 '15 at 21:01
  • @chqrlie you are right I will change it to struct, I had more members in that type in previous versions and forgot to change it in my code – A23149577 Dec 01 '15 at 21:04
  • @A23149577 re your [recently deleted question](http://stackoverflow.com/questions/34161854/multi-precision-multiplication-in-cuda) it is really frustrating to prepare an answer but find you deleted the question ;( – Weather Vane Dec 08 '15 at 18:42
  • @WeatherVane I am really sorry! people kept voting down on that question because it seems nobody has any idea about multi-precision arithmetic in cuda and they think it should be closed. I repost the question and thank you so much for you answer in advance. – A23149577 Dec 08 '15 at 20:03

3 Answers3

3

The algorithm for addition seems correct, but you cannot print these 256-bit integers in decimal by converting each component individually.

Just think of this simple example: 0x100000000, stored as { 0,0,0,0,0,0,1,0 }, will print as 10 instead of 4294967296. Base-10 conversion is significantly more complex than the simple addition.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
2

Would have helped to write a space character between the numbers.

But how much is 384 + 6144 + 1? I think you are adding from the wrong end.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
  • I don't think so, the OP is using a big-endian storage order for the `uint32_t` elements, themselves most likely stored in little-endian order. – chqrlie Dec 01 '15 at 20:57
  • The first word of his result is what you expect if you add right to left (384 + 6144 + 1); the first word of the number that he compares against is what you expect if you add left to right (384 + 6144). Obviously not printing spaces between the numbers is daft, and hexadecimal would have been a lot more useful. – gnasher729 Dec 01 '15 at 21:00
  • You are confused: the first words of `A` and `B` are `3840` and `61440`. `3840 + 61440 + 1` indeed equals `65281` as expected. – chqrlie Dec 01 '15 at 21:30
1

To print a multi-precision number in decimal takes a bit of code as many of the digits depend on the entire uint32[8].

To print it out in hexadecimal is much easier.

fputs("0x", stdout);
for (int i = 0; i < 8; i +=1){
    printf("%08" PRIX32, a.uint32[i]);
}
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256