Implementation of unlimited precision integers

Question

Problem: "A big integer is represented as a list of (small) integers."

Suppose we have:

type reg = string;;   (* "$0" models register set to constant 0 *)
type label = string;; (* empty string models no label *)

type asmistr =
   AsmHalt
 | AsmNop
 | AsmAdd of reg * reg * reg
 | AsmAddi of reg * int * reg
 | AsmSub of reg * reg * reg
 | AsmMul of reg * reg * reg
 | AsmLoad of reg * reg * reg
 | AsmStore of reg * reg * reg
 | AsmJmp of label
 | AsmBne of reg * reg * label
 | AsmBeq of reg * reg * label
 | AsmSlt of reg * reg * reg
;;

type asmprog = AsmProg of (label * asmistr) list;;

type asmline = 
    AsmIstr of label * asmistr 
  | AsmComment of string 
  | AsmDebugReg of reg 
  | AsmDebugMem of int * int
;;

This set of definitions is used to define an assembly-like language, with registers, instructions and labels (used by jumps).

Now I need to implement a compiler from an imperative language (one that has instructions like "while" and "if") to ASM.

The implementation suggested by my teacher is to use a list where each element is a digit of the given number (numbers can only be integers), so 11000 is [1, 1, 0, 0, 0].

The first gap is: how can I implement this in a generic OCaml program? Suppose I have to store a big integer; what logic can I use to allow calculations on it? In the end, an ASM program can also do add, sub, mul and other instructions that may involve big integers, so I don't know how to handle registers, big integers and instructions together.

What I need is a general scheme for implementing big integers, possibly in OCaml, and for realizing it in a language similar to assembly (in this case, ASM).

Thanks in advance. Sorry for my English if this is not clear; if someone can help me, I will add more details as needed.

score 1 · Answer 1 · answered Feb 06 '13 at 23:18

My understanding of your question is the following: you want to compile a simple imperative language that has unbounded integers to assembly, and the compiler will be written in OCaml. Is that right? Your question is "how should I compile the arithmetic operations on unbounded integers?".

If that is indeed the question, a good exercise would be to implement those big number operations in OCaml first (using lists of int; note that each element doesn't need to be a single decimal digit, you can use any larger base whose addition won't overflow your native OCaml integers, and that will make operations quicker), and then think about how to port that to a native assembly program. How would you compile lists, for a start?
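As a rough sketch of that first step: the code below assumes digits are stored least significant first and uses base 10_000, both of which are my choices for illustration and not something the question prescribes. The names `base`, `add_digits` and `add` are made up for this example.

```ocaml
(* Assumption: a big natural number is an int list, least significant
   digit first, each digit in the range 0 .. base - 1. *)
let base = 10_000

(* Schoolbook addition: add digit by digit, passing the carry along. *)
let rec add_digits a b carry =
  match a, b with
  | [], [] -> if carry = 0 then [] else [carry]
  | (x :: xs, []) | ([], x :: xs) ->
      let s = x + carry in
      (s mod base) :: add_digits xs [] (s / base)
  | x :: xs, y :: ys ->
      let s = x + y + carry in
      (s mod base) :: add_digits xs ys (s / base)

let add a b = add_digits a b 0
```

With this layout, 99_999_999 is `[9999; 9999]`, and adding `[1]` to it gives `[0; 0; 1]`, i.e. 100_000_000.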

I just realized that I can use pairs to build an arbitrarily long chain of integers, and I can use that for the ASM translation: a list cell of 2 elements where the second element is a pointer to a cell of the same type. I think I have the idea now, thanks a lot! For the ASM part I'll use registers to store the current number and a memory address to find the next element. — genesisxyz, Feb 07 '13 at 22:25

score 0 · Answer 2 · answered Feb 07 '13 at 03:51

First you should understand that numbers have a base. Normally people use decimal (base 10), but programmers use hexadecimal (base 16) and binary (base 2) too. Your teacher is right (use a list where each element is a digit of the given number), but may not have mentioned that these numbers can be in any base. For performance/efficiency you should probably use base 256 (where each digit is an 8-bit integer), or possibly base 65536 (where each digit is a 16-bit integer), base 2^32 or maybe base 2^64. The choice depends on what the underlying hardware is expected to be able to handle (e.g. if the code is intended to run on a 16-bit CPU then you'd use base 65536).

The next thing to decide is how a number is stored. Typically you'd want something to keep track of how many digits there are, some flags, and the list of digits. For the flags you'd want one flag to indicate if the number is positive or negative, but you might use more for things like if the number is infinity, if the number had precision loss, etc. For example, "5/(-6)" might result in a number where "number of digits" is zero; with the negative and precision flags set.
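In OCaml, one possible shape for such a representation is a record with a sign flag and a digit list. This is only a sketch: the type `bigint`, the field names, and the helper `of_int` are hypothetical, the base 10_000 and least-significant-first order are assumptions, and extra flags such as infinity or precision loss are left out. The digit count does not need its own field here because `List.length` can recover it.

```ocaml
(* Assumed layout: sign flag plus little-endian digits in base 10_000.
   Zero is the empty digit list with negative = false. *)
let base = 10_000

type bigint = { negative : bool; digits : int list }

(* Split a non-negative native int into base-10_000 digits. *)
let rec digits_of_nat n =
  if n = 0 then [] else (n mod base) :: digits_of_nat (n / base)

(* Convert a native int; note that this sketch does not handle min_int,
   whose absolute value is not representable as a native int. *)
let of_int n = { negative = n < 0; digits = digits_of_nat (abs n) }
```

For example, `of_int 123456789` yields `{ negative = false; digits = [6789; 2345; 1] }`.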

Once this is decided you'd want to implement addition of positive numbers. This is mostly just adding digits with carry, where the result may have (at most) one more digit than the largest source number. For example, if you add a 2 digit number plus a 4 digit number then you'd allocate memory for a 5 digit result, then add digits one at a time using something like:

carry = 0;
/* assumes both sources are zero-padded to result_digits digits */
for(n = 0; n < result_digits; n++) {
    digit = source1[n] + source2[n] + carry;
    if(digit > DIGIT_MAX) {
        carry = 1;
        digit &= DIGIT_MASK;
    } else {
        carry = 0;
    }
    result[n] = digit;
}

The next step is to write code to handle subtraction of a smaller positive number from a larger positive number. This is just subtracting one digit from another with borrow, starting from the least significant digit. Once this is working you'd extend it to handle subtraction of a larger positive number from a smaller positive number, which involves swapping the numbers beforehand and negating the result afterwards. For example, "3 - 10 = -(10 - 3)".
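A minimal OCaml sketch of that subtraction step, under the same assumptions as before (little-endian digit lists, base 10_000). All names here (`normalize`, `compare_mag`, `sub_mag`, `sub`) are invented for the example, and the signed result is returned as a plain `(is_negative, magnitude)` pair.

```ocaml
let base = 10_000

(* Drop most significant zero digits, which sit at the end of the list. *)
let normalize ds =
  let rec drop = function 0 :: rest -> drop rest | l -> l in
  List.rev (drop (List.rev ds))

(* Compare magnitudes: more digits wins, otherwise compare from the
   most significant digit down. *)
let compare_mag a b =
  let a = normalize a and b = normalize b in
  let c = compare (List.length a) (List.length b) in
  if c <> 0 then c else compare (List.rev a) (List.rev b)

(* a - b with borrow, assuming a >= b. *)
let sub_mag a b =
  let rec go a b borrow =
    match a, b with
    | [], _ -> []
    | x :: xs, ys ->
        let y, ys' = match ys with [] -> 0, [] | y :: t -> y, t in
        let d = x - y - borrow in
        if d < 0 then (d + base) :: go xs ys' 1
        else d :: go xs ys' 0
  in
  normalize (go a b 0)

(* Swap the operands when b is larger and flag the result as negative. *)
let sub a b =
  if compare_mag a b >= 0 then (false, sub_mag a b)
  else (true, sub_mag b a)
```

So `sub [3] [10]` evaluates to `(true, [7])`, matching "3 - 10 = -(10 - 3)".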

Once the addition and subtraction of positive numbers is working you can start working on support for negative numbers. This is mostly just fiddling with the sign flags and choosing whether to add or subtract. For example, for "8 + (-3) = 8 - 3", "8 - (-3) = 8 + 3", "-8 + 3 = 3 - 8", "-8 + -3 = -(8 + 3)", etc. In all cases it can be rearranged and done as addition or subtraction of positive numbers.
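The sign fiddling can be sketched independently of how magnitudes are stored. In the OCaml below, a signed value is a hypothetical `(negative, magnitude)` pair, and the magnitude operations are passed in as labeled arguments (`add_mag`, `sub_mag`, `cmp_mag`, all names of my own choosing), with `sub_mag` assumed to require that its first argument is not smaller than its second.

```ocaml
(* Signed addition built from unsigned magnitude operations. *)
let signed_add ~add_mag ~sub_mag ~cmp_mag (na, a) (nb, b) =
  if na = nb then (na, add_mag a b)        (* same sign: add, keep the sign *)
  else
    let c = cmp_mag a b in
    if c = 0 then (false, sub_mag a b)     (* x + (-x) = 0, avoid "negative zero" *)
    else if c > 0 then (na, sub_mag a b)   (* larger magnitude decides the sign *)
    else (nb, sub_mag b a)

(* a - b is a + (-b): flip the sign of b and reuse signed_add. *)
let signed_sub ~add_mag ~sub_mag ~cmp_mag a (nb, b) =
  signed_add ~add_mag ~sub_mag ~cmp_mag a (not nb, b)
```

To try it without a bignum library, native ints can stand in for magnitudes: `signed_add ~add_mag:(+) ~sub_mag:(-) ~cmp_mag:compare (true, 8) (false, 3)` gives `(true, 5)`, i.e. "-8 + 3 = -(8 - 3)".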

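The next step is multiplication of positive numbers. This is schoolbook long multiplication: multiply every digit of one number by every digit of the other, add each product into the result at position i + n, and carry the overflow into the next digit. The "digit" variable must be wide enough to hold a full digit-by-digit product plus the carry:

/* result[] has source1_digits + source2_digits digits, all zero initially */
for(i = 0; i < source1_digits; i++) {
    temp = 0;
    for(n = 0; n < source2_digits; n++) {
        digit = source1[i] * source2[n] + result[i + n] + temp;
        temp = digit >> DIGIT_BITS;
        result[i + n] = digit & DIGIT_MASK;
    }
    result[i + source2_digits] = temp;
}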

Then you'd worry about multiplication of negative numbers. Here you just make the source numbers positive and do multiplication of positive numbers, then set the sign of the result afterwards.
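For the OCaml side, multiplication could look roughly like the following. It is a sketch under the same assumptions as earlier (little-endian digit lists, base 10_000, so a digit product stays far below the native int limit), and every name in it is illustrative. It uses the identity a * b = a * b0 + base * (a * rest of b), where prepending a 0 digit multiplies by the base.

```ocaml
let base = 10_000

(* Digit-wise addition with carry (same idea as for plain addition). *)
let rec add_digits a b carry =
  match a, b with
  | [], [] -> if carry = 0 then [] else [carry]
  | (x :: xs, []) | ([], x :: xs) ->
      let s = x + carry in
      (s mod base) :: add_digits xs [] (s / base)
  | x :: xs, y :: ys ->
      let s = x + y + carry in
      (s mod base) :: add_digits xs ys (s / base)

(* Drop most significant zero digits (at the end of the list). *)
let normalize ds =
  let rec drop = function 0 :: rest -> drop rest | l -> l in
  List.rev (drop (List.rev ds))

(* Multiply a digit list by one digit d, carrying the overflow. *)
let mul_digit a d =
  let rec go a carry =
    match a with
    | [] -> if carry = 0 then [] else [carry]
    | x :: xs ->
        let p = x * d + carry in
        (p mod base) :: go xs (p / base)
  in
  go a 0

(* One partial product per digit of b, each shifted one place further. *)
let mul a b =
  let rec go b =
    match b with
    | [] -> []
    | d :: ds -> add_digits (mul_digit a d) (0 :: go ds) 0
  in
  normalize (go b)

(* Signed version on (negative, magnitude) pairs: signs combine by xor,
   and a zero result is never marked negative. *)
let mul_signed (na, a) (nb, b) =
  let m = mul a b in
  (m <> [] && na <> nb, m)
```

For instance, `mul [9999] [9999]` gives `[1; 9998]`, which is 99_980_001.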

The number itself **has no base**! Only its text representation has one. This is a common mistake, repeated all the time. — johnfound, Feb 07 '13 at 09:22
Thanks for the code, I'll try to implement this logic in O'Caml for the calculations between integers — genesisxyz, Feb 07 '13 at 22:29
