
My kid asked me a funny question yesterday:

Dad, does a computer have trouble adding / multiplying large numbers like I do? Does it take longer?

I laughed and answered: of course not, computers are equally fast with any numbers; they're that smart.

Later, I started thinking about it and asked myself... am I truly right? I tested a few scenarios with doubles and integers and yes, the magnitude of the number doesn't seem to have any impact on the time it takes the CPU to perform an operation (yeah, I'm bored).

The highly complicated test implementation was the following:

static void Main(string[] args)
{
   Test(1, 0); // warm-up call so the JIT compiles Test before we start timing

   var elapsed = Test(int.MaxValue, 0);
   Console.WriteLine("Testing with 0: {0} ms", elapsed);

   elapsed = Test(int.MaxValue, 1);
   Console.WriteLine("Testing with 1: {0} ms", elapsed);

   elapsed = Test(int.MaxValue, 1000000);
   Console.WriteLine("Testing with 1E6: {0} ms", elapsed);

   elapsed = Test(int.MaxValue, long.MaxValue / 2);
   Console.WriteLine("Testing with MaxValue/2: {0} ms", elapsed);

   Console.ReadKey();
}

private static long Test(int repetitions, long testedValue)
{
   var stopwatch = Stopwatch.StartNew();

   long dummy = 0;
   for (int i = 0; i < repetitions; ++i)
   {
       dummy += testedValue + testedValue; // accumulate so the JIT can't drop the addition as dead code
   }

   stopwatch.Stop();
   GC.KeepAlive(dummy); // keep the result observable
   return stopwatch.ElapsedMilliseconds;
}

Still, the question keeps nagging at me. I am no real expert on how arithmetic is performed exactly in a modern CPU, so I am kind of interested in knowing why there is no difference.

InBetween
  • You are testing additions... I don't think they are very interesting... You can imagine that a `long` in a PC is like a fixed-length number, prepended with 0... so `0000000000000010`... Now... being fixed length, adding `0000000000000010` + `0000000000000010` has the same complexity as adding `1234000000000010` + `5678000000000010`... Multiplications could be more interesting, but they surely aren't... But perhaps there are optimizations for the -1, 0, 1 cases for multiplication... – xanatos Jul 06 '15 at 10:50
  • Try comparing `long` with `int` instead. – DavidG Jul 06 '15 at 10:51
  • The math may take the same time, but you have to parse the strings, which may not always take exactly the same time. Floating-point arithmetic may take different amounts of time (very slightly different) due to the pipelining in the CPU. – jdweng Jul 06 '15 at 10:58

3 Answers


does a computer have trouble adding / multiplying

Not the best examples; these operations are performed by logic circuits in the processor. Dedicated hardware: it only takes a single CPU cycle, as fast as it can possibly go. There are however some pessimistic cases. They occur when the operands are not "regular": when a number is denormal, too small to be stored as a regular floating point number, or when it's NaN, not-a-number.

Listed as an "FP Assist" in the Intel manuals: the processor logic steps in, doesn't leave it to the hardware adder/multiplier anymore, and executes microcode instead. Which is pretty much what it sounds like, a little program that's embedded in the processor itself. Very expensive, it easily takes a hundred cycles. You'll find a good example in this post.

More intuitive to your child is division: it is just as plodding in hardware as long division is on paper and requires an iterative approach. It usually takes between 10 and 24 cycles; how long it takes depends on the operand values. The exact dependency between values and number of cycles is foggy, processor vendors treat these kinds of implementation details as trade secrets. You probably have to try a bunch of random numbers to see the effect.

Hans Passant

In general, hardware has support for a fixed precision. For example, a 32-bit processor will typically support a 32-by-32-bit integer multiply (at least to get the low half of the result) and 32-bit addition. In terms of latency (i.e., the time from the start of the computation to finishing it), most processors have the same latency for double-precision floating point as for single precision, but with SIMD operations they can perform twice as many single-precision operations in parallel.

On a 32-bit processor, a 64-bit integer addition will typically have at least twice the latency and half the throughput. (Some instruction sets do not have add with carry, so multiple precision addition is synthesized with an addition, a check for less than (if the sum is less than either source operand then a carry occurred), and a separate addition of the carry. This involves more overhead.)
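The carry-synthesis trick described above can be written out in C, restricting ourselves to 32-bit additions (the function name `add64_via_32` is mine, chosen for illustration): sum the low halves, detect the carry with the "sum less than an operand" comparison, then fold it into the high half.

```c
#include <stdint.h>

/* Add two 64-bit values using only 32-bit additions, the way a 32-bit
   target without add-with-carry has to: add the low halves, detect the
   carry via unsigned wraparound (sum < operand means a carry occurred),
   then add the carry into the sum of the high halves. Three additions
   and a compare, instead of one instruction. */
static uint64_t add64_via_32(uint64_t a, uint64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);

    uint32_t lo = alo + blo;
    uint32_t carry = lo < alo;       /* 1 if the low addition wrapped */
    uint32_t hi = ahi + bhi + carry;

    return ((uint64_t)hi << 32) | lo;
}
```

For instance, `add64_via_32(0xFFFFFFFF, 1)` has to propagate a carry into the high word to produce `0x100000000`, which is exactly the extra work a 32-bit machine pays for 64-bit arithmetic.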

  • One could also consider energy consumption. A double-precision floating-point operation will consume more energy than a single-precision operation *in the operation itself*, though almost all energy consumed is in other parts of the processing such as instruction fetch, decode, and (particularly, for out-of-order processors) schedule and some energy is used in the execution hardware even when transistors are not switching (i.e., no work is being done). In this sense, a double precision floating-point operation is "harder" for the processor. –  Jul 06 '15 at 11:17
  • Strangely, recent Intel CPU designs are *faster* at 64x64->128b multiply (1 operand form of `imul` / `mul`), than at 32x32->64b or 16x16. Only 8x8b `imul` (or 32x32 -> low32b `imul`) is faster. This is according to Agner Fog's uops / latency / throughput tables. I assume it's not the multiplier itself, but rather bookkeeping / sending the data where it needs to be that's the slow part. (Splitting the 64bit result into eax and edx is slow, compared to the special-purpose 128b multiply output already producing 2 outputs, maybe? Fast 8x8->16 makes sense, as the result goes in ax unsplit.) – Peter Cordes Jul 15 '15 at 13:12
  • Intel Haswell / Sandybridge timings are: 8x8->16b (`imul r8`): 1 uop. lat=3, tput=1. `imul r64`: 2 uops, lat=3, tput=1. (ports 1 and 6). `imul r32`: 3 uops, lat=4, tput=1/2. `imul r16`: 4 uops, lat=4, tput=1/2. – Peter Cordes Jul 15 '15 at 13:17

Dad, does a computer have trouble adding / multiplying large numbers like I do? Does it take longer?

No, of course not.

An addition of two 32-bit integers always takes the exact same time, regardless of the content. In fact, it is a totally mechanical operation. The CPU has a clock, and at every clock cycle it does the same things -- fetch instruction, execute instruction, etc. -- totally stubborn and stupid ;)

Pretty much like an old factory belt running at a constant speed, with every single worker always doing the same thing, like turning in 4 screws... (and, unlike humans, the machine never gets tired).
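The "fixed number of steps" idea can be made concrete by simulating a 32-bit adder in software (a sketch; real ALUs use faster carry-lookahead circuits rather than this bit-by-bit ripple, but the point is the same): the loop always walks all 32 bit positions, whether you add 2 + 2 or two huge numbers.

```c
#include <stdint.h>

/* Simulate a 32-bit ripple-carry adder: one full-adder step per bit
   position, always exactly 32 steps regardless of operand magnitude.
   The carry out of bit 31 is discarded, matching uint32_t wraparound. */
static uint32_t ripple_add(uint32_t a, uint32_t b)
{
    uint32_t sum = 0, carry = 0;
    for (int bit = 0; bit < 32; ++bit) {     /* fixed 32 iterations, always */
        uint32_t x = (a >> bit) & 1;
        uint32_t y = (b >> bit) & 1;
        sum |= (x ^ y ^ carry) << bit;       /* full-adder sum bit */
        carry = (x & y) | (carry & (x ^ y)); /* full-adder carry out */
    }
    return sum;
}
```

The belt never speeds up or slows down: small operands just feed the adder lots of zero bits, and it grinds through them all the same.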

SQL Police