0

What is the Big-O of division on most modern-day ISAs? Is there some kind of optimization, or is it the naive O(numerator/denominator)? I'm writing code that relies heavily on the modulus operation.

For example, what are the relative times taken to perform 10/5, 20/5 and 40/5? Do modern processors from Intel, nVidia, Qualcomm etc. have the same Big-O for division?

NOTE: I could be wrong here by assuming that division is O(size of numerator) and this question may not make any sense at all. Please correct me if that's the case.

user1210233
    For integer division it's typically constant but there is a high latency cost. – Paul R Jan 18 '13 at 09:26
    I think all mathematical operations have a big-O of 1. – mamdouh alramadan Jan 18 '13 at 09:28
    Is it not also dependent on the algorithm of division used? http://en.wikipedia.org/wiki/Division_%28electronics%29 – sr01853 Jan 18 '13 at 09:28
    @mamdouhalramadan only because the input is fixed-size, but that's a cheat. You might as well say "well pointers are fixed size, therefore iterating through a linked list (without loops) is O(1) because there is a constant maximum number of nodes it can address". The number of steps division takes can depend on the value of the inputs as well, depending on the algorithm used. – harold Jan 18 '13 at 09:41
  • Just to be clear, are you asking about dividing one fixed-width integer by another? – NPE Jan 18 '13 at 09:43
    @Sibi: If you consider it as a function of bit size: yes. But as CPUs always work on a certain word size, they are usually O(1), regardless of which algorithm you use. This is because the runtime of the different algorithms depends on the size of the data in bits. Just put a constant 32 or 64 into the runtime formulas, and you will get constant runtime for the algorithms (but attention: they can differ hugely; that, however, is only relevant for real performance/runtime, not for the Big-O!). – flolo Jan 18 '13 at 09:44
  • @harold - notable point. – mamdouh alramadan Jan 18 '13 at 09:48
    In Intel processors at least, the time division takes depends on both the value of the result and the operand size (ie small results are computed faster, but 64-bit division is slower than 32-bit division even for the same result). – harold Jan 18 '13 at 09:59
    If you are using an unbounded-width type, Big-O is the best approach to summarizing the effect of increasing operand size on operations. If you are dealing with a fixed-width type, such as 32- or 64-bit built-in types, treat it as O(1), but the constant matters for reasonable problem sizes and may depend on the type. – Patricia Shanahan Jan 18 '13 at 11:33

3 Answers

2

This question is not that good, but it is also not that "stupid", so I'll try to answer and clarify some points:

Almost all modern CPUs/GPUs have a division instruction. As it works on the default word size, it doesn't matter how fast it is; in terms of Big-O it is constant, so it's always O(1). This is true even for embedded processors, microcontrollers and the like that have no division instruction, as it is emulated in software, and the software emulation is bounded in terms of the word size, so it always takes a constant time to perform the operation (which means it is also O(1)).
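
As a sketch of why the emulated case is still O(1) (this is an illustration, not how any particular hardware divider works): a shift-subtract (restoring) division over a 32-bit word always runs exactly 32 loop iterations, no matter what the operand values are, so 10/5, 20/5 and 40/5 all cost the same number of steps:

```java
public class ShiftSubtractDiv {
    // Unsigned 32-bit restoring division: always exactly 32 iterations,
    // regardless of the operand values. divisor must be nonzero.
    static long[] divide(long dividend, long divisor) {
        long quotient = 0, remainder = 0;
        for (int i = 31; i >= 0; i--) {              // fixed 32 steps
            remainder = (remainder << 1) | ((dividend >> i) & 1);
            quotient <<= 1;
            if (remainder >= divisor) {              // subtract when it fits
                remainder -= divisor;
                quotient |= 1;
            }
        }
        return new long[] { quotient, remainder };
    }

    public static void main(String[] args) {
        long[] qr = divide(40, 5);
        System.out.println(qr[0] + " r " + qr[1]);   // 8 r 0
        qr = divide(23, 7);
        System.out.println(qr[0] + " r " + qr[1]);   // 3 r 2
    }
}
```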

The exception is when speaking about operations performed on non-word-sized data. This happens e.g. when talking about BigInt libraries. But in this case ALL operations (addition, multiplication, ...) are no longer O(1) but depend on the size of the numbers.
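
For instance, with Java's `java.math.BigInteger` the operands can be arbitrarily wide, and the cost of a divide grows with their bit length rather than being a fixed word-sized operation:

```java
import java.math.BigInteger;

public class BigIntDiv {
    public static void main(String[] args) {
        // 2^4096 divided by (2^2048 - 1): operands far larger than any
        // machine word, so the runtime depends on the bit length.
        BigInteger a = BigInteger.ONE.shiftLeft(4096);
        BigInteger b = BigInteger.ONE.shiftLeft(2048).subtract(BigInteger.ONE);
        BigInteger[] qr = a.divideAndRemainder(b);
        // Since a = (b + 1) * b + 1, the quotient is 2^2048 + 1, remainder 1.
        System.out.println("quotient bits:  " + qr[0].bitLength());
        System.out.println("remainder:      " + qr[1]);
    }
}
```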

But attention: Big-O does not say anything about the real computation time. It is just the asymptotic behaviour, neglecting a constant factor. That means even when you have two algorithms that both take O(n), the time difference can be a factor of 1000 (or one million, or whatever you want). The best example is division: it and, e.g., addition are both O(1), but division usually takes far more cycles/time to execute than addition.
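
A rough, unscientific way to see that constant factor on your own machine (results will vary, and the JIT may strength-reduce division by a compile-time constant, narrowing the gap) is to time two loops that are both O(n) in the iteration count:

```java
public class DivVsAdd {
    // Both helpers do n iterations: same Big-O, different constant factor.
    static long sumAdd(int n, int d) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i + d;
        return s;
    }

    static long sumDiv(int n, int d) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i / d;
        return s;
    }

    public static void main(String[] args) {
        final int N = 50_000_000;

        long t0 = System.nanoTime();
        long a = sumAdd(N, 3);
        long addMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        long d = sumDiv(N, 3);
        long divMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("add loop: " + addMs + " ms (sum " + a + ")");
        System.out.println("div loop: " + divMs + " ms (sum " + d + ")");
    }
}
```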

flolo
  • what about long in java? On a 32-bit machine, it would be double the word size. Does it still have O(1) runtime? – user1210233 Jan 18 '13 at 09:36
    Even when the machine has no 64-bit div instruction it would be O(1), as it would always take a constant number of operations to perform/emulate the 64-bit div with 32-bit instructions. But again: the O-class is the same, yet the difference in real execution time is approx. a factor of 3-6 (just a guess; could be higher or lower, depends heavily on the platform/system used). – flolo Jan 18 '13 at 09:40
  • You should also mention about amortized complexity. I don't know about the other operations, but addition is amortized O(1). – Paul Manta Jan 19 '13 at 15:06
  • In what sense can you say it's O(1)? For O notation to make sense you need to be looking at what happens when length of input goes to infinity. So saying that it's a single instruction for word sized input tells you nothing about the complexity of multiplication. Unless your inputs are already bounded. In which case you are not talking about the complexity of multiplication but specifically about the complexity of multiplication of two word sized numbers. – DRF Feb 20 '23 at 17:43
0

You could create your own implementation with binary division: https://www.youtube.com/watch?v=TPVFYoxna98

I think most processors' built-in division would actually be MUCH faster, per the previous post. You would need to look at the bytecode you're creating to be sure, but it would probably involve keeping things in the processor's cache, so you're stuck with this as the best solution:

int a = ...;
int b = ...;

int quotient  = a / b;
int remainder = a - (quotient * b);

i.e. a = 5, b = 2 gives quotient: 2, remainder: 1

from here (although it has errors :) ): Java - get the quotient and remainder in the same step?
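
A quick sanity check of that identity (the class and helper names here are just for illustration). Note that Java's `/` truncates toward zero, so for negative operands the "remainder" computed this way can be negative, exactly matching `%`:

```java
public class QuotRem {
    // remainder of a / b via the quotient identity; b must be nonzero
    static int remainder(int a, int b) {
        int quotient = a / b;          // truncates toward zero
        return a - (quotient * b);     // same value as a % b
    }

    public static void main(String[] args) {
        int[][] pairs = { {5, 2}, {40, 5}, {-7, 3}, {7, -3} };
        for (int[] p : pairs) {
            int a = p[0], b = p[1];
            System.out.println(a + " / " + b + " -> q=" + (a / b)
                               + " r=" + remainder(a, b));
        }
    }
}
```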

0

However, if you're dividing by a power of 2 and you know it, you can optimize with this:

public class Foo {
  public static void println(String s) {
    System.out.println(s);
  }
  
  public static void main(String [] args) {
    int size = 100;
    int[] randoms = new int[size];
    for (int i = 0; i < randoms.length; i++) {
      randoms[i] = (int) (Math.random() * 1000);
    }
    
    for (int i = 0; i < randoms.length; i++) {
      int j = randoms[i];
      int k = j >> 3;        // j / 8 via right shift (valid for non-negative j)
      int l = j - (k << 3);  // remainder: j - (j / 8) * 8, i.e. j % 8
      println("value " + i + " " + j );
      println(" " + j + " / 8 =  " + k + " remainder " + l );
    }
    
  }
}
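
The shift trick above is hard-coded for a divisor of 8. A slightly more general sketch for any power-of-two divisor (the names are hypothetical; valid for non-negative values only, since `>>` rounds toward negative infinity while `/` rounds toward zero):

```java
public class Pow2Div {
    // d must be a power of two; x must be non-negative.
    static int divPow2(int x, int d) {
        int shift = Integer.numberOfTrailingZeros(d);
        return x >> shift;              // x / d
    }

    static int modPow2(int x, int d) {
        return x & (d - 1);             // x % d, via a low-bit mask
    }

    public static void main(String[] args) {
        System.out.println(divPow2(1000, 8) + " r " + modPow2(1000, 8)); // 125 r 0
        System.out.println(divPow2(1001, 8) + " r " + modPow2(1001, 8)); // 125 r 1
    }
}
```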
  • You should clearly say that division or modulo by a power of two can be optimized with binary shifts instead of posting a very long example. (BTW Java is not the most low-level language for processor instruction speed) – Sebastian Feb 22 '22 at 21:07