
Can anyone help me figure out which of the following two approaches is more efficient, and whether both are correct?

1.

    for(int i = num; i * i <= n; i++)

2.

    int root = sqrt(n);
    for(int i = num; i <= root; i++)

In the first approach we calculate the square of `i` on every iteration of the loop. We also can't pre-compute the square of `i`, since `i` is updated each time.

In the second approach we compute `sqrt(n)` only once instead of on every iteration. Does this save time?

Will the second loop run faster and still give a 100% accurate result, even for large numbers (like 10^6)?

  • multiplications are cheap. But this really needs benchmarking, and as you noted, depends on `n` – Jean-François Fabre Mar 26 '20 at 06:57
  • `*` is faster than `sqrt`. Both are imprecise for some numbers. – Amadan Mar 26 '20 at 06:58
  • 2
    `*` is imprecise for _integers_ ? this kind of loop is often used for prime number search – Jean-François Fabre Mar 26 '20 at 06:58
  • 3
    @Jean-FrançoisFabre: Wrong choice of words. Integers can't deal with big numbers. Floats lose precision with numbers with lots of digits. Bigints are slow, and in C require a library. Pick your poison. – Amadan Mar 26 '20 at 06:59
  • yeah, I'm more used to this with python. With C it's a big annoyance to use integers. But you can't do anything else (`uint64_t` helps a bit) for prime number search. At some point you'll have to use some big integer lib. And yes it's slow. Why do you think the biggest known prime is only 2**82,589,933-1 (also, it's a Mersenne prime)? :) – Jean-François Fabre Mar 26 '20 at 07:01
  • anyway, if the range is 10^6 it's much better to use a sieve. All prime numbers computed in record time. – Jean-François Fabre Mar 26 '20 at 07:03
  • I would be surprised if you could reliably measure the difference. Especially if your loop is doing more than the most trivial of operations. You could also convert the result of your sqrt call into an int to remove an implicit type conversion. – matt Mar 26 '20 at 07:04
  • This is probably heavily dependent on the processor and the compiler/library. The time *complexity* is obviously worse if you perform a computation -- *any* computation -- in the loop (linear) instead of *any* computation outside (constant). – Peter - Reinstate Monica Mar 26 '20 at 07:11
  • Is `n` really constant? If you're factoring `n`, then you'll probably achieve faster running time if you divide `n` by each found factor. In that case, the `sqrt` is only valid until the first factor is found. – rici Mar 26 '20 at 08:32
  • @Ravi Keshri - What do you mean by _large numbers (like 10^6)_; `n`, or `root`? – Armali Mar 26 '20 at 08:59
  • If `n` is invariant in the loop, many compilers will recognize that `sqrt(n)` is invariant, and `i < sqrt(n)` (shown in the title although not the body) would be at least as fast as `i*i < n`. – Eric Postpischil Mar 26 '20 at 11:35
  • Re “Will the 2nd loop work … with 100% accurate result even for large numbers (like 10^6)?”: `sqrt(n)` converts `n` to `double`, which is required by the C standard to have enough precision to distinguish ten-decimal-digit numbers, and a decent `sqrt` implementation will return results such that `(int) sqrt(n)` suffices as a loop boundary. However, a poor `sqrt` might return, say, a value slightly under 7919 for `sqrt(7919*7919)`, so the loop could fail. – Eric Postpischil Mar 26 '20 at 11:40
  • You could run the loop with `for (i = num, is = i*i; is <= n; is += 2*i + 1, i++) {...}` if you are worried about the speed of `i*i` (assuming the compiler does the sensible thing with `2*i`) and you cannot face the cost (and uncertainty) of `(int)sqrt(n)`. – Chris Hall Mar 26 '20 at 13:54
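
The comments above touch on two separate concerns: whether a library `sqrt` can return a value a hair above or below the true root, and whether the multiplication in the loop condition matters at all. Below is a minimal sketch of a guarded integer square root, assuming `n` is non-negative and fits comfortably in a `long long`; the helper name `safe_isqrt` is purely illustrative, not a standard function.

    #include <math.h>

    /* Largest integer r with r*r <= n, for non-negative n well below
       LLONG_MAX. Starting from (long long)sqrt(n) and nudging the result
       in both directions protects the loop bound against a library sqrt
       that comes back slightly off. */
    static long long safe_isqrt(long long n)
    {
        long long r = (long long) sqrt((double) n);
        while (r > 0 && r * r > n)          /* sqrt returned slightly high */
            r--;
        while ((r + 1) * (r + 1) <= n)      /* sqrt returned slightly low  */
            r++;
        return r;
    }

With that helper, `long long root = safe_isqrt(n); for (long long i = num; i <= root; i++) { ... }` visits exactly the same values of `i` as the `i * i <= n` form, while paying for the square root only once.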

1 Answer

It makes sense that the second scenario should be more efficient, because the loop bound does not have to be recalculated on every iteration. I tried to measure the CPU time used in going through the two kinds of for loops with the code given below. The CPU time is actually lower for the first scenario for small numbers like n=25, but for values n>=100 the second scenario gives a smaller CPU time.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char *argv[])
    {
        clock_t start, end;
        double cpu_time_used;
        double n, root;

        if (argc < 2) return 1;            /* expects n on the command line */
        n = atoi(argv[1]);
        printf("n= %0f \n", n);

        /* first scenario: i*i is evaluated in the condition on every pass */
        start = clock();
        for (int i = 0; i * i < n; i++)
            ;                              /* empty body */
        end = clock();
        cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("first iter: cpu_time_used: %f \n", cpu_time_used);

        /* second scenario: sqrt(n) is computed once, before the loop */
        start = clock();
        root = sqrt(n);
        for (int i = 0; i <= root; i++)
            ;                              /* empty body */
        end = clock();
        cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("second iter: cpu_time_used: %f \n", cpu_time_used);

        return 0;
    }

Outputs:

n= 25.000000 
first iter: cpu_time_used: 0.000004 
second iter: cpu_time_used: 0.000011 

n= 100.000000 
first iter: cpu_time_used: 0.000002 
second iter: cpu_time_used: 0.000001 

n= 1000000.000000 
first iter: cpu_time_used: 0.000011 
second iter: cpu_time_used: 0.000008 
  • 1
    Do you have any issue with the fact that n=100 went faster than n=25? – matt Mar 26 '20 at 08:41
  • Since the `for` loops do absolutely nothing the compiler may have discarded them. So you may not be measuring what you think you are measuring. (A little gentle disassembly will tell you what the compiler has produced.) On my machine `CLOCKS_PER_SEC` is 1 million, but it can easily rip up several thousand instructions in a micro-second, though perhaps only a thousand multiplies -- so trying to measure the run-time of 25 or 100 multiplies using `clock()` would, on my machine, be a little tricky. – Chris Hall Mar 26 '20 at 12:22
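
Building on that last comment: because the timed loops have empty bodies, an optimizing compiler is free to delete them, in which case the numbers above mostly measure `clock()` itself. One way to keep the loops alive is sketched below; it is meant to replace the two timed loops in the program above, reusing its `start`, `end`, `n` and `root` variables, and the name `sink` is just illustrative.

    /* Accumulating into a volatile variable makes the stores observable,
       so the compiler cannot remove the benchmark loops entirely. */
    volatile long long sink = 0;

    start = clock();
    for (long long i = 0; i * i < n; i++)
        sink += i;
    end = clock();

    start = clock();
    root = sqrt(n);
    for (long long i = 0; i <= root; i++)
        sink += i;
    end = clock();

Even then, with `CLOCKS_PER_SEC` commonly at one million, a single pass over a few hundred or a few thousand iterations sits near the clock's resolution, so repeating each timed loop many times and dividing by the repetition count would give steadier figures.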