2

This is a question, regarding my homework, specifically on NASM.

I am writing an algorithm to find the least whole factor of a number. (Greater than 1)

In pseudo-code it can be summed up as:

if(n%2==0)
    return 2;
for(i=3; i <= n/2; i+=2)
    if(n%i==0)
        return i;
return n;

The program is just slightly slower than the requirement for large numbers. (n > 1 000 000 000)

The most obvious (to me) improvement would be replacing n/2 with sqrt(n). However I am not supposed to know how to use floating-point numbers and finding the integer sqrt by Newton's method seems overkill. (Because I don't actually need to know the exact value and although I haven't tested it, I would imagine finding the isqrt recursively/iteratively would be quite slow)

So I was wondering, whether there is a fast algorithm for some function, such that sqrt(n) < f(n) < n/2. By "fast" I mean preferably constant time and by f(n) < n/2 I mean significantly less for big n.

Some of the options I am considering are:

  • Check, for i <= min(sqrt(2^32), n/2), where sqrt(2^32) = 2^16 is a constant.

  • Replace i <= n/2 with i <= (2^p), where p = ⌈log_2(n)/2⌉ or something. (p is half the position of most significant bit of n)

RuRo
  • 311
  • 4
  • 17

3 Answers3

3

There is an iterative process to find a squareroot:

def approximate_sqrt(number, depth, start_val):
    for i in range(depth):
        start_val=(start_val+number/start_val)/2
    return start_val

The better the initial guess (start_val), the faster it converges to a reasonable solution.

If start_val>sqrt(number) 

then every iterative value>sqrt(number) 

and so it provides an upper bound (similarly for start_val < sqrt(number)). You can reduce the iteration depth to 1 or 2 if your initial guess is pretty close. So for iteratively guessing an upper bound for prime number candidates for example you can call

sqrt_appr=approximate_sqrt(i, 1, sqrt_appr+1) 

for the next prime number candidate with the previous estimation for the square root of sqrt_appr and get upper bounds with an error of about 10E-6. (Although every time I checked how close the approximation was, which was for intervals of 3 million numbers, I set sqrt_appr=sqrt(number)+1 to reset the process.)

M-Chen-3
  • 2,036
  • 5
  • 13
  • 34
2

Replace the i with i*i:

 if (n % 2 == 0)
    return 2;
for (int i = 3; i * i <= n; i += 2)
    if (n % i == 0)
        return i;
return n
  • I think you meant `i*i <= n`. I would certainly try this, when I get home, but wouldn't performing around 50000 extra multiplication operations be slower than the `log_2` option I mentioned, even though it is saving some iteration time. – RuRo Mar 11 '17 at 10:38
  • Okay, I may be doing something wrong, but the algorithm got slower when calculating `i*i` in the loop. It gets around 5 seconds in the worst case scenario (large prime). My old version was around 2 seconds for the same input. The algorithm I chose in the end, gets sub 0.025s. Anyways, thank you for your effort. – RuRo Mar 11 '17 at 17:33
2

In the end I settled on the ⌈log_2(n)/2⌉ version. Since sqrt(n) = 2^(log_2(n)/2). So for anyone in need this is my solution. The sqrt(n) upper bound approximation is O(1). Whole algorithm is O(sqrt(n)) (I think).

mov esi,ebx     ;ebx = N
shr esi,1       ;esi = N/2
cmovnc esi,2    ;if no remainder RETURN 2
jnc RETURN
mov edi,2       ;edi is max counter
bsr eax,ebx     ;eax is most significant set bit of ebx (log_2(eax))
shr eax,1       ;eax/=2
mov cl,al
shl edi,cl      ;max counter is 2^(log_2(N)/2 + 1) >= sqrt(N)
mov esi,1       ;default answer is 1
mov ecx,3       ;start counter is 3
START:
mov edx,0
mov eax,ebx
div ecx         ;divide N by counter
test edx,edx    ;if no remainder RETURN counter
cmovz esi,ecx
jz RETURN
add ecx,2       ;else counter += 2
cmp ecx,edi
jb START        ;if counter <= max counter try agian with new divisor
RETURN:
                ;the answer is in (esi)

P.S. If you are wondering why I don't just check i*i <= N. It is actually significantly slower than this version. Just adding a mul inside the loop shouldn't slow it down that much, so I suspect it breaks the branch prediction algorithm each iteration. The cmp ecx,edi is comparing a counter against a constant, so it might be predicted right most of the times, where the cmp 'ecx*ecx',ebx would be comparing a square of the counter, which may not be so obvious to the processor.

RuRo
  • 311
  • 4
  • 17