5

This is a problem I've been pondering for quite some time.

What is the fastest way to find all numbers from a to b that are not divisible by any number from x to y?

Consider this:

I want to find all the numbers from 1 to 10 that are not divisible by any number from 2 to 5. This becomes extremely slow if I were to use a linear approach, like this:

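def find_non_divisible(a, b, x, y):
    # Collect every i in [a, b] that no j in [x, y] divides.
    result = []
    for i in range(a, b + 1):  # + 1 so that b itself is included
        divisible = False
        for j in range(x, y + 1):  # + 1 so that y itself is included
            if i % j == 0:
                divisible = True
                break
        if not divisible:
            result.append(i)
    return result

print(find_non_divisible(1, 10, 2, 5))  # a = 1, b = 10, x = 2, y = 5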

Does anybody know of any other methods of doing this with less computation time than a linear solution?

If not, can anyone see how this might be done faster, as I am blank at this point...

Sincerely, John

[EDIT]

The numbers range from 0 to >1,e+100

This is true for a, b, x and y

Sumurai8
JohnWO
  • 3
    Are you optimizing for large (b-a), large b, large (y-x), large y or for calling it many many times with small numbers? I suspect the answer will vary depending on those questions – Patashu Apr 12 '13 at 05:18
  • That is part of the problem: a, b, x, y, grows progressively – JohnWO Apr 12 '13 at 05:21
  • 1
    Didn't you want to write 1e100 instead of "1,e+100"? If this is the case, then it will be hard to find a very fast method, as the set of numbers does not fit in memory, or even a hard drive (by far). If the number count is reasonable (say about 1e8, so that they fit in memory), then a fast approach can be obtained by trading memory for speed. – Eric O. Lebigot Apr 12 '13 at 06:39

2 Answers

4

You only need to check prime values in the range of the possible divisors - for example, if a value is not divisible by 2, it won't be divisible by any multiple of 2 either; likewise for every other prime and prime multiple. Thus in your example you can check 2, 3, 5 - you don't need to check 4, because anything divisible by 4 must be divisible by 2. Hence, a faster approach would be to compute primes in whatever range you are interested in, and then simply calculate which values they divide.

Another speedup is to add each value in the range you are interested in to a set: when you find that it is divisible by a number in your range, remove it from the set. You then should only be testing numbers that remain in the set - this will stop you testing numbers multiple times.

If we combine these two approaches, we see that we can create a set of all values (so in the example, a set with all values 1 to 10), and simply remove the multiples of each prime in your second range from that set.

Edit: As Patashu pointed out, this won't quite work if the prime that divides a given value is not in the set. To fix this, we can apply a similar algorithm to the above: create a set with the values [x, y] and, for each value in the set, remove all of its multiples. So for the example given below in the comments (with [3, 6]) we'd start with 3 and remove its multiples in the set, so 6. Hence the remaining values we need to test would be [3, 4, 5], which is what we want in this case.
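The filtering step described in the edit above can be sketched on its own. This is only an illustration, not part of the original answer: `minimal_divisors` is a hypothetical name, and the sketch assumes x >= 1 and a divisor range small enough to hold in memory.

```python
def minimal_divisors(x, y):
    """Return the values in [x, y] that are not multiples of a smaller value in [x, y]."""
    # Assumption: x >= 1 (a divisor of 0 makes no sense here).
    keep = {d: True for d in range(x, y + 1)}
    for d in range(x, y + 1):
        if keep[d]:
            # Cross out the larger multiples of d that still lie inside [x, y].
            for multiple in range(2 * d, y + 1, d):
                keep[multiple] = False
    return [d for d in range(x, y + 1) if keep[d]]


print(minimal_divisors(2, 5))   # [2, 3, 5]
print(minimal_divisors(3, 6))   # [3, 4, 5]
print(minimal_divisors(2, 10))  # [2, 3, 5, 7]
```

When the range starts at 2 the result is exactly the primes up to y; when it starts higher, composites such as 4 survive because their smaller factors are not in the range.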

Edit2: Here's a really hacked up, crappy implementation that hasn't been optimized and has horrible variable names:

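def find_non_factors():
    a = 1
    b = 1000000
    x = 200
    y = 1000

    # z[i] stays True while x + i has no smaller divisor inside [x, y]
    z = [True for p in range(x, y+1)]
    for k, i in enumerate(z):
        if i:
            k += x
            n = 2
            # cross out the multiples of k that are still inside [x, y]
            while n * k < y + 1:
                z[(n*k) - x] = False
                n += 1

    # start with every candidate in [a, b]
    k = {p for p in range(a, b+1)}

    # remove every multiple of each remaining divisor from the candidates
    for p, v in enumerate(z):
        if v:
            t = p + x
            n = 1
            while n * t < (b + 1):
                if (n * t) in k:
                    k.remove(n * t)
                n += 1

    return k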

Try your original implementation with those numbers. It takes > 1 minute on my computer. This implementation takes under 2 seconds.

Yuushi
  • This is not true, for example 7*11 is not divisible by 2, 3, 4 or 5 but it is not a prime either. – Patashu Apr 12 '13 at 05:22
  • 1
    @Patashu You've misunderstood what I've said (though I agree I haven't worded it that well). I mean, in a range say `[2, 5]` you only need to test `2, 3, 5` - testing for `2` will test for `4` and all other multiples of two. Similarly for testing divisibility in `[2, 10]` you would only need to check `2, 3, 5, 7`. – Yuushi Apr 12 '13 at 05:25
  • so you only need to check if its divisible by primes is what he is saying :P – Joran Beasley Apr 12 '13 at 05:25
  • @Yuushi Ok, I get what kinds of primes you mean now. – Patashu Apr 12 '13 at 05:29
  • 1
    @Yuushi What if you have 3, 4, 5, 6? 4 is not a prime but you need to leave it in because you're not testing 2. – Patashu Apr 12 '13 at 05:30
  • @Yuushi I gave an answer with some thoughts of my own. It's hard to give a best answer for the question not knowing for what use case we are optimizing for – Patashu Apr 12 '13 at 05:35
  • Thanks, but this is also a linear approach, and it is actually slower than the example provided, at least with Python. – JohnWO Apr 12 '13 at 05:41
  • @user2272969 Are you sure it's slower than the example? Imagine x-y being reasonably small (4-10 numbers) and a-b being huge (billions or more). My solution is optimized for that case - you do extra work to make x-y once, then for each entry in a-b you only have to do one mod and one array lookup. I guess it would take a certain weird sweet spot between having x-y be large enough and not too large and having a-b be really huge but... Well, see, this is why I don't like premature optimization XD – Patashu Apr 12 '13 at 05:50
  • @user2272969 Perhaps slower with a very small example like you have. Make the numbers much larger, and it is much faster. – Yuushi Apr 12 '13 at 06:04
  • Great piece of code, but it doesn't behave as expected. Try changing the values to: a = 1, b = 10000, x = 2, y = 10000 – JohnWO Apr 12 '13 at 06:23
  • @user2272969 Works fine for me? Gives the same answer as your code. – Yuushi Apr 12 '13 at 07:10
  • How to do this for a single number,only x is provided? – Rohit-Pandey Oct 14 '17 at 01:30
3

Ultimate optimization caveat: Do not prematurely optimize. Any time you attempt to optimize code, profile it first to ensure it needs optimization, then profile the optimized version on the same kind of data you intend it for, to confirm it really is a speedup. Most code does not need optimization; it just needs to give the correct answer.

If you are optimizing for small x-y and large a-b:

Create an array whose length is the least common multiple of x, x+1, x+2, ..., y. For example, for 2, 3, 4, 5 it would be 60, not 120.

Now populate this array with booleans - false initially for every cell, then for each number in x-y, populate all entries in the array that are multiples of that number with true.

Now, for each number in a-b, index into the array at that number modulo the array length: if the entry is true, skip the number; if it is false, return it.
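The three steps above might look roughly like the following. This is a sketch under assumptions, not a tuned implementation: the names `build_table` and `non_divisible` are made up for illustration, x is assumed to be at least 1, and the LCM of the divisor range is assumed to be small enough for the table to fit in memory.

```python
from functools import reduce
from math import gcd


def lcm(p, q):
    """Least common multiple of two positive integers."""
    return p * q // gcd(p, q)


def build_table(x, y):
    """table[r] is True when some d in [x, y] divides every number congruent to r."""
    period = reduce(lcm, range(x, y + 1))
    table = [False] * period
    for d in range(x, y + 1):
        # Mark every multiple of d below the period (0 included).
        for m in range(0, period, d):
            table[m] = True
    return table


def non_divisible(a, b, x, y):
    """Numbers in [a, b] that no number in [x, y] divides, using one lookup each."""
    table = build_table(x, y)
    period = len(table)
    return [i for i in range(a, b + 1) if not table[i % period]]


print(len(build_table(2, 5)))       # 60
print(non_divisible(1, 10, 2, 5))   # [1, 7]
```

The table is built once, so the per-number cost in a-b is one modulo and one list lookup, which is why this only pays off when x-y is small and a-b is large.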

You can do this a little quicker by removing from your x to y factors the numbers whose prime factor expansions are strict supersets of other numbers' prime factor expansions. By which I mean: if you have 2, 3, 4, 5, then 4 is 2*2, a strict superset of 2, so you can remove it, and now our array length is only 30. For something like 3, 4, 5, 6, however, 4 is 2*2 and 6 is 3*2; 6 is a superset of 3 so we remove it, but 4 is not a superset of anything so we keep it in. The LCM is 3*2*2*5 = 60. Doing this kind of thing would give some speedup on its own for large a-b, and you might not need to go the array direction if that's all you need.

Also, keep in mind that if you aren't going to use the entire result of the function every single time - like, maybe sometimes you're only interested in the lowest value - write it as a generator rather than as a function. That way you can call it until you have enough numbers and then stop, saving time.
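A minimal sketch of the generator idea, reusing the plain divisibility test from the question (`iter_non_divisible` is a hypothetical name, and the bounds are only illustrative):

```python
from itertools import islice


def iter_non_divisible(a, b, x, y):
    """Lazily yield the numbers in [a, b] that no number in [x, y] divides."""
    for i in range(a, b + 1):
        if all(i % j != 0 for j in range(x, y + 1)):
            yield i


# Only the first three results are computed, even though b is enormous.
first_three = list(islice(iter_non_divisible(1, 10**100, 2, 5), 3))
print(first_three)  # [1, 7, 11]
```

Because nothing is stored up front, the caller decides how much work is done, which matters when the full range could never fit in memory.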

Patashu
  • Thanks for the reply! This is much faster than the example provided when, as you stated: "optimizing for small x-y and large a-b" Problems arise when the range of x-y becomes large. Just so that no confusion arise: What you identified x-y as, I identified as a-b. – JohnWO Apr 12 '13 at 05:46
  • @user2272969 I should be using the same naming scheme as you. – Patashu Apr 12 '13 at 05:47