
Now that I'm starting to get the hang of Python, I've been testing my newly acquired skills on some problems on projecteuler.net.

Anyways, at some point, I ended up making a function for getting a list of all primes up until a number 'n'.

Here's how the function looks atm:

def primes(n):
    """Returns list of all the primes up until the number n."""

    # Gather all potential primes in a list.
    primes = list(range(2, n + 1))
    # The first potential prime in the list should be two.
    assert primes[0] == 2
    # The last potential prime in the list should be n.
    assert primes[-1] == n

    # 'p' will be the index of the current confirmed prime.
    p = 0
    # As long as 'p' is within the bounds of the list:
    while p < len(primes):
        # Set the candidate index 'c' to start right after 'p'.
        c = p + 1
        # As long as 'c' is within the bounds of the list:
        while c < len(primes):
            # Check if the candidate is divisible by the prime.
            if(primes[c] % primes[p] == 0):
                # If it is, it isn't a prime, and should be removed.
                primes.pop(c)
            # Move on to the next candidate and redo the process.
            c = c + 1
        # The next integer in the list should now be a prime, 
        # since it is not divisible by any of the primes before it. 
        # Thus we can move on to the next prime and redo the process.
        p = p + 1       
    # The list should now only contain primes, and can thus be returned.
    return primes

It seems to work fine, although there's one thing that bothers me. While commenting the code, this piece suddenly seemed off:

# Check if the candidate is divisible by the prime.
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed from the list.
    primes.pop(c)
# Move on to the next candidate and redo the process.
c += 1

If the candidate IS NOT divisible by the prime we examine the next candidate located at 'c + 1'. No problem with that.

However, if the candidate IS divisible by the prime, we first pop it and then examine the next candidate located at 'c + 1'. It struck me that the next candidate, after popping, is not located at 'c + 1', but 'c', since after popping at 'c', the next candidate "falls" into that index.

I then thought that the block should look like the following:

# If the candidate is divisible by the prime:
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed from the list.
    primes.pop(c)
# If not:
else:
    # Move on to the next candidate.
    c += 1

This above block seems more correct to me, but leaves me wondering why the original piece apparently worked just fine.
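The index shift can be seen in a tiny standalone example (the values here are mine, just for illustration):

```python
# After pop(c), the element that followed the popped one slides into
# index c. Incrementing c afterwards therefore skips it.
nums = [2, 3, 4, 5, 6, 7, 8, 9]
c = 2            # nums[2] == 4, which we decide to remove
nums.pop(c)      # 5 slides into index 2
c += 1           # c now points at 6, so 5 is never examined
print(nums, c)   # -> [2, 3, 5, 6, 7, 8, 9] 3
```

This is exactly the situation in the inner loop: the element sitting at index c right after the pop is never tested against the current prime.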

So, here are my questions:

After popping a candidate which turned out not to be a prime, can we assume, as my original code does, that the next candidate is NOT divisible by that same prime?

If so, why is that?

Would the suggested "safe" code just do unnecessary checks on the candidates which were skipped in the "unsafe" code?

PS:

I've tried writing the above assumption as an assertion into the 'unsafe' function, and test it with n = 100000. No problems occurred. Here's the modified block:

# If the candidate is divisible by the prime:
if(primes[c] % primes[p] == 0):
    # If it is, it isn't a prime, and should be removed.
    primes.pop(c)
    # If c is still within the bounds of the list:
    if c < len(primes):
        # We assume that the new candidate at 'c' is not divisible by the prime.
        assert primes[c] % primes[p] != 0
# Move on to the next candidate and redo the process.
c = c + 1
phaz
    I think the conclusion you want to make could be restated, something like: if x1 is divisble by n, and x2 is the next integer greater than x1 such that it has no divisors less than n, then x2 is not divisible by n. Is that right? I don't know if it always holds true, but it's an interesting conclusion. – femtoRgon Dec 06 '12 at 16:59
  • To be pedantic, in the absence of a rigorous proof, we don't know that the function actually works. It could well be failing to detect some large composite number. I've tested it to about 100000; this is suggestive but is not proof. – NPE Dec 06 '12 at 17:31
  • I wouldn't call that pedantry. I would say that is the meat of this question. – femtoRgon Dec 06 '12 at 17:45
  • @femtoRgon Correct, that is probably the best way to describe what I'm trying to conclude. – phaz Dec 06 '12 at 19:11
  • For efficiency better use a bit array. It's also simpler, and in particular this pitfall doesn't exist. – starblue Dec 06 '12 at 20:17
  • It seems that no simple math argument can answer the question. At this point I'd recommend posting it to http://mathoverflow.net (it's really not a programming question). For example, http://mathoverflow.net/questions/49400/question-in-prime-numbers might be related, but doesn't allow us to conclude, AFAICT. – Armin Rigo Dec 06 '12 at 22:38

7 Answers


It fails for much bigger numbers. The first prime for which a candidate can fail is 71. The smallest failing candidate for 71 is 10986448536829734695346889: after it is popped, the next remaining number, 10986448536829734695346889 + 142, is also divisible by 71, so the assertion fails.

def primes(n, skip_range=None):
    """Modified `primes` with the original assertion from the P.S. of the question,
    with skipping of an unimportant huge range.
    >>> primes(71)
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
    >>> # The smallest failing number for the first failing prime 71:
    >>> big_n = 10986448536829734695346889
    >>> primes(big_n + 2 * 71, (72, big_n))
    Traceback (most recent call last):
    AssertionError
    """
    if not skip_range:
        primes = list(range(2, n + 1))
    else:
        primes = list(range(2, skip_range[0]))
        primes.extend(range(skip_range[1], n + 1))
    p = 0
    while p < len(primes):
        c = p + 1
        while c < len(primes):
            if(primes[c] % primes[p] == 0):
                primes.pop(c)
                if c < len(primes):
                    assert primes[c] % primes[p] != 0
            c = c + 1
        p = p + 1
    return primes

# Verify that it can fail.
aprime = 71   # the first problematic prime 
FIRST_BAD_NUMBERS = (
        10986448536829734695346889, 11078434793489708690791399,
        12367063025234804812185529, 20329913969650068499781719,
        30697401499184410328653969, 35961932865481861481238649,
        40008133490686471804514089, 41414505712084173826517629,
        49440212368558553144898949, 52201441345368693378576229)

for bad_number in FIRST_BAD_NUMBERS:
    try:
        primes(bad_number + 2 * aprime, (aprime + 1, bad_number))
        raise Exception('The number {} should fail'.format(bad_number))
    except AssertionError:
        print('{} OK. It fails as is expected'.format(bad_number))

I solved for these numbers with a complicated, puzzle-like algorithm that searches over the possible remainders of n modulo small primes. The last, simple step was to reconstruct the complete n (by the Chinese remainder theorem, in three lines of Python code). I know all 120 basic solutions smaller than primorial(71) = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71; they repeat periodically at every multiple of this number. I had to rewrite the algorithm for every decade of tested primes, because each decade's solution was much slower than the previous one. Maybe I can find a smaller solution with the same algorithm for the primes 73 or 79 in acceptable time.
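For reference, the Chinese-remainder reconstruction step can indeed be done in a few lines of Python. This is a generic sketch of my own (hynekcer's actual code is not shown in the answer):

```python
from functools import reduce

def crt(residues, moduli):
    """Return the unique x modulo prod(moduli) with x % m == r
    for each pair (r, m); the moduli must be pairwise coprime."""
    M = reduce(lambda a, b: a * b, moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return x % M

print(crt([2, 3, 2], [3, 5, 7]))  # -> 23
```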


Edit:

I would also like to find a completely silent failure of the unsafe original function: maybe there exists some candidate composed of several different primes. That would only postpone the final outcome, and every further step becomes much more expensive in time and resources, so only numbers composed of one or two primes are attractive.

I expect that only candidates c of the forms c = p ** n, c = p1 * p ** n, or c = p1 ** n1 * p ** n can work, where p and p1 are primes and n is a power greater than 1. The primes function fails if c - 2 * p is divisible by no prime smaller than p while every number between c - 2p and c is divisible by some prime smaller than p. The variant p1 * p ** n also requires that the same c had already failed earlier for p1 (p1 < p); we already know an infinite number of such candidates.

EDIT: I found a smaller example of failure: the number 121093190175715194562061 for the prime 79 (about ninety times smaller than the example for 71). I can't continue with the same algorithm to find smaller examples, because computing all 702612 basic solutions for the prime 79 took more than 30 hours on my laptop.

I also verified, for all candidates smaller than 400000000 (4e8) and for all relevant primes, that no candidate fails the assertion in the question. Unless you have terabytes of memory and thousands of years of time, the assertion in the algorithm will pass, because the time complexity is O((n / log(n)) ^ 2) or very similar.

hynekcer
  • Great job! I am not surprised that it fails (but I am impressed that you found a failure case). – Daniel Fischer Dec 11 '12 at 04:06
  • Wow! That looks pretty impressive, thanks for the answer! I'll have to read it again at some other time to fully undertand it, but you clearly know what you're doing. :-D – phaz Dec 11 '12 at 06:19
  • confirmed: `let x1=10986448536829734695346889 in filter (>50) $ map (head.factorize) [x1 .. x1+142]` ==> `[71,67,61,53,59,71]`. – Will Ness Dec 12 '12 at 09:33
  • @WillNess: Does `factorize` guarantee to return factors in increasing order? – j_random_hacker Dec 18 '12 at 15:34
  • @j_random_hacker [the one that I used](http://www.haskell.org/haskellwiki/Testing_primality#Primality_Test_and_Integer_Factorization) does. – Will Ness Dec 18 '12 at 16:05
  • @WillNess: I thought this was necessary for your comment to be confirmatory evidence, but after thinking about it I see that it's not (though if `factorize` could return factors in arbitrary order, the appearance of factors *above* 71 in the output would not be *disconfirmatory* evidence!) – j_random_hacker Dec 18 '12 at 16:18
  • @j_random_hacker uh oh, you're right. will half-bake something else later. ... wait, I'm confused. will have to re-read the post. – Will Ness Dec 18 '12 at 16:46
  • @j_random_hacker is this additional evidence enough: `let hasFac fs n = any ((==0).rem n) fs in not $ hasFac [2..71] (x1-2)` ==> `True`? ... probably not. hmm. – Will Ness Dec 18 '12 at 17:20
  • @WillNess: I've looked at the `factorize` (well, `primeFactors`) source and it does generate factors in increasing order, so the issue is moot -- *if* hynekcer had been wrong about this huge composite number being reported as a prime then your original code snippet would have output a number >= 71 between the endpoints :) But in any case I think the right test would be that `let hasFac fs n = any ((==0).rem n) fs in all $ map (hasFac [2..67]) [x1+1 .. x1+141]` should output `True`. – j_random_hacker Dec 18 '12 at 18:38
  • @j_random_hacker change `all` to `and` (or throw away `$ map`) and it does indeed. That's no different from my original test though. *The test is still insufficient* though, good catch! Even if the 2nd "71" number (`x1+142`) is falsely admitted (**FA**) at the `%71` testing step, who's to say it doesn't have some bigger factor which will get it knocked out from the sequence later?? – Will Ness Dec 18 '12 at 22:36
  • @WillNess: Actually the insufficiency you noticed is not the problem I had :) I was objecting to a potential problem with *your code* in that if `factorize` did not guarantee to return factors in increasing order, then your code would be capable of giving false *negatives*, but since the answer happened to be positive there is no problem. What you have noticed is a potential problem with *hynekcer's* approach. – j_random_hacker Dec 18 '12 at 23:02
  • hynekcer: x1's prime factorisation is 71*2758686719*56091440438161, so as @WillNess said, we require that x1 is also skipped over when both 2758686719 and 56091440438161 are being considered. Is that guaranteed to happen? I suspect the part where you talk about the Chinese remainder theorem addresses this, but I don't understand I'm afraid -- it would be great if you could elaborate. – j_random_hacker Dec 18 '12 at 23:05
  • As to your question to OP, only the factors not greater than `sqrt x1` are checked for `x1`, and `2758686719 < 3314581200821 < 56091440438161`. ---- You wrote "I thought this was *necessary* ... but it's not [i.e., *necessary*]", but I read it as "I *thought* this was necessary, ... but it's not [*confirmatory evidence*]". Written communication is notoriously prone to mis-interpretation. So I was scrambling to make sense of *that*... There are other ways to make it not work too. Notice there's only one "67" there. If there were two of them, the 2nd "71" (i.e., x1+142) wouldn't be **FA**. – Will Ness Dec 19 '12 at 06:45
  • @j_random_hacker (contd..) but then the two "61" would be the evidence for the 61. But, what if some number had both, say, 47 and 67 as its factors, and "47" was **FA**, and thus 67 stayed in the game - then the 2nd 71 wouldn't be FA, again. OTOH the only way to make it not be the evidence *for* 71 is to assume some other prime *less than 71* is **FA**... so it wouldn't disprove the overall hypothesis, but it still invalidates my test as evidence *for the 71*... It gets messy. :) – Will Ness Dec 19 '12 at 06:51
  • @WillNess: Thanks for feedback. Without it I wouldn't be motivated to continue. I tested only the narrow question from P.S. about assert. I am going to continue, but the final number (p1**n1 * p2**n2) that completely fails will be much bigger that maybe it will not fit into any computer memory but it still can be easy examined modulo any small number. Are you interested? – hynekcer Dec 19 '12 at 08:35
  • @WillNess: (1) I see how that miscommunication happened, glad we were able to resolve it :) (2) Some algorithms for primes only check for prime factors up to sqrt(x), but *this one doesn't*, so we do still need to check that `%2758686719` and `%56091440438161` both FA x1. (The OP's *corrected* version has the property that the lowest prime factor of any x (which is necessarily < sqrt(x)) will always be used to remove x from the list of possible primes, but if the test for the lowest prime factor for some x is skipped over, the next-highest factor will be tested later.) – j_random_hacker Dec 19 '12 at 09:39
  • @WillNess: "There are other ways to make it not work too. Notice there's only one "67" there. If there were two of them, the 2nd "71" (i.e., x1+142) wouldn't be FA." -- I don't yet understand this, could you explain further? Thanks. (If there were two *consecutive* "67"s then it seems to me that the second "67" might have been FA'd itself, which would mean the "71" would *not* be FA'd because it would be the "67", rather than the "71", that would get skipped -- is that what you mean?) – j_random_hacker Dec 19 '12 at 09:44
  • @hynekcer I'm not sure if I'm interested enough to justify your hard work. :{ It's been amazing enough as it is. btw I ***can't*** confirm your new number, `x0=121093190175715194562061`, for `79`. All the numbers between (not including) `x0` and `x0+79` indeed have LPF (lowest prime factor) less than 79, but x0+79 has `[2,3,5,79]` as its first PFs, and `LPF(x0+80) > 79`. – Will Ness Dec 19 '12 at 09:47
  • @j_random_hacker yes, that's it. The 2nd "67" would get FA'd, and then when we come to the 71's step, the sequence is: "71","67","71" . So 67 is again FA'd, and 2nd "71" *is* tested and discarded. ... wait, does this prove it for *all bigger factors* ...? – Will Ness Dec 19 '12 at 09:49
  • @j_random_hacker ... I guess not (to the last question) - because it can be *out of sync*: testing for bigger F, we could have F,bigger_ones,[67,F] .... so the FA'd 67 would indeed be eliminated by this F-test, it seems. – Will Ness Dec 19 '12 at 10:00
  • @hynekcer also, `[x0-1,x0-2..x0-9]` have LPFs `< 79`, but `LPF(x0-10) > 79`. – Will Ness Dec 19 '12 at 10:42
  • @WillNess: The best I can so far say with confidence is that if any composite numbers are misreported as prime then there must be a lowest such number, x, and that x must survive all passes p for p a prime factor of x. (If we drop the restriction that x be lowest-possible, then it may be required to pass some additional *non-prime* passes as well!) *In particular it is necessary (but not sufficient) that x survives the pass where p is the largest prime factor of x*, which implies that there must be some y < x such that y's largest factor is also p (therefore y - x - kp for some k). – j_random_hacker Dec 19 '12 at 10:50
  • To make it a sufficient condition, we further require that both x and y must have survived all passes for their own prime factors < p, and that no number in the range [y+1, ..., x-1] survives. From this we can infer that if there are dy distinct prime factors in y, in each of the dy-1 passes where the prime factor q being eliminated (it's called p in the code, but I'll call it q here because I've stupidly already used the name p for the largest factor in x!) is a factor of y and q < p, y must be "protected" by an odd number of immediately-preceding values z1, z2, ..., z_(2k+1) < y... – j_random_hacker Dec 19 '12 at 11:12
  • ... that are each divisible by q and that have not already been eliminated. (An odd number because every even-numbered number will be correctly eliminated.) And likewise for x. – j_random_hacker Dec 19 '12 at 11:15
  • @WillNess: The OP's assert() triggers whenever *any* factor of a number is mistakenly skipped over, which is a sufficient but stronger-than-necessary condition for the algo to contain a bug: for a composite number x to be reported, it must meet the more stringent requirement that *all* its factors get skipped over. If hynekcer has indeed proven that the OP's assert() cannot trigger for any prime < 71, then we don't have to worry about the numbers in between x1 and x1+142: they are guaranteed to have been removed, and the assert() will trigger where he says. But... – j_random_hacker Dec 19 '12 at 14:59
  • ... *this does not imply a bug*, because x1's other 2 (much higher) factors might yet knock one or both numbers out before they actually get reported. (And now I'll be quiet for a while :-P) – j_random_hacker Dec 19 '12 at 14:59
  • @j_random_hacker that's some complex stuff, for sure. :) – Will Ness Dec 19 '12 at 16:06

Your observation seems to be accurate, which is quite a good catch.

I suspect the reason it works, at least in some cases, is that composite numbers are usually products of multiple primes. The inner loop may miss the value on the first factor, but it then picks it up on a later factor.

For a small'ish "n", you can print out values of the list to see if this is what is happening.

This method of finding primes, by the way, is related to the Sieve of Eratosthenes. The open question is whether, if "c" is a multiple of "p", the next remaining value can ever be a multiple of the same prime.

The question is: are there any cases where all values between p*x and p*(x+1) are divisible by some prime less than p? (This is where the algorithm would miss a value, and it would not be caught later.) However, one of these values is even, so it would be eliminated on round "2". So the real question is whether there are cases where all values between p*x and p*(x+2) are divisible by numbers less than p.

Off hand, I can't think of any numbers less than 100 that meet this condition. For p = 5, there is always a value that is not divisible by 2 or 3 between two consecutive multiples of 5.

There seems to be a lot written on prime gaps and sequences, but not so much on sequences of consecutive integers divisible by numbers less than p. After some (okay, a lot) of trial and error, I've determined that every number between 39,474 (17*2,322) and 39,491 (17*2,323) is divisible by an integer less than 17:

39,475  5
39,476  2
39,477  3
39,478  2
39,479  11
39,480  2
39,481  13
39,482  2
39,483  3
39,484  2
39,485  5
39,486  2
39,487  7
39,488  2
39,489  3
39,490  2

I am not sure if this is the first such value. However, we would have to find sequences twice as long as this. I think that is unlikely, but not sure if there is a proof.
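The run above is easy to verify mechanically; a quick sketch (the helper name is mine):

```python
# Check that every integer strictly between 17*2322 = 39474 and
# 17*2323 = 39491 has a prime factor smaller than 17.
def has_small_factor(x, bound):
    return any(x % p == 0 for p in (2, 3, 5, 7, 11, 13) if p < bound)

print(all(has_small_factor(x, 17) for x in range(39475, 39491)))  # -> True
```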

My conclusion is that the original code might work, but that your fix is the right thing to do. Without a proof that there are no such sequences, it looks like a bug, albeit a bug that could be very, very, very rare.

Gordon Linoff
  • Yeah I'm not sure how to prove this yet. I have a deleted answer that's basically like user448810's which is wrong on further reflection. – Kenan Banks Dec 06 '12 at 17:14
  • "are there any cases where all values between p*x and p*(x+1) are divisible by some prime less than p?" - There shouldn't be, since all multiples of all primes less than p have been removed. – phaz Dec 06 '12 at 19:07
  • But 39,474 would be dropped when we are checking for % 2 == 0, this example would never show up in OPs problem. – Akavall Dec 06 '12 at 19:39
  • @Akavall . . .Duh! I got lost in the details, missing the most obvious point. Thank you for pointing that out. – Gordon Linoff Dec 06 '12 at 19:47
  • At p=13, the number 701*13=9113 is still in the 'primes' list and is being removed. At this point, all numbers from 9114 to 9126 have already been removed; the next number present is 9113+14. – Armin Rigo Dec 06 '12 at 22:04
  • @ArminRigo . . . But 9127 is prime, so skipping over it doesn't cause a problem. – Gordon Linoff Dec 06 '12 at 22:11
  • 1
    I do not agree with "the inner loop may miss the value on the first factor, but it then picks it up on a later factor." It mostly helps but not absolute. If we find a prime `p` and a power `n` (probably big number) such that `p ** n - 2*p` is not divisible by any prime smaller than p and all numbers between `p ** n - 2p` and `p ** n` are divisible by some these small primes, then it fails and no more factors are present for `p**n` and it definitely fails. I found [failing candidates for p=71](http://stackoverflow.com/a/13811371/448474) that is the smallest prime for which a candidate can fail. – hynekcer Dec 11 '12 at 00:21
  • Given two numbers n and m in the consecutive sequence of possible primes such that neither n nor m is divisible by the last divisor p, then m - n < p.
  • Given q (the next higher divisor) > p: if n is divisible by q, then the next number divisible by q is n + q > n + p > m, so m can safely be skipped in the current iteration's divisibility test.

    Here n = primes[c]

    m = primes[c + 1], i.e. primes[c] after primes.pop(c)

    p = primes[p]
    q = primes[p + 1]

Abhijit

This program does not work correctly, i.e., it incorrectly reports a composite number as prime. It turns out to have the same bug as a program by Wirth. The details may be found in Paul Pritchard, Some negative results concerning prime number generators, Communications of the ACM, Vol. 27, no. 1, Jan. 1984, pp. 53–57. This paper gives a proof that the program must fail, and also exhibits an explicit composite which it reports as prime.

oluckyman

Here is an idea:

Triptych explained¹ that the next number after c cannot be c + p, but we still need to show that it can also never be c + 2p.

If we use primes = [2], we can only have one consecutive "non-prime", a number divisible by 2.

If we use primes = [2,3], we can construct 3 consecutive "non-primes": a number divisible by 2, a number divisible by 3, and a number divisible by 2, and the run cannot extend to the next number. For example:

2,3,4 => 3 consecutive "non-primes"

Even though 2 and 3 are not "non-primes" it is easier for me to think in terms of those numbers.

If we use [2,3,5], we get

2,3,4,5,6 => 5 consecutive "non-primes"

If we use [2,3,5,7], we get

2,3,4,5,6,7,8,9,10 => 9 consecutive "non-primes"

The pattern emerges: the most consecutive "non-primes" that we can get is next_prime - 2.

Therefore, if next_prime < p * 2 + 1, there must be at least one surviving number between c and c + 2p, because the run of consecutive "non-primes" is not long enough, given the primes found so far.

I don't know about very, very big numbers, but I think next_prime < p * 2 + 1 is likely to hold for them as well (Bertrand's postulate guarantees a prime between p and 2p).

I hope this makes sense, and adds some light.


¹ Triptych's answer has been deleted.

Akavall

This doesn't provide a remotely conclusive answer, but here's what I've tried on this:

I've restated the required assumption here as (lpf stands for Least Prime Factor):

For any composite number, x, where:
    lpf(x) = n
There exists a value, m, where 0 < m < 2n and:
    lpf(x+m) > n

It can easily be demonstrated that values of x exist where no composite number (x + m) satisfies the inequality. Any squared prime demonstrates that:

lpf(x) = x^.5, so x = n^2
n^2 + 2n < (n + 1)^2 = n^2 + 2n + 1

So, in the case of any squared prime, for this to hold true, there must be a prime number, p, present in the range x < p < x + 2n.

I think that can be concluded given the asymptotic distribution of squares (x^.5) compared to the Prime Number Theorem (asymptotic distribution of primes, approx. x/(ln x)), though, really, my understanding of the Prime Number Theorem is limited at best.

And I have no strategy whatsoever for extending that conclusion to non-square composite numbers, so that may not be a useful avenue.

I've put together a program testing values using the above restatement of the problem.

Testing this statement directly should remove any got-lucky results from just running the algorithm as stated. By got-lucky results, I'm referring to a value being skipped that may not be safe, but that doesn't turn up any incorrect results, due to a skipped value not being divisible by the number currently being iterated on, or being picked up by subsequent iterations. Essentially, if the algorithm gets the correct result, but either doesn't find the LEAST prime factor of each eliminated value, or doesn't rigorously check each prime result, I'm not satisfied with it. If such cases exist, I think it's reasonable to assume that cases also exist where it would not get lucky (unusual though they may be), and would render an incorrect result.

Running my test, however, shows no counter-examples in the values from 2 to 2,000,000. So, for what it's worth, values from the algorithm as stated should be safe up to, at least, 2,000,000, unless my logic is incorrect.
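A rough reconstruction of such a test (the function names are my own, not femtoRgon's actual program) could look like:

```python
def lpf(x):
    """Least prime factor of x (x itself if x is prime)."""
    d = 2
    while d * d <= x:
        if x % d == 0:
            return d
        d += 1
    return x

def has_witness(x):
    """For composite x with lpf(x) = n, is there an m with
    0 < m < 2n such that lpf(x + m) > n?"""
    n = lpf(x)
    return any(lpf(x + m) > n for m in range(1, 2 * n))

# No counter-examples among the small composites:
print(all(has_witness(x) for x in range(4, 10000) if lpf(x) < x))  # -> True
```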

That's what I have to add. Great question, Phazyck, had fun with it!

femtoRgon

If prime p divides candidate c, then the next larger candidate that is divisible by p is c + p. Therefore, your original code is correct.

However, it's a rotten way to produce a list of primes; try it with n = 1000000 and see how slow it gets. The problem is that you are performing trial division when you should be using a sieve. Here's a simple sieve (pseudocode, I'll let you do the translation to Python or another language):

function primes(n)
    sieve := makeArray(2..n, True)
    for p from 2 to n step 1
        if sieve[p]
            output p
            for i from p+p to n step p
                sieve[i] := False

That should get the primes less than a million in less than a second. And there are other sieve algorithms that are even faster.
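One possible Python rendering of the pseudocode:

```python
def primes(n):
    """Sieve of Eratosthenes: all primes up to and including n."""
    sieve = [True] * (n + 1)
    result = []
    for p in range(2, n + 1):
        if sieve[p]:
            result.append(p)
            # Mark every multiple of p, starting at 2p, as composite.
            for i in range(p + p, n + 1, p):
                sieve[i] = False
    return result

print(primes(30))  # -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```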

This algorithm is called the Sieve of Eratosthenes, and was invented about 2200 years ago by a Greek mathematician. Eratosthenes was an interesting fellow: besides sieving for primes, he invented the leap day and a system of latitude and longitude, accurately calculated the distance from Sun to Earth and the circumference of the Earth, and was for a time the Chief Librarian of Ptolemy's Library in Alexandria.

When you are ready to learn more about programming with prime numbers, I modestly recommend this essay at my blog.

user448810
  • "If prime p divides candidate c, then the next larger candidate that is divisible by p is c + p. Therefore, your original code is correct." His primes are constantly updated, so it is not clear that this is the case, even though it might be the case. – Akavall Dec 06 '12 at 17:06
  • @Akavall . . . The original code is removing numbers from the list, so the next value is not c+1. See the comment by femtoRgon and in my answer. – Gordon Linoff Dec 06 '12 at 17:09
  • This answer is wrong. It's true that the code works, but not that the "next largest candidate is c+p". As Akavall notes, the prime candidate list is constantly being updated, and the next candidate in the list after finding a prime is in fact NEVER c+p, since that number would have been removed when filtering multiples of 2. For instance when primes[p] = 5 and primes[c] = 25, we know that primes[c] + primes[p] == 30 is NOT in the list. – Kenan Banks Dec 06 '12 at 17:15
  • I stand by my answer. Notice that I am referring to actual numbers p and c, not primes[p] and primes[c] that the original poster was using. – user448810 Dec 06 '12 at 17:16
  • @GordonLinoff, the next number is primes[c+1], after OP removed even numbers, OP has [2,3,5,7,9,11,13], when 9 is removed, primes list shrinks, even though c increments, as a result, the code never looks at 11, and goes straight to 13. – Akavall Dec 06 '12 at 17:18
  • @Triptych, "the next candidate in the list after finding a prime is in fact NEVER c+p, since that number would have been removed when filtering multiples of 2". I think that's it! – Akavall Dec 06 '12 at 17:24
  • @Akavall Can a similar conclusion necessarily be drawn about c+2p though? – femtoRgon Dec 06 '12 at 17:34
  • @femtoRgon, I don't know :). This seems to be a missing part still. – Akavall Dec 06 '12 at 18:12
  • I don't get why you guys think this is wrong at all.. seems very robust to me.. you start off with all trues, and you're knocking numbers that can be 'factored down' off the list by setting them equal to false. And you do it for each number that hasn't been knocked off yet. – sksallaj Dec 06 '12 at 20:08
  • @sksallaj- wrong. You're not "setting them to false", which is how this algorithm normally works, you're actually removing the composite number from the list, changing the list's length. – Kenan Banks Dec 06 '12 at 20:22