Better algorithm on prime numbers

Question

I'm working on a program that finds the nth prime number. For example, listing the first six prime numbers (2, 3, 5, 7, 11 and 13) shows that the 6th prime is 13. I'm trying to write an algorithm so that, if I want the 50th prime, I keep adding 1 to the end of the range() function. This is the algorithm I'm using to find primes at the moment:

cnt = 1
print (2)
for x in range(3,40,2):
    div = False
    for y in range(2,round(x**0.5)+1):
        if x%y == 0:
            div = True
    if div == False:
        print (x)
        cnt += 1

print ("\nThere are {} prime numbers.".format(cnt))

As you can see, I put 40 there. But I want to put n there and, for example, keep adding 1 to n until the 50th prime is reached. It doesn't work like that, though, if I try something like:

cnt = 1
n = 40 #example
while cnt<50:
    for x in range(3,n,2):
        #codes
    if div == False:
        n += 1

I thought that when the program finds a prime, it would add 1 to n, and the while loop would keep running until it finds the 50th prime. But it didn't; the primes are wrong with this version, and it does nothing like what I want.

How can I make this algorithm work? Obviously, changing the last argument of the range() function does not work.
Is there a better or more elegant algorithm? If I want to find the 200,000th prime, I need faster code.

Edit: I was working with lists at first, but I got a MemoryError all the time when working with big numbers. So I dropped that and now use a variable, cnt, that counts how many primes there are.

In terms of better algorithm, you can use a [Sieve of Eratosthenes](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) inside the range given in [this math.SE answer](http://math.stackexchange.com/a/1259) — Sinkingpoint, Feb 03 '15 at 23:46
@Quirliom Thanks for your comment. _Create a list of consecutive integers from 2 through n: (2, 3, 4, ..., n)._ As you see in this rule, _create a list_, which will throw a MemoryError when working with big numbers; that's why I gave up working with lists. — GLHF, Feb 03 '15 at 23:51
You shouldn't be hitting memory errors at <10**8 - how much free ram do you have? — user3467349, Feb 03 '15 at 23:53
@user3467349 I always got that error if I append a big number to a list, I don't know why actually. Around 4-5GB I think. — GLHF, Feb 03 '15 at 23:54
It's because of https://docs.python.org/2/library/sys.html#sys.maxsize — user3467349, Feb 03 '15 at 23:56
I assume there's a `cnt += 1` in your second code snippet that has been left out? — Joel, Feb 03 '15 at 23:57
@Joel I didn't write that in the second snippet because I just wanted to show my logic, but `cnt` counts how many primes there are — GLHF, Feb 03 '15 at 23:58
17984? it should be fast to do even iterating with small numbers. — user3467349, Feb 04 '15 at 00:04
Just a quick note that I updated my answer to include a test that I neglected. Now it stops testing once it reaches a prime greater than sqrt(candidate). This is a test you had in your code that I neglected to include. — Joel, Feb 04 '15 at 13:27
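
For reference, the sieve suggestion from the comments can be sketched roughly as follows. This is only a sketch under stated assumptions: the function names are made up for illustration, and the bound used is the standard result that the n-th prime is below n(ln n + ln ln n) for n >= 6, which I am assuming is the kind of range the linked math.SE answer refers to. A `bytearray` is used instead of a list of integers so that each number costs one byte, which should keep memory modest for the sizes discussed here.

```python
import math


def nth_prime_upper_bound(n):
    """Return an upper bound for the n-th prime (hypothetical helper name)."""
    if n < 6:
        return 13  # covers the first five primes (2, 3, 5, 7, 11)
    # Assumption: for n >= 6 the n-th prime is below n * (ln n + ln ln n).
    return int(n * (math.log(n) + math.log(math.log(n)))) + 1


def nth_prime_sieve(n):
    """Return the n-th prime (1-indexed) using a Sieve of Eratosthenes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    limit = nth_prime_upper_bound(n)
    # One byte per number: 1 means "still assumed prime".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross off the multiples of p, starting at p * p.
            multiples = len(range(p * p, limit + 1, p))
            is_prime[p * p::p] = bytearray(multiples)
    count = 0
    for number in range(2, limit + 1):
        if is_prime[number]:
            count += 1
            if count == n:
                return number
    raise ValueError("upper bound was too small")


print(nth_prime_sieve(6))   # the 6th prime from the question
print(nth_prime_sieve(50))
```

With this bound, `nth_prime_sieve(200000)` needs a sieve of about 3 million bytes, far less than a list of that many Python integers.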

Joel · Accepted Answer · 2015-02-04T21:11:01.213

3

Here is a much faster version

primes = []
primecount = 0

candidate = 2
while primecount<50:
    is_prime = True
    for prime in primes:
        if candidate%prime == 0:
            is_prime = False
            break
        elif candidate < prime**2:
            break
    if is_prime:
        primes.append(candidate)
        primecount += 1

    candidate += 1
print(primes[-1])

Note: a small edit added the candidate < prime**2 test, which the OP had included but I initially neglected.

Your code is going to be very slow for several reasons. If 2 divides a number you know it's not prime, but you're still checking whether 3 or 4 or 5... divides it. So you can break out as soon as you know it's not prime. Another major issue is that if 2 does not divide a number, there's no reason to check if 4 divides it as well. So you can restrict your attention to just checking if the primes coming before it divide it.
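
As a rough sketch of those two points (stop at the first divisor, and only divide by the primes found so far), the same loop can be wrapped in a function. The name `nth_prime` and the step that skips even candidates after 2 are not part of the code above; they are assumptions added for illustration.

```python
def nth_prime(n):
    """Return the n-th prime (1-indexed) by trial division against earlier primes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    primes = []
    candidate = 2
    while len(primes) < n:
        is_prime = True
        for prime in primes:
            if prime * prime > candidate:
                break  # no divisor up to sqrt(candidate), so it is prime
            if candidate % prime == 0:
                is_prime = False
                break  # found a divisor, no need to test further
        if is_prime:
            primes.append(candidate)
        # Assumption for illustration: after 2, only odd candidates are tried.
        candidate += 1 if candidate == 2 else 2
    return primes[-1]


print(nth_prime(6))
print(nth_prime(50))
```

Note that this still keeps every prime found so far in a list, so memory grows with n, although far more slowly than a list of all integers up to the answer.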

In terms of run time:

[Image: run-time plot]

edited Feb 04 '15 at 21:11

answered Feb 04 '15 at 00:05


Nice algorithm, this is what I'm looking for. Thanks for the answer – GLHF Feb 04 '15 at 00:10
It's the algorithm @Quirliom suggested you use. I just gave a simple implementation of it. – Joel Feb 04 '15 at 00:15
This is not an implementation of the Sieve of Eratosthenes. It is an optimised version of the OP's trial division. – lvc Feb 04 '15 at 00:42
Also, `candidate` can be incremented with `+= 2` to skip the evens, and curiously `for prime in primes` is actually a good bit slower than `for y in range(2,int(x**0.5+1))` – user3467349 Feb 04 '15 at 00:47
The choice of variable name is quite misleading, since you're using "prime" both to iterate over numbers and as a boolean value. Maybe is_prime is a better choice for the boolean. – Melug Feb 04 '15 at 06:18
@Melug Thanks - hadn't noted that I was using `prime` in two ways. It does work, but you're very right. Correcting it now. – Joel Feb 04 '15 at 08:06
@lvc no it is actually worse than OP's code. the OP complexity is _N^1.5_, for the upper limit of _N_; this code is _N^2/(log N)^2_. but yes, it is not SoE, whose complexity is _N log (log N)_. Empirical orders of growth [can be measured](https://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth). – Will Ness Feb 04 '15 at 09:31
@WillNess Rather than offering oblique criticism like this, why not say "hey Joel, you neglected to add a break if the prime you're testing is greater than sqrt(candidate)"? I thought I'd included it, but you're right, I hadn't. Please, do tell me what the scaling is now that it's been correctly implemented. – Joel Feb 04 '15 at 12:53
@Joel Still this is not `the sieve of eratosthenes`, please modify your answer to reflect this. – sarveshseri Feb 04 '15 at 13:32
@Joel I have no way of knowing whether it is something new for you to discover, right? So it was necessarily oblique, but not a criticism at all, just an observation. :) The complexity of the new code with the added `sqrt` limit is _N^1.5/(log N)^2_, AFAICR. So no, that was not a small edit, not by any measure. -- It would be _very_ interesting to see on that graph the run time for the original code (before the edit) too. Also, it might aid the visual comprehension of the graph if you would divide each line's data by _N_ and scale each by some additional constant so they start at same point. – Will Ness Feb 04 '15 at 17:14
I suggest the additional scaling as maybe it will allow you to enhance the vertical resolution of the graph (or maybe it won't be necessary). I'd sure upvote this answer then. – Will Ness Feb 04 '15 at 17:21
@WillNess - You're right that it was not a small edit in terms of its impact, but it was quite obviously a minor edit in that 1) the OP had it, so it would be surprising if I hadn't intended to include it and 2) it's an easy, tiny change. This obviously improves this algorithm for the OP and any future people using the approach. Keeping such things "secret" so as to not give away the answer isn't how I see our role here. I'll add the requested scaling to the plot (soon?), but probably leave the original off (not willing to wait, as you are correct that my original bug slowed things down) – Joel Feb 04 '15 at 21:09
@Joel thanks for the edit, and I upvoted; I think now (sorry for that :) ) that better thing to do would be to divide by the actual running time, so *it* will be a horizontal line. Also, no need for N^(3/2), what's needed is N^(3/2)/(log N)^2. I've got a feeling this new line will become horizontal too, at some point. This would mean the run time is the same, up to a constant factor. This might be a nice alternative to calculating the power coefficients on consecutive ranges (as in WP article), as the constant factors find their expression too, here, unlike with that method. – Will Ness Feb 05 '15 at 10:52
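
The empirical order of growth mentioned in these comments can be estimated along the following lines. This is a sketch with assumptions: it counts modulo operations instead of measuring wall-clock time so that the result is repeatable, it takes the problem size to be the number of primes produced (not the upper limit N used in the comments, so the exponent is not directly comparable), and the function names are hypothetical.

```python
import math


def count_divisions(n_primes):
    """Count the modulo operations used to find the first n_primes primes
    by trial division against earlier primes, stopping at sqrt(candidate)."""
    primes = []
    divisions = 0
    candidate = 2
    while len(primes) < n_primes:
        is_prime = True
        for prime in primes:
            if prime * prime > candidate:
                break
            divisions += 1
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return divisions


def empirical_order(n1, cost1, n2, cost2):
    """Estimate the exponent b in cost ~ a * n**b from two measurements."""
    return math.log(float(cost2) / cost1) / math.log(float(n2) / n1)


small, large = 1000, 4000
order = empirical_order(small, count_divisions(small),
                        large, count_divisions(large))
print("empirical order of growth: n^{:.2f}".format(order))
```

Swapping `count_divisions` for a timed run of the real code gives the run-time version of the same estimate, at the cost of some noise between runs.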

score -1 · Answer 2 · answered Feb 04 '15 at 00:28

First off, for backwards compatibility with Python 2, I added an int() around your rounding of the square root of x.

From what I understand of your question, you are looking for something like this:

cnt = 1
maximum=50 #Specify the number you don't want the primes to exceed
print (2)
n=20 #highest number 
while cnt<maximum: #
    for x in range(3,n,2): #using "n" as you hoped
        div = False
        for y in range(2,int(round(x**0.5)+1)):
            if x%y == 0:
                div = True
        if div == False:
            if x<maximum: #we only want to print those up to the maximum
                print(x)
            else: #if it is greater than the maximum, break the for loop
                break
            cnt += 1
    break #break the while loop too.
print ("\nThere are {} prime numbers.".format(cnt))

This gave me the correct primes. As for better or more elegant solutions: if by better you mean faster, use a library that has an optimized implementation, or look at an existing example of a fast program. Elegance is subjective, so I will leave that to the rest of the internet.
