1

I've been trying to make an efficient prime generator that successfully utilises wheel factorisation to be faster than a basic sieve of Eratosthenes implementation but my code ends up being a lot slower due to the overhead cost. I want to know why my implementation ends up being so slow, if there is a way to improve it, and how other prime generators end up being much faster.

Here is my code, which is quite long due to having to hard-code the modulo 30 wheel

import numpy as np
def wheel_factorisation(number):
    wheel = [1, 7, 11, 13, 17, 19, 23, 29]
    difference = [2, 6, 4, 2, 4, 2, 4, 6]
    squareroot = math.isqrt((number // 30) * 8)
    isprime = np.ones(squareroot,dtype=bool)
    isprime[0] = False

    multiples = []
    primes = [2,3,5]

    factor = 7
    index = 1
    DIVindex = 0

    while factor * factor <= number:
        if isprime[DIVindex * 8 + index]:
            primes.append(factor)
            if index == 0:  #1%30
                multiples.append([8*DIVindex*(factor+1), 0, 0, DIVindex])
                #offset = [0, 1, 2, 3, 4, 5, 6, 7]
                offset = np.ones(8,dtype = int)
                #DIVoffset = [1, 0, 0, 0, 0, 0, 0, 0]
                    
            if index == 1:  # 7%30
                multiples.append([8*DIVindex * (factor + 7) + 13, 1, 1, DIVindex])

                #offset = [1, 5, 4, 0, 7, 3, 2, 6]
                offset = np.array([3,12,7,4,7,4,7,12])
                #DIVoffset = [1, 1, 1, 1, 0, 1, 1, 1]

            if index == 2:  #11%30
                multiples.append(
                    [8*DIVindex * (factor + 11) + 32, 2, 2, DIVindex])
                #offset = [2, 4, 0, 6, 1, 7, 3, 5]
                offset = np.array([5,18,12,6,11,6,12,18])
                #DIVoffset = [1, 2, 2, 0, 2, 0, 2, 2]

            if index == 3:  #13%30
                multiples.append(
                    [8*DIVindex * (factor + 13) + 45, 3, 3, DIVindex])
                #offset = [3, 0, 6, 5, 2, 1, 7, 4]
                offset = np.array([7,21,14,7,13,7,14,21])
                #DIVoffset = [1, 3, 1, 1, 2, 1, 1, 3]

            if index == 4:  #17%30
                multiples.append(
                    [8*DIVindex * (factor + 17) + 77, 4, 4, DIVindex])
                #offset = [4, 7, 1, 2, 5, 6, 0, 3]
                offset = np.array([9,27,18,9,19,9,18,27])
                #DIVoffset = [1, 3, 3, 1, 2, 1, 3, 3]

            if index == 5:  #19%30
                multiples.append(
                    [8*DIVindex * (factor + 19) + 96, 5, 5, DIVindex])
                #offset = [5, 3, 7, 1, 6, 0, 4, 2]
                offset = np.array([11,30,20,10,21,10,20,30])
                #DIVoffset = [1, 4, 2, 2, 2, 2, 2, 4]

            if index == 6:  #23%30
                multiples.append(
                    [8*DIVindex * (factor + 23) + 141, 6, 6, DIVindex])
                #offset = [6, 2, 3, 7, 0, 4, 5, 1]
                offset=np.array([13,36,25,12,25,12,25,36])
                #DIVoffset = [1, 5, 3, 1, 4, 1, 3, 5]

            if index == 7:
                multiples.append(
                    [8*DIVindex * (factor + 29) + 224, 7, 7, DIVindex])
                #offset = [7, 6, 5, 4, 3, 2, 1, 0]
                offset = np.array([15,47,31,15,31,15,31,47])
                #DIVoffset = [1, 6, 4, 2, 4, 2, 4, 6]

            while multiples[-1][0] < squareroot:
                isprime[multiples[-1][0]] = False
                multiples[-1][1] += 1
                if multiples[-1][1] == 8:
                    multiples[-1][1] = 0
                    
                multiples[-1][0] += offset[multiples[-1][1]] + 8*DIVindex*difference[multiples[-1][1]]

        index += 1
        if index == 8:
            DIVindex += 1
            index = 0
        factor += difference[index]
    prime = [DIVindex, index]
    while prime[0] * 8 + prime[1] < squareroot:
        if isprime[prime[0] * 8 + prime[1]]:
            primes.append(prime[0] * 30 + wheel[prime[1]])
        prime[1] += 1
        if prime[1] == 8:
            prime[1] = 0
            prime[0] += 1
    low = squareroot
    limit = low + low
    while low <= (number * 8) // 30:
        isprime = np.ones(squareroot,dtype=bool)
        if limit > (number * 8) // 30:
            limit = (number * 8) // 30
        factor = 0
        while factor < len(multiples) and primes[factor +
                                                 3] <= math.isqrt(number):
            index = multiples[factor][2]
            DIVindex = multiples[factor][3]
            if multiples[factor][0] < limit:    
                if index == 0:  #1%30
                    #offset = [0,1,2,3,4,5,6,7]
                    offset = np.ones(8,dtype = int)
                    #DIVoffset = [1, 0, 0, 0, 0, 0, 0, 0]

                if index == 1:  # 7%30
                    #offset = [1, 5, 4, 0, 7, 3, 2, 6]
                    offset = np.array([3,12,7,4,7,4,7,12])
                    #DIVoffset = [1, 1, 1, 1, 0, 1, 1, 1]

                if index == 2:  #11%30
                    #offset = [2, 4, 0, 6, 1, 7, 3, 5]
                    offset = np.array([5,18,12,6,11,6,12,18])
                    #DIVoffset = [1, 2, 2, 0, 2, 0, 2, 2]

                if index == 3:  #13%30
                    #offset = [3, 0, 6, 5, 2, 1, 7, 4]
                    offset = np.array([7,21,14,7,13,7,14,21])
                    #DIVoffset = [1, 3, 1, 1, 2, 1, 1, 3]

                if index == 4:  #17%30
                    #offset = [4, 7, 1, 2, 5, 6, 0, 3]
                    offset = np.array([9,27,18,9,19,9,18,27])
                    #DIVoffset = [1, 3, 3, 1, 2, 1, 3, 3]

                if index == 5:  #19%30
                    #offset = [5, 3, 7, 1, 6, 0, 4, 2]
                    offset = np.array([11,30,20,10,21,10,20,30])
                    #DIVoffset = [1, 4, 2, 2, 2, 2, 2, 4]

                if index == 6:  #23%30
                    #offset = [6, 2, 3, 7, 0, 4, 5, 1]
                    offset=np.array([13,36,25,12,25,12,25,36])
                    #DIVoffset = [1, 5, 3, 1, 4, 1, 3, 5]

                if index == 7:  #29%30
                    #offset = [7, 6, 5, 4, 3, 2, 1, 0]
                    offset = np.array([15,47,31,15,31,15,31,47])
                    #DIVoffset = [1, 6, 4, 2, 4, 2, 4, 6]
                while multiples[factor][0] < limit:    
                    isprime[multiples[factor][0]-low] = False
                    multiples[factor][1] +=1
                    if multiples[factor][1]==8:
                        multiples[factor][1]=0
                        
                    multiples[factor][0]+= 8*DIVindex * difference[
                        multiples[factor][1]]
                    multiples[factor][0] += offset[multiples[factor][1]]
                    
            factor += 1
        while prime[0]*8+prime[1]< limit:
            if isprime[prime[0]*8+prime[1] - low]:
                primes.append(prime[0] * 30 + wheel[prime[1]])
            prime[1] += 1
            if prime[1] == 8:
                prime[1] = 0
                prime[0] += 1
        low += squareroot
        limit += squareroot
    return primes

The code is quite long, but really the part that I'm interested in is the while loop to cross off primes as this part takes up the majority of run time:

                    while multiples[factor][0] < limit:    
                    isprime[multiples[factor][0]-low] = False
                    multiples[factor][1] +=1
                    if multiples[factor][1]==8:
                        multiples[factor][1]=0
                        
                    multiples[factor][0]+= 8*DIVindex * difference[
                        multiples[factor][1]]
                    multiples[factor][0] += offset[multiples[factor][1]]

and the code that adds the primes from the array:

            while prime[0]*8+prime[1]< limit:
            if isprime[prime[0]*8+prime[1] - low]:
                primes.append(prime[0] * 30 + wheel[prime[1]])
            prime[1] += 1
            if prime[1] == 8:
                prime[1] = 0
                prime[0] += 1

[This script ends up taking around 7.3 seconds to generate primes up to 1 million, but a simple sieve of Eratosthenes can do this in only 2 seconds, with what is technically a modulo 2 wheel (and also missing out the factor 2 but I decided to leave it out for best-case speed):]

If I get rid of the segmentation it ends up taking around 0.05 seconds, but there are other approaches below that end up being faster.

def normal_sieve(number):
primes = np.ones(number,dtype=bool)
for factor in range(3,math.isqrt(number)+1,2):
    if primes[factor - 1]:
        for i in range(factor*factor,number+1,2*factor):
                primes[i - 1] = False
return [i + 1 for i in range(2, number, 2) if primes[I]]

And this prime sieve I found somewhere can do this in just 0.003 seconds, which is a huge difference even though the other implementations also use numpy arrays:

def primesfrom2to(n):
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = numpy.ones(n//3 + (n%6==2), dtype=bool)
    for i in range(1,int(n**0.5)//3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[       k*k//3     ::2*k] = False
            sieve[k*(k-2*(i&1)+4)//3::2*k] = False
    return numpy.r_[2,3,((3*numpy.nonzero(sieve)[0][1:]+1)|1)]

There is also a library called primesieve, which manages to count primes up to 100 billion within around a minute, and I'm really not sure where it gets its speed, besides being made in c++ instead of python. Why is my version so much slower than these, and how do I get a wheel factorisation method to work?

  • You can replace the second `if index == 0:` sequence with a single statement by initializing an 8x8 array before you begin. – Tim Roberts Jul 03 '22 at 20:48
  • 1
    Your premise seems to be that a "wheel factorization"-based method will be faster than the Sieve of E for finding the first N primes. What is your basis for this belief? – President James K. Polk Jul 03 '22 at 20:48
  • 1
    And at the risk of being a wet blanket, you understand that what you're doing has no practical value, right? A sieve approach is naturally limited to the size of your memory, and we already know how to make primes up to the size of your memory. – Tim Roberts Jul 03 '22 at 21:02
  • @TimRoberts there are even ways to go beyond your memory size: https://stackoverflow.com/q/62899578/5987 – Mark Ransom Jul 03 '22 at 21:27
  • yes, I am aware that this may not be practical or worth it to do something like this, but I am basically looking into finding an approach that works, since as it is checking for fewer prime numbers it should theoretically be faster, and there are probably already a few implementations out there that actually do that. The theory behind using segments is that I would be able to swap out the 'square root' variable for the cache size so the segments will always fit into the cache. – creeper_the_cat Jul 03 '22 at 21:32
  • I just realised that I really shouldn't have segmented the sieve when measuring the time, it ends up being a lot faster if I make square root = number, and faster than the basic sieve. However, the primesfrom2to function is still much, much faster than my function . What is it doing that makes it that fast? – creeper_the_cat Jul 03 '22 at 21:48
  • How many different values can `index` be equal to in a single loop iteration, and how many times do you check the value of `index`? – chepner Jul 03 '22 at 23:39
  • The funny thing about a sieve algorithm is that the "checking" part takes practically no time, so anything that complicates the algorithm might not actually be a net win. – Tim Roberts Jul 04 '22 at 02:43
  • Here you can find a wheel sieve base 6 and base 30 https://stackoverflow.com/questions/65969131/time-complexity-of-sieve-algorithm – user140242 Jul 04 '22 at 19:00
  • That solution does look interesting but it is fairly difficult to actually understand what it is doing. It seems to be using two separate lists for 1%6 and 5%6 but I can't really understand the list splicing. – creeper_the_cat Jul 04 '22 at 19:08
  • yes I use two vectors for 1mod6 and 5mod6 then the operating principle is the same as the traditional sieve instead of starting from p * p. we start from 6 * i * i. in fact (6 * i-1) * (6 * i + 1) = (- 1 + 6 * (6 * i * i)) – user140242 Jul 04 '22 at 19:16
  • I've never seen lists edited like that, probably because it ends up looking quite messy. Might be useful trying something like that in my code to make it faster. What is the second function doing? I'm not very familiar with numpy so I'm not really sure what it is doing. – creeper_the_cat Jul 04 '22 at 20:35
  • With numpy there are functions that simplify and speed up operations with vectors. The second function is equal to that modulo 6 but uses an array of 8 lines instead of 2. Here you find the easier to understand version in numpy https://stackoverflow.com/questions/53818194/goldbach-conjecture-in-python/69695020#69695020 – user140242 Jul 04 '22 at 21:03
  • I'm still stuck trying to understand how the modulo 30 function works. What is the C1 array for? How is it crossing off the primes? And what is that last line doing to create the output array? – creeper_the_cat Jul 24 '22 at 09:59

0 Answers0