
Small primes - up to about 1,000,000,000,000 - are readily available from various sources. The Prime Pages (utm.edu) have lists for the first 50 million primes, primos.mat.br goes up to 10^12, and programs like the one available at primesieve.org go even higher.

However, when it comes to numbers close to 2^64 there are only the ten primes mentioned on the page "Primes just less than a power of two" at primes.utm.edu, and that seems to be it.

The primality test found there refuses to work on numbers that don't fit a double; others, found elsewhere, fail to refuse and just print trash. The primesieve.org program refuses to work with numbers that aren't at least some 40 billion below 2^64, which doesn't exactly inspire confidence in the quality of their coding. The same result everywhere: nada, zilch, niente.

The cogs and gears of sieves start creaking around the 2^62 mark, and close to 2^64 there's hardly a cog that doesn't creak loudly, threatening to break apart. Hence the need for testing an implementation is greatest exactly where verification is most difficult, because of the scarcity (or outright absence) of reliable reference data. The primesieve.org program seems to be the only one that works at least up to 2^63 or thereabouts, but I don't trust it too much because of the above-mentioned issue.

So how then can one verify the proper operation of a sieve close to 2^64? Are there reliable lists somewhere for a million (or ten million or a hundred million) primes just below and above powers of two like 2^64, 2^63 and so on? Or are there reliable (trustworthy, verified, banged-on a lot) programs that yield such sequences or that can verify primes or lists of primes?

Once a sieve has been verified it can be used to produce handy lists with sums/checksums for loads of interesting ranges, but absent such lists the situation seems difficult...

P.S.: I determined the upper limit for the primesieve.org turbo siever to be UINT64_MAX - 10 * UINT32_MAX, or 0xFFFFFFF600000009. That means only the 10 * UINT32_MAX highest primes don't have any reference data at all so far...
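
For the record, that constant can be double-checked with a few lines of C. Note that the multiplication has to be carried out in 64-bit arithmetic; evaluated as a plain unsigned int product, 10 * UINT32_MAX silently wraps - exactly the kind of boundary trap this whole question is about:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
  /* the cast matters: 10 * UINT32_MAX evaluated in 32-bit arithmetic wraps to 0xFFFFFFF6 */
  uint64_t cap = UINT64_MAX - 10 * (uint64_t)UINT32_MAX;
  printf("0x%016" PRIX64 "\n", cap);   /* prints 0xFFFFFFF600000009 */
  return 0;
}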

DarthGizka
  • "doesn't exactly inspire ..." that's a non sequitor. – Will Ness Nov 07 '14 at 08:52
  • Imagine a project on the scale of primesieve.org: developed over decades, a sieve engine with all the trimmings, for all intents and purposes the best 2^64 sieve program in the world, and with a console mode executable so crammed with code that it's almost a megabyte in size... And yet they prefer trimming the job description from '64-bit sieve' to sort of 'almost' rather than fixing the code. In any case they seem to be more concerned with throughput (timing benchmarks) than anything else. I guess Kim Walisch doesn't need primes in the upper range for his project... – DarthGizka Nov 07 '14 at 22:15
  • The self-test contains only one single test in the upper half of the range, which is the one with 2^32 numbers starting at 10^19. Guess what, most bugs occur only in the higher regions of that range. Even Silva's crappy code managed 10^19 and crashed only when tried sieving at 18446744025000000000 (which is a few multiples of 10^9 below the cap for primesieve.org). The overall quality of the project is excellent, with all the bells and whistles. I trust them to be the fastest sievers on the planet, but I don't trust their output close to the upper end of their range. – DarthGizka Nov 07 '14 at 22:22
  • Not to mention that merely counting primes (set or cleared bitmap bits) uncovers only some kinds of bugs while it leaves others undetected. Which is why I asked about [Checksumming large swathes of prime numbers? (for verification)](http://stackoverflow.com/questions/26606355/checksumming-large-swathes-of-prime-numbers-for-verification). – DarthGizka Nov 07 '14 at 22:25
  • Bigints: yes, I've considered them, after a fashion. To wit, I switched all index variables in my reference code from uint32_t to uint64_t, which eliminates many classes of potential bugs. The normal code still uses uint32_t index variables, though, in order to run well in 32-bit mode. But I can always run it against the reference implementation for checking. My problem now is verifying the reference implementation. And I guess other people may have the same sort of problem, which made the question worth asking here. – DarthGizka Nov 07 '14 at 22:32
  • if you just change everything to bigints, there's no possibility for any problem from the bits manipulation, except for the bugs in the bigint code itself, which is supposed to be free from bugs, I assume. So it will only be slow, its complexity nearly the same though (hopefully). Then you can sieve some narrow high range with both versions and compare outputs, for an increased confidence. Then you could implement a prime counting function with bigints, and compare the counts too. Or implement some (probabilistic) primality tests, again with bigints, and test some more segments... – Will Ness Nov 08 '14 at 12:00
  • i.e. use the non-sophisticated sieve on bigints as reference impl. you could also roll your own large_int library for large ints capped at 128 bit, or 512 bits or whatever, which should be fairly trivial if we're only concerned with validity and not performance. – Will Ness Nov 08 '14 at 12:02
  • Will, good point! Write it up as an answer. You get my vote, and a couple ideas as well... The point being removal of problems by employing superiour firepower^W^Wsufficiently large data types. It doesn't address the question of getting massive amounts of reference data, but it does help with reliability. – DarthGizka Nov 08 '14 at 12:17
  • you can also answer your own question. :) feel free to use any idea you got from me here. Ping me if you do, so I get a chance to upvote. – Will Ness Nov 08 '14 at 12:24
  • about your post on code review: there's a feature here on SO that you can post a question and an answer to it *simultaneously*. Because asking for help with fixing broken code is on-topic here, I think (I'm not big on lawyering). So you could post them both - the question and your answer to it at the same time - here. There'd be no suspense, but you'd be able to make your point - and help others (to see those bugs you mention). – Will Ness Nov 08 '14 at 12:53

3 Answers


Instead of looking for a pre-computed list, you could compare the output of your sieve to a different sieve. A good sieve, written by Tomás Oliveira e Silva, is available at http://sweet.ua.pt/tos/software/prime_sieve.html.

Another way to test your code is by testing the primality of all numbers your sieve reports as prime (or conversely, testing the non-primality of all numbers your sieve does not report as prime). A good way to do that is the Baillie-PSW (Baillie-Wagstaff) test. You can find a good-quality implementation by Thomas R. Nicely at http://www.trnicely.net/misc/bpsw.html.
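
As a minimal sketch of that approach (not Nicely's code; just a filter built on GMP's mpz_probab_prime_p), the following reads one candidate per line from a sieve's output and flags anything GMP considers composite. Note that this only catches false positives; missing primes need a different check:

#include <stdio.h>
#include <gmp.h>

int main(void) {
  char line[64];
  mpz_t n;
  mpz_init(n);
  while (fgets(line, sizeof line, stdin)) {
    if (mpz_set_str(n, line, 10) != 0)     /* skip lines that aren't plain decimal numbers */
      continue;
    if (mpz_probab_prime_p(n, 25) == 0)    /* 0 means definitely composite */
      gmp_printf("composite in sieve output: %Zd\n", n);
  }
  mpz_clear(n);
  return 0;
}

Something like ./my_sieve ... | ./check_output would then vet a whole run.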

You might also be interested in Jan Feitsma's tables of pseudoprimes at http://www.janfeitsma.nl/math/psp2/index, which are complete to 2^64.
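
Those tables are what make a cheap deterministic test possible below 2^64: any n < 2^64 that passes a strong probable-prime test to base 2 and does not appear in Feitsma's list of base-2 strong pseudoprimes is prime. A minimal sketch of the base-2 part (the table lookup is omitted; this assumes a compiler that provides __uint128_t, such as gcc or clang on a 64-bit target):

#include <stdint.h>
#include <stdio.h>

/* (a * b) mod m without overflow, via a 128-bit intermediate */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
  return (uint64_t)((__uint128_t)a * b % m);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t m) {
  uint64_t r = 1;
  a %= m;
  while (e) {
    if (e & 1) r = mulmod(r, a, m);
    a = mulmod(a, a, m);
    e >>= 1;
  }
  return r;
}

/* strong probable-prime test to base 2; a result of 1 plus absence from the
   Feitsma list means n is prime (for n < 2^64) */
static int is_sprp2(uint64_t n) {
  if (n < 2) return 0;
  if (n % 2 == 0) return n == 2;
  uint64_t d = n - 1;
  int s = 0;
  while ((d & 1) == 0) { d >>= 1; s++; }
  uint64_t x = powmod(2, d, n);
  if (x == 1 || x == n - 1) return 1;
  for (int i = 1; i < s; i++) {
    x = mulmod(x, x, n);
    if (x == n - 1) return 1;
  }
  return 0;
}

int main(void) {
  /* 18446744073709551557 = 2^64 - 59 is the largest prime below 2^64 */
  printf("%d\n", is_sprp2(18446744073709551557ULL));   /* prints 1 */
  return 0;
}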

user448810
  • I have researched the three 'ancestors' of the primesieve.org project - [Achim Flammenkamp's prime_sieve.c](http://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html), Tomás Oliveira e Silva's demo code that you linked, and [Kim Walisch's ecprime](http://primzahlen.de/referenten/Kim_Walisch/index2.htm). If I wanted inspiration for writing new sieves then I might look at them again for ideas. But they are proof-of-concept code, no more. – DarthGizka Nov 07 '14 at 21:31
  • Let's say I invested the time to get them to sieve up to 2^64-1 instead of crashing, what then? I still couldn't trust their output without verifying against something reliable, since their own authors never bothered doing so. They didn't even bother to verify operation in critical corners of the supposed operating range, as evidenced by the crashes. For now I'd just be content to verify the correct operation of my own code, thank you very much. I haven't done ecprime very thoroughly yet, and its quality seems to be much better. But the hen-and-egg problem of verification remains. – DarthGizka Nov 07 '14 at 21:35
  • Primality testing as a means of checking a sieve's operation has two drawbacks. The first is that the primality test code is more complex than the code to be tested, and hence it would need even more verification before it could be trusted. The second is throughput. Running primality tests on sieve output would uncover only false positives (i.e. composites in the output) but sieve bugs close to the limits of integral types are more apt to result in false negatives (i.e. primes not reported). Trust me. BTDT. Hence the primality tests would have to be run against most odd numbers in the target – DarthGizka Nov 07 '14 at 21:50
  • range, taking even more time. At the moment I'm using the primesieve.org program to do the bulk of testing and to build up the test infrastructure (experimenting with piping vs. batched operation and so on) but for the really interesting range close to 2^64 I don't have any reference data of any kind whatsoever... – DarthGizka Nov 07 '14 at 21:52
  • ecprime is nice, but it caps numbers at 2^62. The file `limits` says "0 <= numbers <= 2^62 = 4.61 * 10^18". In any case, for things like verification it is much better if executables are built and verified by their authors, rather than having to be built on the target system without even makefiles and stuff. – DarthGizka Nov 07 '14 at 23:04
  • For verification it doesn't much matter how fast it runs. Write the simplest possible sieve you can, then compare its output to your fancy fast sieve. Or write a simple Miller-Rabin tester, and test every odd within 10^10 of your limit. Or download gp/pari and get it to give you a list of primes near your limit. Or ask for help at mersenneforum.org. – user448810 Nov 08 '14 at 00:52
  • Yes and no. 'Yes' in the sense that a couple days' worth of trial division would cover at least *some* ground and the result would allow the uncovering of *some* bugs. 'No' in the sense that trial division is way too slow to cover anywhere near **sufficient** ranges. 'No' also in the sense that all faster algorithms, even in their simplest implementation, are too complex to be trusted without verification, which is the point where the cat bites its tail. – DarthGizka Nov 08 '14 at 10:31
  • To demonstrate this point I posted [an apparently simple & solid sieve](http://codereview.stackexchange.com/questions/69240/stb-spot-the-bug-simple-segmented-sieve-of-eratosthenes-for-sieving-up-to-26) on Code Review, a sieve that has been verified to work correctly up to > 10^19 (which is > 2^63) but close to 2^64 it exhibits serious bugs. Like missing some primes, for example. – DarthGizka Nov 08 '14 at 10:36
  • I agree that existing, trusted programs like the primesieve.org thing are probably the best way to go. The problem lies in finding one that goes beyond the cap of primesieve.org, up to 2^64-1. – DarthGizka Nov 08 '14 at 10:42
  • @DarthGizka if you take the simplest sieve and just change every int variable there to a bigint, wouldn't you be satisfied with its correctness? (re: your linked code, any code that's using anything but `+,-,*,/` is *not* simple *a priori*!) – Will Ness Nov 08 '14 at 12:11
  • @user448810: the [gmp](https://gmplib.org/) library contains a function `mpz_probab_prime_p()` that is similar to the algorithm you mentioned (it does a mix of trial division and Miller-Rabin), which makes your solution immediately practical, by relying on the trustworthy, banged-on-a-lot code of the gmp. – DarthGizka Nov 08 '14 at 13:05
  • gmp isn't so hot for this after all; it contains only some ingredients (including Lucas AFAICS) but those ingredients have to be combined into a working deterministic test using something like Thomas Nicely's code for Baillie-PSW using gmp. gp/PARI can be used as is, without further programming. Either by piping output through the sieve program (or into a file), or by using its forprime_t interface for iterating over arbitrary prime ranges. Pity that it cannot be built for 64-bit mode on an awkward LLP64 system like Windows. `forprime_t` can work with unsigned longs as well, not only big ints. – DarthGizka Nov 11 '14 at 13:13

First, thanks for sharing your program and working on correctness. I think it's important to do testing, and sieving near the size boundary was something I spent time working on for my code.

"The same result everywhere: nada, zilch, niente." You're not looking hard enough. There are plenty of tools that do this. It's too bad primesieve doesn't go all the way to 2^64-1, but that doesn't mean nothing else does.

"So how then can one verify the proper operation of a sieve close to 2^64?" One thing I did it is make an edge-case test that runs through all combinations of start/end points near 2^64-1, verifying a number of methods all generate the same results. This relies on having a list of these primes to start, but there are many ways to get these. Not only does this test the sieve at this range, but tests the start/end conditions to make sure there are no issues there.

Some ways to generate a million primes below 2^64:

time perl -Mntheory=:all -E 'forprimes { say } ~0-44347170,~0' | md5sum

Takes ~2s to generate 1M primes. We can force use of different code (Perl or GMP), use primality tests, etc. Lots of ways to do this, including just looping and calling is_provable_prime($n), for example. There are also other Perl modules including Math::Primality though they are much slower.

echo 'forprime(i=2^64-44347170,2^64-1,print(i))' | time gp -f -q | md5sum

Takes ~13s to generate 1M primes. As with the Perl module, there are lots of alternate ways including looping calling isprime which is a deterministic routine (assuming a non-ancient version of Pari/GP).

#include <stdio.h>
#include <gmp.h>
int main(void) {
  mpz_t n;
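  /* 18446744073665204445 = 2^64 - 44347171 = ~0 - 44347170, the same lower bound used above */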
  mpz_init_set_str(n,"18446744073665204445",10);
  mpz_nextprime(n, n);
  while (mpz_sizeinbase(n,2) < 65) {
    /* If you don't trust mpz_nextprime, one could add this:
     * if (!mpz_probab_prime_p(n, 100))
     *   { fprintf(stderr, "Bad nextprime!\n"); return -1; }
     */
    gmp_printf("%Zd\n",n);
    mpz_nextprime(n, n);
  }
  mpz_clear(n);
  return 0;
}

Takes about 30s and gets the same results. This one is more dubious as I don't trust its 25 preset-random-base MR test as much as BPSW or one of the proof methods, but it doesn't matter in this case as we see the results match. Adding the extra 100 tests is very expensive in time, but would make it extremely unlikely to have false results (I suspect we have overlapping bases so this is also wasteful).

from sympy import nextprime
n = 2**64 - 44347170
n = nextprime(n)
while n < 2**64:
  print(n)
  n = nextprime(n)

Using Python's SymPy. Unfortunately primerange uses crazy memory when given 2^64-1 so that's not possible to use. Doing the simple nextprime method isn't ideal -- it takes about 5 minutes, but generates the same results (the current SymPy isprime uses 46 prime bases, which is many more than needed for deterministic results under 2^64).

There are other tools, e.g. FLINT, GAP, etc.

I realize that since you're on Windows, the world is wonky and lots of things don't work right. I have tested Perl's ntheory on Windows and with both Cygwin and Strawberry Perl from command prompt I get the same results. The GMP code ought to work the same, assuming GMP works correctly.

Edit add: If your results don't match one of the comparison methods, then one of the two (or both) is wrong. It may be the comparison code that is wrong! It helps everyone if you find and report errors. It's unlikely but possible they are both wrong in the same way, which is why I like to compare with as many other sources as possible. To me that is more robust than picking one "golden" code to compare against. Especially if you're using an oddball platform that may not have been thoroughly tested.


For BPSW, there are a few implementations around:

  • Pari. AES (almost extra strong) Lucas, in the Pari source code so not sure how portable it is.
  • TR Nicely. Strong Lucas, standalone code.
  • David Cleaver. Standard, Strong or Extra Strong Lucas. Standalone library, very clear, very easy to use.
  • My non-GMP code, including asm Montgomery math for x86_64. Quite a bit faster than bigint codes of course.
  • My GMP code. Standard, Strong, AES, or Extra strong Lucas. Faster than the other bigint codes. Also has other Frobenius and other compositeness tests. Can be made standalone.
  • I have a version using LibTomMath that I hope to get into one of the Perl6 VMs. Only interesting if you want to use LTM.

All verified vs. the Feitsma data. I'm sure there are more implementations around as well. FLINT has a variation that is quite fast, but it isn't really BPSW (but it's been verified for numbers under 2^64).

DanaJ
  • I would be wary of using mpz_nextprime() if using MPIR 2.6.x instead of GMP. The MPIR developers changed mpz_nextprime() to only try 2 (!) bases. They assume you will use mpz_isprime() on the result of mpz_nextprime(). – casevh Nov 24 '14 at 23:09
  • mpz_nextprime in MPIR still does 25 bases, though is marked obsolete and "will disappear in future releases." (as is the GMP-API mpz_probab_prime_p call). The new mpz_next_prime_candidate call only does 2. I don't see any mention of mpz_isprime() in either documentation or source tree. – DanaJ Nov 25 '14 at 01:17
  • The behavior of mpz_nextprime() was changed. In the copy of 2.6.0 I have handy, mpz_nextprime() just calls mpz_next_likely_prime() which only does 2 MR rounds. It was fixed in 2.7.0. mpz_nextprime() was changed to call mpz_next_likely_prime() and then call Miller-Rabin 23 additional times. The reference to mpz_isprime() was an error; it should be mpz_probable_prime_p(). – casevh Nov 25 '14 at 02:58
  • Interesting. I was looking at 2.7.0. – DanaJ Nov 25 '14 at 04:08
  • Thanks @DanaJ for a very informative article, and also for uncovering prime-related bugs in FLINT and other important systems; thanks @casevh as well. I'd like to remind other readers that functions like isprime(), nextprime() and so on have suffered from buggy implementations in many major systems - Java/JS runtimes, MPIR, FLINT, gp/PARI and so on; even now the test suites contain only cursory checks for the primality test code. Which means that a priori trust in them for purposes of verifying other software is fairly low, even if they ship as precompiled binaries (which only a few do). – DarthGizka Nov 26 '14 at 15:06
  • Also, there's a difference between bug hunting and verification. For hunting bugs pretty much anything is fair game; flexibility and superior throughput trump many other considerations. Verification is different. Trust cannot be generated out of nothing, only transferred, and it attaches only to the verified binary itself since compilers and build systems can break even correct source and statically compiled libs. Same for Miller-Rabin, BPSW and so on; trust applies only to sets of bases verified up to their given limits. 46 random bases are overkill but still probabilistic, not deterministic. – DarthGizka Nov 26 '14 at 15:29
  • SymPy uses the first 46 prime bases (and Perl6 currently uses the first 100), so it is deterministic and correct for values under 2^64. It is deterministic but of unknown correctness for larger values (for arbitrary inputs we can assume an error bound, but for adversarial situations it is either correct or not). There are known counterexamples to both SymPy's and Perl6's implementations. – DanaJ Nov 26 '14 at 21:29

In general, one must use less naive techniques than trial division, or be very patient. (gp/PARI documentation)

For 64-bit integers, trial division takes millions of times as long as even a simple sieve, let alone thoroughbreds like Kim Walisch's program (primesieve.org) which is orders of magnitude faster.

The reference sieve I want to verify (there's a standalone .cpp @ pastebin) finds about a million primes per second when sieving close to 2^64, whereas the trial division code I lifted out of the gmp implementation takes 20 seconds to find even one. Restricting trial division to presieved primes (stored as deltas with one byte per prime for fast iteration) speeds it up by an order of magnitude, but it still outputs less than one prime per second on my laptop.
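
For illustration, the presieved-delta variant looks roughly like this (a sketch, not the code actually used; it assumes the list stores half-gaps, so each entry fits into one byte for all primes below 2^32):

#include <stdint.h>
#include <stddef.h>

/* trial division of n against a presieved list of odd primes stored as
   half-gaps: gaps[i] = (p[i+1] - p[i]) / 2, starting at first_prime.
   Returns 1 = prime, 0 = composite, -1 = ran out of presieved primes. */
int trial_divide(uint64_t n, uint64_t first_prime, const uint8_t *gaps, size_t count) {
  if (n < 2) return 0;
  if (n % 2 == 0) return n == 2;
  uint64_t p = first_prime;
  size_t i = 0;
  for (;;) {
    if (p > n / p) return 1;                  /* passed sqrt(n) without finding a divisor */
    if (n % p == 0) return 0;
    if (i == count) return -1;
    p += 2u * (uint64_t)gaps[i++];
  }
}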

Hence, trial division can deliver only homœopathic amounts of reference data, even if I use all cores I can lay hands on including Kindle, phone and toaster.

More sophisticated tests like Miller-Rabin or the Baillie-PSW linked by user448810 are several orders of magnitude faster than trial division. For numbers up to 2^64 the Baillie-PSW test has been verified to be deterministic (no composite below that threshold passes it). Miller-Rabin may or may not be deterministic up to 2^64 if the first 12 primes are used as bases, or the 7-base set found by Jim Sinclair (meaning the 'net offers statements to that effect but apparently no evidence).

With Baillie-PSW verified - and faster to boot - it seems like a good choice. Unfortunately it is also several orders of magnitude more complicated than a sieve, making it even more important to find trustworthy implementations that are ready to compile without lots of twiddling or - ideally - available as binaries.

Thomas Nicely's Baillie-PSW page has source code that uses the gmp, and gp/PARI can use either gmp or its own code. The latter is also available as a binary, which is very fortunate since building gmp code on an exotic, off-beat platform like MinGW under Windows is a non-trivial undertaking, even if MPIR is used instead of gmp.

That gets us some bulk data but still nowhere near enough for verifying the sieve, since it is orders of magnitude too slow even for covering the blank area left by the cap of primesieve.org (10 * 2^32 numbers).

This is where Will Ness's bigint idea comes in. The operation of the sieve can be verified up to 1,000,000,000,000 using reference data from multiple, independent sources. Switching index variables from 32-bit to 64-bit eliminates most of the boundary cases that could cause the code to mess up in higher regions, leaving only a very few places where even uint64_t gets close to its limits. With those places thoroughly inspected and generously covered by test cases derived from the Baillie-PSW undertaking we can have reasonably high confidence that the sieve code is good. Add copious verification against primesieve.org in the range from 10^12 up to its cap, and it should be sufficient to regard the sieve implementation as trustworthy.
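
One hypothetical example of the kind of boundary case meant here: computing the first odd multiple of a sieving prime at or above the start of a segment that reaches up to 2^64-1. Done naively, the additions silently wrap around; with 64-bit arithmetic and explicit overflow checks it looks roughly like this (a sketch; the function name is made up):

#include <stdint.h>

/* first odd multiple of the odd prime p that is >= low,
   or 0 as a sentinel if no such multiple exists below 2^64 */
uint64_t first_odd_multiple_at_or_above(uint64_t low, uint64_t p) {
  uint64_t m = low % p;
  uint64_t start = low;
  if (m != 0) {
    if (low > UINT64_MAX - (p - m)) return 0;   /* low + (p - m) would wrap past 2^64-1 */
    start = low + (p - m);
  }
  if ((start & 1) == 0) {                       /* even multiple: step to the next (odd) one */
    if (start > UINT64_MAX - p) return 0;       /* start + p would wrap, too */
    start += p;
  }
  return start;
}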

With the sieve up and running, it's easy to cover arbitrary ranges with bulk data. Or with digests, as a canned/compressed means of verification that can serve needs of any size and shape. It's what I use in the demo .cpp I mentioned earlier, although my real code uses a mixture between an optimised digest implementation for general work and a special raw memory checksum of 128 bits for quick self-checks of factor sieve bitmaps.
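
A digest doesn't need to be elaborate to be useful. One simple form - a sketch, not the optimised implementation mentioned above - keeps a count, a wrapping sum and an xor of the primes in a range; two runs over the same range (by different programs, on different machines) should produce identical triples, and such triples are cheap to publish and compare:

#include <stdint.h>

typedef struct { uint64_t count, sum, xr; } prime_digest;   /* start with all fields zeroed */

/* feed every prime the sieve reports for the range into the digest */
void digest_add(prime_digest *d, uint64_t p) {
  d->count += 1;
  d->sum   += p;    /* wraps modulo 2^64, which is fine for a checksum */
  d->xr    ^= p;
}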

SUMMARY

  • up to 1,000,000,000,000 verification against primos.mat.br or similar
  • up to 2^64 - 10 * 2^32 verification against primesieve.org
  • rest up to 2^64-1: verification of strategically chosen segments using Baillie-PSW (e.g. gp/PARI)
DarthGizka
  • BPSW has been verified to 2^64, as has the 7-base M-R solution and the 3-base hashed M-R solution, using the same methods for verification. For your sieve, you can use Pari/GP or Perl's ntheory to generate your chunk of primes, either with primes(a,b) or forprimes. They produce identical checksums for the 100M range below 2^64, with Pari/GP running at about 2M/s and ntheory at 13M/s. They both also work with ranges that span 2^64. Doing checksums on the range should be quite easy with either Pari/GP or Perl/ntheory. – DanaJ Nov 15 '14 at 01:46