I am writing a Python script that generates primes by brute force. I currently have a file of more than 5 MB containing prime numbers, and as the script runs it appends every new prime it finds, so the file keeps growing. Each time the script starts, the file is read into a list, which is then looped over to decide whether the next candidate is prime; any new prime is also appended to that list.
My question is: is it better to load this file into memory every time the script runs, or should I read one line of the file at a time in a for loop, test it against the number being checked, and then read the next line?
The former keeps a large list in memory but is very fast. The latter would be slower, because it has to read from the file on every iteration of the loop, but I don't think it would use nearly as much memory.
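To make the second option concrete, here is a minimal sketch of what I mean by checking against the file line by line. It assumes the primes file holds one prime per line in ascending order and covers every prime up to the square root of the candidate. `is_prime_streaming` and the throwaway demo file are names made up for illustration; they are not part of my script. The loop can stop as soon as p * p exceeds the candidate, so only the start of the file is read for each check.

```python
import os
import tempfile


def is_prime_streaming(num, primes_path):
    """Trial-divide num by primes read lazily from primes_path.

    Assumes one prime per line, in ascending order, covering every
    prime up to sqrt(num).
    """
    if num < 2:
        return False
    with open(primes_path) as f:
        for line in f:  # the file object yields one line at a time
            line = line.strip()
            if not line:
                continue
            p = int(line)
            if p * p > num:  # past sqrt(num) without finding a divisor
                return True
            if num % p == 0:
                return False
    return True  # ran out of stored primes without finding a divisor


# Small demonstration with a throwaway primes file (hypothetical data).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2\n3\n5\n7\n11\n13\n')
    demo_path = tmp.name

demo_results = [n for n in range(2, 30) if is_prime_streaming(n, demo_path)]
os.remove(demo_path)
print(demo_results)
```

As far as I understand, Python buffers file reads, so this should not hit the disk once per line, but it does reopen the file and re-parse the lines for every candidate.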
Here is my code. It takes a configuration file as an argument; that file contains the number to start looking for primes at and the name of the file to read primes from and write them to:

    import math
    import sys
    import time


    def is_prime(num, primes):
        """Trial-divide num by the known primes up to sqrt(num)."""
        # math.isqrt (Python 3.8+) is exact; floor(sqrt()) can be off by one
        # for very large numbers because of float rounding.
        limit = math.isqrt(num)
        print('using all prime numbers up to %d' % limit)
        for p in primes:
            if p > limit:
                break  # no divisor up to sqrt(num), so num is prime
            print(p, end='\r')
            if num % p == 0:
                return False
        return True


    def main(argv):
        if len(argv) != 1:
            sys.exit('Usage: generate_primes.py <config_file>')
        try:
            with open(argv[0]) as config:
                low = config.readlines()
        except IOError:
            sys.exit('Error: File %s does not exist in the current directory...\n'
                     'Usage: generate_primes.py <config_file>' % argv[0])
        # Config layout: line 1 is the starting number, line 2 the primes file.
        num_to_check = int(low[0].strip())
        file_name = low[1].strip()
        print(num_to_check)
        print(file_name)

        # 'a+' creates the file if needed; seek(0) lets us read what is there.
        f = open(file_name, 'a+')
        f.seek(0)
        print('Processing Primes...')
        primes = [int(line) for line in f if line.strip()]
        new_primes = 0
        try:
            if primes and primes[-1] >= num_to_check:
                print('Config value is not past the last saved prime.\n'
                      'Resuming after largest saved prime... %d' % primes[-1])
                # Start just past the last saved prime so it is not written twice.
                num_to_check = primes[-1] + 1
                time.sleep(2)
            if num_to_check % 2 == 0:
                num_to_check += 1  # only odd candidates are tested
            while True:
                print('Checking: %d ' % num_to_check, end='')
                if is_prime(num_to_check, primes):
                    print('Prime')
                    f.write('%d\n' % num_to_check)
                    primes.append(num_to_check)
                    new_primes += 1
                else:
                    print('Composite')
                num_to_check += 2
        except KeyboardInterrupt:
            # Save a new config so the next run can resume where this one stopped.
            config_name = time.strftime('%Y%m%d-%H%M%S')
            print('Keyboard Interrupt: \n creating config file %s ... ' % config_name)
            with open(config_name, 'w') as c:
                c.write('%d\n%s' % (num_to_check, file_name))
            f.close()
            print('Done\nPrimes Found: %d\nExiting...' % new_primes)
            sys.exit()


    if __name__ == '__main__':
        main(sys.argv[1:])

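For reference, the config file is just two lines: the starting number, then the name of the primes file. Below is a minimal sketch of that layout and of parsing it. `parse_config`, the starting value, and `primes.txt` are placeholder names and values I picked for the example.

```python
def parse_config(text):
    """Split the two-line config into (start_number, primes_file_name)."""
    lines = text.splitlines()
    return int(lines[0].strip()), lines[1].strip()


# Same layout the script writes on Ctrl+C: '%d\n%s' % (num_to_check, file_name)
sample_config = '1000001\nprimes.txt'  # hypothetical values
start, primes_file = parse_config(sample_config)
print(start, primes_file)
```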
Note: the primes file must not contain a solitary 1, otherwise every number will come up composite.
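The reason is that n % 1 is 0 for every n, so the divisibility test fires on the very first entry. Here is a small sketch of that failure and of one possible guard, which skips anything below 2. Both functions are simplified stand-ins written for this example, and `is_prime_guarded` is a hypothetical variant, not what my script does.

```python
def is_prime_naive(num, primes):
    """Trial division that trusts the prime list completely."""
    for p in primes:
        if p * p > num:
            return True
        if num % p == 0:
            return False
    return True


def is_prime_guarded(num, primes):
    """Same check, but ignores entries below 2 (such as a stray 1)."""
    for p in primes:
        if p < 2:
            continue  # 1 divides everything, so it must not be used as a divisor
        if p * p > num:
            return True
        if num % p == 0:
            return False
    return True


bad_list = [1, 2, 3, 5, 7]  # a primes list polluted with a leading 1
naive = is_prime_naive(11, bad_list)      # 11 % 1 == 0, so this reports composite
guarded = is_prime_guarded(11, bad_list)  # the 1 is skipped, so 11 is found prime
print(naive, guarded)
```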
The one concern I have about reading only from the file is how to get the value of the largest prime stored (i.e. reading the last line of the file).
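One way I could get the last line without reading the whole file is to seek to the end and scan backwards for the final newline. This is a minimal sketch, assuming the file contains only ASCII digits with one prime per line. `read_last_line`, its `block_size` parameter, and the demo file are my own choices for illustration.

```python
import os
import tempfile


def read_last_line(path, block_size=1024):
    """Return the last non-empty line of path, or None if there is none.

    Reads backwards from the end in blocks instead of loading the whole file.
    """
    with open(path, 'rb') as f:  # binary mode allows seeking to any offset
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b''
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            # Stop once a newline precedes the last non-empty line.
            if b'\n' in data.rstrip(b'\r\n'):
                break
    if not data.strip():
        return None
    return data.rstrip(b'\r\n').split(b'\n')[-1].strip().decode('ascii')


# Demonstration on a throwaway file (hypothetical data).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('2\n3\n5\n7\n11\n')
    demo_path = tmp.name

last_prime = int(read_last_line(demo_path, block_size=4))
os.remove(demo_path)
print(last_prime)
```

The small `block_size` in the demo is only there to exercise the backward scan; for a real primes file the default should be enough to catch the last line in one read.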