TL;DR: With your current code, if we run into an error on the xth model, the first x - 1 results will not be lost; they will still be in the text file, since Python flushes and closes the file automatically when the process ends. However, we can also use try/except blocks to prevent Python from crashing when one of the models causes an error, so that we can attempt to get a result for every model. See the code in the Putting It All Together section.
Avoid Losing Your Progress
As others have pointed out, it's impossible to tell you how to fix your error without knowing what the error is or seeing a stack trace. However, we can prevent Python from crashing by adding some error handling with a try/except block:
for p in Parameter1:
    try:
        # perform model calculations
        fileObject = open(filename, 'ab')
        np.savetxt(fileObject, rowOfData, delimiter = ',', newline = ' ')
        fileObject.write('\n')
        fileObject.close()
    except Exception as err:
        print(f'unable to process {p}: {err}')
This won't prevent any errors, but if one does occur, Python will print a message identifying which model caused it and then continue processing the remaining models, rather than crashing.
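For example, here is a toy loop (unrelated to your models) that hits an error on one item but still finishes the rest:

for p in [1, 0, 2]:
    try:
        # a stand-in for the model calculations; dividing by zero fails for p == 0
        print(f'result for {p}: {10 / p}')
    except Exception as err:
        print(f'unable to process {p}: {err}')

This prints results for 1 and 2 and an error message for 0, and the script keeps running.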
Memory Usage
I also probably have to be conscious of memory usage - I've figured out how to do what I propose with a text file, but it crashes around 2600 lines because I don't think it likes opening a text file that long.
While it is true that memory usage might be a concern with datasets this large, it is not because you are opening a large text file. When Python opens a file, it does not immediately load the file's contents into some variable; in fact, it doesn't load any of the file's data at all. It simply keeps track of your position in the file and only reads its contents when you call file.read() (or some other function that reads from the file). Moreover, since you opened the file in append mode, your script cannot read from the file at all.
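As a quick illustration (using the placeholder name myFile.txt, as below), opening even a very large file in append mode returns immediately because nothing is read into memory, and attempting to read from it fails:

import io

f = open('myFile.txt', 'a')  # returns instantly, regardless of file size
try:
    f.read()  # append mode is write-only
except io.UnsupportedOperation as err:
    print(f'cannot read in append mode: {err}')
f.close()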
You can test that file size isn't the issue by creating an arbitrarily long file and attempting to write to it. I tested this by running your script with a dummy array and having it write 1,000,000 lines:
import numpy as np
array = np.array([1] * 100)
num_lines = 1_000_000
filename = 'myFile.txt'
for i in range(num_lines):
    fileObject = open(filename, 'a')
    np.savetxt(fileObject, array, delimiter = ',', newline = ' ')
    fileObject.write('\n')
    fileObject.close()
(If you think your issue is opening too large a file, feel free to run this script yourself and verify the output with wc -l myFile.txt, but be warned that the resulting file is 2.3 GB!)
However, memory usage may be a concern in a different part of this script. In your pseudocode, you are storing the metrics for every model in dataArray. If your real code stores a very large number of metrics per model, this might become an issue. Only keep these metrics in memory if you need them later (e.g., if future models depend on the metrics of previous models).
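If you don't need them later, you can write each row as soon as it is computed and never keep more than one row in memory. Here's a minimal sketch, where run_model is a hypothetical stand-in for your model calculations and Parameter1 and filename come from your pseudocode:

for p in Parameter1:
    rowOfData = run_model(p)  # hypothetical: compute the metrics for this model only
    fileObject = open(filename, 'a')
    np.savetxt(fileObject, rowOfData, delimiter = ',', newline = ' ')
    fileObject.write('\n')
    fileObject.close()
    # rowOfData is replaced on the next iteration, so memory use stays flat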
Other Improvements
fileObject = open(filename, 'ab')
np.savetxt(fileObject, rowOfData, delimiter = ',', newline = ' ')
fileObject.write('\n')
fileObject.close()
As @Reedinationer pointed out, it's better to use a with statement here, to avoid the overhead of opening and closing the file for each row (and because it's best practice). You don't need to worry that you might "lose your progress": Python automatically flushes your data to the file and closes it when the process ends, whether or not you use the with statement. You can test this by opening a file, writing to it, and raising an exception before closing it:
import numpy as np
array = np.array([1] * 100)
num_lines = 1_000_000
filename = 'myFile.txt'
fileObject = open(filename, 'a')
for i in range(num_lines):
    if i == 900_000:
        raise Exception
    np.savetxt(fileObject, array, delimiter = ',', newline = ' ')
    fileObject.write('\n')
The resulting file will have 900,000 lines, even though the process exited before closing the file.

We also don't need to open the file in binary mode, since np.savetxt writes plain text to the file. With these changes, our loop looks like this:
with open(filename, 'a') as fileObject:
    for p in Parameter1:
        # perform model calculations
        np.savetxt(fileObject, rowOfData, delimiter = ',', newline = ' ')
        fileObject.write('\n')
Additionally, rather than calling np.savetxt with newline = ' ' and then manually writing a newline to the file, we can let np.savetxt use its default \n newline character:
np.savetxt(fileObject, rowOfData, delimiter = ',')
With this modification, our loop looks like this:
with open(filename, 'a') as fileObject:
    for p in Parameter1:
        # perform model calculations
        np.savetxt(fileObject, rowOfData, delimiter = ',')
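One caveat, since I can't see the shape of rowOfData: if it is a 1-D array, np.savetxt treats it as a column and writes one value per line, which is presumably why your original code used newline = ' ' to force everything onto a single line. If that's the case, pass a one-row 2-D view instead so that each model still gets exactly one comma-separated line (the same applies to the final loop below):

np.savetxt(fileObject, np.atleast_2d(rowOfData), delimiter = ',')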
Putting It All Together
Here's what the code looks like with all of our improvements:
with open(filename, 'a') as fileObject:
    for p in Parameter1:
        try:
            # perform model calculations
            np.savetxt(fileObject, rowOfData, delimiter = ',')
        except Exception as err:
            print(f'unable to process {p}: {err}')