In case you're just looking for a solution, skip ahead to the end of this answer; but if you're interested in why that (probably) happens, don't skip ahead.
You mention that it "takes extremely long (more than a couple of hours)".
That seems like you're running out of RAM and your computer then starts using swap memory. That means it saves part (or all) of the objects in RAM to your hard disk and removes them from RAM, so it can load them again when needed. In some cases swap memory can be fairly efficient because it only rarely needs to reload from the hard disk, but in other cases it has to reload many times, and then you're going to notice a "whole system slow down" and "never ending operations".
After some investigation I think the problem is mainly that the numpy.array created by ccdproc.combine is stacked along the first axis and the operation is performed along the first axis. The first axis would be fine if it were a FORTRAN-contiguous array, but ccdproc doesn't specify any "order", so the array is C-contiguous. That means the elements of the last axis are stored next to each other in memory (if it were FORTRAN-contiguous, the elements of the first axis would be next to each other). So if you run out of RAM and your computer starts using swap memory, it puts parts of the array on the disk, but because the operation is performed along the first axis the memory addresses of the elements used in each operation are "far away from each other". That means it cannot use the swap memory efficiently, because it basically has to reload parts of the array from the hard disk for "each" next item.
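If you want to see that layout effect yourself, here is a small NumPy sketch (toy array sizes chosen for illustration; the real stack would be 30 x 3128 x 3080):

import numpy as np

# In a C-contiguous array the elements of the *last* axis sit next to each
# other in memory, so stepping along the *first* axis jumps a whole image apart.
stack = np.zeros((30, 4, 5))          # 30 tiny "images" stacked along axis 0
print(stack.flags['C_CONTIGUOUS'])    # True - no 'F' order was requested
print(stack.strides)                  # (160, 40, 8): axis-0 stride = one full image
# With real 3128x3080 float64 frames the axis-0 stride would be roughly 73 MB,
# so averaging over axis 0 touches memory addresses that are very far apart.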
It's not actually important to know all of that; I just included it in case you're interested in the reason for the observed behavior. The main point to take away is: if you notice that the whole system becomes very slow whenever you run the program and it doesn't seem to make much progress, that's because you have run out of RAM!
The easiest solution (although it has nothing to do with programming) is to buy more RAM.
The complicated solution would be to reduce the memory footprint of your program.
Let's first do a small calculation of how much memory we're dealing with:
Your images are 3128 * 3080 pixels, that's 9634240 elements. They might be any type when you read them, but after ccdproc.subtract_overscan they will be floats. One float (well, actually np.float64) uses 8 bytes, so we're dealing with 77073920 bytes. That's roughly 73 MB per bias image. You have 30 bias images, so we're dealing with roughly 2.2 GB of data here. That's assuming your images don't have an uncertainty or mask. If they do, that would add another 2.2 GB for the uncertainties or 0.26 GB for the masks.
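If you want to reproduce that arithmetic, here it is as a tiny Python snippet:

# Back-of-the-envelope check (pure arithmetic, assuming float64 images of the stated size):
n_images = 30
height, width = 3128, 3080
bytes_per_float64 = 8

per_image = height * width * bytes_per_float64   # 77073920 bytes, ~73 MB
total = n_images * per_image                     # ~2.2 GB for the data alone
print(per_image / 1024**2, total / 1024**3)      # ~73.5 MiB, ~2.15 GiB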
2.2 GB sounds like a small enough number, but ccdproc.combine stacks the NumPy arrays. That means it creates a new array and copies the data of your ccds into it. That doubles the memory right there. It makes sense to stack them because, even though it takes more memory, the actual "combining" will be much faster, but it's not there yet.
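Here is a scaled-down illustration of why stacking temporarily doubles the footprint (small arrays on purpose; this is not ccdproc's actual code):

import numpy as np

# Stacking copies the data into one new array, so for a while the originals
# and the stacked copy coexist in RAM.
images = [np.ones((312, 308)) for _ in range(30)]         # stand-ins for the bias frames
stacked = np.stack(images, axis=0)                        # new array, full copy of the data
print(sum(img.nbytes for img in images), stacked.nbytes)  # two equal-sized chunks in memory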
All in all, 4.4 GB could already exhaust your RAM. Some computers only have 4 GB of RAM, and don't forget that your OS and the other programs need some RAM as well. It's unlikely that you run out of RAM if you have 8 GB or more, but given the numbers and your observations I assume you only have 4-6 GB of RAM.
The interesting question is actually how to avoid the problem. That really depends on the amount of memory you have:
Less than 4GB RAM
That's tricky because you won't have much free RAM after you deduct the size of all the CCDData objects and what your OS and the other processes need. In that case it would be best to process, for example, 5 bias images at a time and then combine the results of those first combinations. That works because you use average as the method (it wouldn't work if you used median), because (A+B+C+D) / 4 is equal to ((A+B)/2 + (C+D)/2) / 2 (a quick numerical sanity check of this identity is sketched after the code below).
That would be (I haven't actually checked this code, so please inspect it carefully before you run it):
# Assuming the same imports and variables (allfiles, path) as in your script:
from astropy.io import fits
from astropy import units as u
import ccdproc as cp

biasImages = []
biasImagesCombined = []
for idx, filename in enumerate(allfiles.files_filtered(NAXIS1=3128, NAXIS2=3080, OBSTYPE='BIAS')):
    ccd = fits.getdata(allfiles.location + filename)
    ccd = cp.CCDData(ccd, unit=u.adu)
    ccd = cp.subtract_overscan(ccd, overscan_axis=1, fits_section='[3099:3124,:]')
    ccd = cp.trim_image(ccd, fits_section='[27:3095,3:3078]')
    biasImages.append(ccd)
    # Combine every 5 bias images. This only works correctly if the number of
    # images is a multiple of 5 and you use average as the combine method.
    if (idx + 1) % 5 == 0:
        tmp_bias = cp.combine(biasImages, method='average')
        biasImages = []
        biasImagesCombined.append(tmp_bias)

master_bias = cp.combine(biasImagesCombined, output_file=path + 'mbias_avg.fits', method='average')
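As a quick sanity check of the average-of-averages identity mentioned above, here is a toy NumPy sketch (random arrays only, unrelated to the actual FITS handling):

import numpy as np

# Averaging equal-sized groups and then averaging those results gives the same
# answer as averaging everything at once - which is what makes the batched
# approach valid for method='average'.
rng = np.random.default_rng(0)
frames = rng.random((30, 8, 8))                       # 30 toy "bias frames"

full_average = frames.mean(axis=0)
batched = np.stack([frames[i:i + 5].mean(axis=0) for i in range(0, 30, 5)])
batched_average = batched.mean(axis=0)

print(np.allclose(full_average, batched_average))     # True
# Note: the same trick would NOT work with method='median'.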
4GB of RAM
In that case you probably have about 500 MB to spare, so you could simply use mem_limit to limit the amount of additional RAM that combine will take. In that case just change your last line to:
# To account for additional memory usage I chose 100 MB of additional memory;
# you could try to adapt the actual number.
master_bias = cp.combine(biasImages, mem_limit=1024*1024*100, output_file=path + 'mbias_avg.fits', method='average')
More than 4GB of RAM but less than 8GB
In that case you probably have 1 GB of free RAM that could be used. It's still the same approach as the 4 GB option, but you could use a much higher mem_limit. I would start with 500 MB: mem_limit=1024*1024*500.
More than 8GB of RAM
In that case I must have missed something because using ~4.5GB of RAM shouldn't actually exhaust your RAM.