
I seem to be having an issue with some basic astronomical image processing/calibration using the python package ccdproc.

I'm currently combining 30 bias frames into a single averaged image. Before the combination I iterate over each image to subtract the overscan region using subtract_overscan() and then select the image dimensions I want to retain using trim_image().

I believe my indexing is correct, but when I get to the combination it takes extremely long (more than a couple of hours), and I'm not sure that's normal. I suspect something is being misinterpreted by my computer. I've created the averaged image before without any of the other processing and it didn't take long (5-10 minutes or so), which is why I think the issue might be with my indexing.

If anyone can verify that my code is correct and/or comment on what might be the issue it'd be a lot of help.

Image dimensions: NAXIS1 = 3128 , NAXIS2 = 3080 and allfiles is a ccdproc.ImageFileCollection.

from astropy.io import fits
from astropy import units as u
import ccdproc as cp

biasImages = []
for filename in allfiles.files_filtered(NAXIS1=3128,NAXIS2=3080,OBSTYPE = 'BIAS'):
    ccd = fits.getdata(allfiles.location + filename)
    # print(ccd)
    ccd = cp.CCDData(ccd, unit = u.adu)
    # print(ccd)
    ccd = cp.subtract_overscan(ccd,overscan_axis = 1, fits_section = '[3099:3124,:]')
    # print(ccd)
    ccd = cp.trim_image(ccd,fits_section = '[27:3095,3:3078]')
    # print(ccd)
    biasImages.append(ccd)

master_bias = cp.combine(biasImages,output_file = path + 'mbias_avg.fits', method='average')

2 Answers


The code looks similar to my own code for combining biases (see this example), so nothing jumps out immediately as a red flag. I rarely combine such a large number of biases, and the ccdproc.combine task could be far better optimized, so I'm not surprised it is very slow.

One thing I sometimes run into is garbage-collection issues. If you are running this in a notebook or as part of a large script, the memory may not be getting cleared. It is useful to watch what is happening in memory, and I sometimes delete the biasImages object (or any other list of CCD objects) after it has been used and isn't needed any further.
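For example, something along these lines (an untested sketch, reusing the biasImages list and the combine call from your question):

import gc

master_bias = cp.combine(biasImages, output_file=path + 'mbias_avg.fits', method='average')

# Once the master bias is written, the couple of GB held by the list of
# CCDData objects is no longer needed, so drop the reference and collect it.
del biasImages
gc.collect()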

I'm happy to respond further here, or if you have further issues please open an issue at the github repo.

  • Thanks for the comment. My script is not large... In total, with the flat calibrations, it's at 130 lines or so. I haven't run the code for the flats yet, though. Also, I'm curious about something I saw in your code: you chose 0 for your overscan axis, which is something that's confused me before. Does the package follow numpy or python conventions for image stacks? I've tried setting it to zero and using the overscan dimensions indicated by the FITS key but have gotten errors because of dimensionality, e.g. I've used osaxis = 0 and typed the 'BIASSEC' value @ [3099:3124,3:3 – Jean-Paul Ventura Aug 13 '17 at 19:18
  • Another possibility is setting the mem_limit to a lower value. If your computer has limited memory, this may be causing it to slow down. As for the overscan, it should correspond to the numpy convention. – Steve Crawford Aug 15 '17 at 18:45

If you're just looking for a solution, skip ahead to the end of this answer; but if you're interested in why this (probably) happens, don't skip ahead.

it takes extremely long (more than a couple of hours).

That sounds like you're running out of RAM and your computer is starting to use swap memory. That means it saves part (or all) of the objects to your hard disk and removes them from RAM so it can load them again when needed. In some cases swap memory can be very efficient because it only rarely needs to reload from the hard disk, but in other cases it has to reload many times, and then you're going to notice a "whole system slow down" and "never-ending operations".

After some investigation I think the problem is mainly that the numpy.array created by ccdproc.combine is stacked along the first axis and the combining operation is also performed along the first axis. The first axis would be fine if it were a FORTRAN-contiguous array, but ccdproc doesn't specify any "order", so it's going to be C-contiguous. That means the elements along the last axis are stored next to each other in memory (if it were FORTRAN-contiguous, the elements along the first axis would be next to each other). So if you run out of RAM and your computer starts using swap memory, it puts parts of the array on the disk; but because the operation is performed along the first axis, the memory addresses of the elements used in each operation are "far away from each other". That means it cannot use the swap memory efficiently, because it basically has to reload parts of the array from the hard disk for "each" next item.
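To make the layout point concrete, here is a small, untested illustration with toy sizes (plain NumPy defaults, not ccdproc's actual internals):

import numpy as np

# np.stack returns a C-contiguous array by default, so consecutive elements
# of the *last* axis sit next to each other in memory.
stack = np.stack([np.zeros((4, 5)) for _ in range(3)])
print(stack.flags['C_CONTIGUOUS'])   # True
print(stack.strides)                 # (160, 40, 8): moving along axis 0 jumps
                                     # 160 bytes, moving along axis 2 only 8

# Averaging along axis 0 therefore gathers values that are far apart in
# memory for every single output pixel, which hurts badly once swapping starts.
mean_image = stack.mean(axis=0)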

It's not very important to know all that; I just included it in case you're interested in the reason for the observed behavior. The main point to take away is: if you notice that the system becomes very slow whenever you run the program and it doesn't seem to make much progress, that's because you're running out of RAM!


The easiest solution (although it has nothing to do with programming) is to buy more RAM.

The complicated solution would be to reduce the memory footprint of your program.


Let's first do a small calculation of how much memory we're dealing with:

Your images are 3128 x 3080, which is 9,634,240 elements. They might be any type when you read them in, but after ccdproc.subtract_overscan they will be floats. One float (well, actually np.float64) uses 8 bytes, so we're dealing with 77,073,920 bytes. That's roughly 73 MB per bias image. With 30 bias images we're dealing with roughly 2.2 GB of data, assuming your images carry no uncertainty or mask. If they do, that would add another 2.2 GB for the uncertainties or 0.26 GB for the masks.
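As a quick sanity check of those numbers (assuming plain np.float64 data with no mask or uncertainty attached):

n_pixels = 3128 * 3080              # 9,634,240 pixels per frame
bytes_per_frame = n_pixels * 8      # np.float64 -> 77,073,920 bytes
total_bytes = bytes_per_frame * 30  # 30 bias frames
print(bytes_per_frame / 1024**2)    # ~73.5 MB per frame
print(total_bytes / 1024**3)        # ~2.15 GB for all frames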

2.2 GB sounds like a small enough number, but ccdproc.combine stacks the NumPy arrays. That means it creates a new array and copies the data from your CCDs into it, which doubles the memory right there. Stacking makes sense because, even though it takes more memory, the actual "combining" is much faster that way; the memory cost, however, is paid before the combining even starts.
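You can see that stacking copies the data rather than viewing it with a tiny, untested example (small stand-in arrays here; with the real ~73 MB frames the stacked copy alone is another ~2.2 GB):

import numpy as np

frames = [np.ones((100, 100)) for _ in range(30)]    # stand-ins for the bias CCDs
stacked = np.stack(frames)                           # freshly allocated copy, not a view

print(stacked.base is None)                          # True: it owns its own memory
print(sum(f.nbytes for f in frames), stacked.nbytes) # the same data held twice in RAM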

All in all, 4.4 GB could already exhaust your RAM. Some computers only have 4 GB of RAM, and don't forget that your OS and other programs need some RAM as well. However, it's unlikely that you'd run out of RAM with 8 GB or more, but given the numbers and your observations I assume you only have 4-6 GB of RAM.


The interesting question is actually how to avoid the problem. That really depends on the amount of memory you have:

Less than 4GB RAM

That's tricky because you won't have much free RAM left after you deduct the size of all the CCDData objects and what your OS and the other processes need. In that case it would be best to process, for example, 5 bias images at a time and then combine the results of those first combinations. That works because you use average as the method (it wouldn't work if you used median), since (A+B+C+D)/4 is equal to ((A+B)/2 + (C+D)/2)/2.

That would be (I haven't actually checked this code, so please inspect it carefully before you run it):

biasImages = []
biasImagesCombined = []
for idx, filename in enumerate(allfiles.files_filtered(NAXIS1=3128,NAXIS2=3080,OBSTYPE = 'BIAS')):
    ccd = fits.getdata(allfiles.location + filename)
    ccd = cp.CCDData(ccd, unit = u.adu)
    ccd = cp.subtract_overscan(ccd,overscan_axis = 1, fits_section = '[3099:3124,:]')
    ccd = cp.trim_image(ccd,fits_section = '[27:3095,3:3078]')
    biasImages.append(ccd)
    # Combine every 5 bias images. This only works correctly if the amount of
    # images is a multiple of 5 and you use average as combine-method.
    if (idx + 1) % 5 == 0:
        tmp_bias = cp.combine(biasImages, method='average')
        biasImages = []
        biasImagesCombined.append(tmp_bias)

master_bias = cp.combine(biasImagesCombined, output_file = path + 'mbias_avg.fits', method='average')

4GB of RAM

In that case you probably have 500 MB to spare, so you could simply use mem_limit to limit the amount of additional RAM the combine will use. In that case just change your last line to:

# To account for additional memory usage I chose 100 MB of additional memory
# you could try to adapt the actual number.
master_bias = cp.combine(biasImages, mem_limit=1024*1024*100, output_file=path + 'mbias_avg.fits', method='average')

More than 4GB of RAM but less than 8GB

In that case you probably have 1 GB of free RAM that could be used. It's still the same approach as the 4GB option but you could use a much higher mem_limit. I would start with 500 MB: mem_limit=1024*1024*500.

More than 8GB of RAM

In that case I must have missed something because using ~4.5GB of RAM shouldn't actually exhaust your RAM.

MSeifert