
I'm using Python 2.7.3 with numpy and pyfits to process scientific FITS files. I would like to work on the images at half or one quarter resolution for the sake of speed, and have this code:

import numpy
import pyfits

# Read red image
hdulist = pyfits.open(red_fn)
img_data = hdulist[0].data
hdulist.close()
img_data_r = numpy.array(img_data, dtype=float)
# Scale it down to one quarter size
my=[]
for line in img_data_r[::4]:
    myline=[]
    for item in line[::4]:
        myline.append(item)
    my.append(myline)
img_data_r = my

This works, but I wonder if there is a faster, more native way to reduce the array. The reductions should happen as early as possible, the idea being that the data that will be processed is of minimal acceptable size. If there was a way of reading a reduced dataset with pyfits, that would be ideal. But such a method doesn't seem to exist (correct me if I'm wrong). How about numpy? Or scipy/math/anything else?

Karl D
  • I think scikit-image has a `resize` method (not sure if it's faster though): http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize – Matt Feb 05 '14 at 12:54
  • I think reading the original (whole) data is the bottleneck, and unless you can read only a thumbnail of the image (in some JPGs these exist in an EXIF tag), nothing you do afterwards will speed this up dramatically. – Alfe Feb 05 '14 at 12:59
  • These kinds of questions are probably more adequate for [CodeReview](http://codereview.stackexchange.com/) – joaquin Feb 05 '14 at 13:10
  • You're probably right about the bottleneck. Still, even though I haven't timed it, Sven's solution should be faster than my nested for loops. At least it's shorter and more elegant! :) EDIT: @joaquin - Sorry if this the wrong place, I'm a noob! – Karl D Feb 05 '14 at 13:15
  • how big are your images? maybe a small change in the code of what you do with them can change more than only using part of your data (or is it just for prototyping that you want to do this?) – usethedeathstar Feb 05 '14 at 13:22
  • @usethedeathstar The images are large, 4000x4000 pixels, but for initial processing I don't need to see them larger than 1000x1000. My processing code could sure be optimised too though. – Karl D Feb 05 '14 at 13:27
  • 1
    @KarlD 4000x4000 is not big, 16 megapixels, so even if each pixel is a complex128 (worst case, probably you use float64 though), thats 256Mb in memory, which is still small (sidequestion: how much ram you got, and did you use some benchmarking stuff to check your bottlenecks?) In general, when doing image processing: avoid python for-loops, nearly everything can be pushed down to numpy (read: C), which is way faster – usethedeathstar Feb 05 '14 at 13:31
  • Also FWIW, this is a subtlety of `pyfits`, but by default a data array read through `pyfits` is mmap'd, so the entire array is not read into memory at once (though depending on the native data type, calling `np.array(img_data, dtype=float)` will *probably* copy it; this generally shouldn't be necessary--it's already a Numpy array). – Iguananaut Feb 06 '14 at 15:55

1 Answer


The data array you get from pyfits already is a NumPy array. You don't need to create one from it. Moreover, you can do the downsampling in a single step:

img_data_r = hdulist[0].data[::4, ::4]

This won't copy the data; it simply creates a new view with different strides. If you need the down-sampled image as a contiguous array, use numpy.ascontiguousarray().
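To make the view-versus-copy distinction concrete, here is a small self-contained sketch (using a toy array in place of the FITS data):

```python
import numpy as np

# Stand-in for hdulist[0].data: an 8x8 float array.
img = np.arange(64.0).reshape(8, 8)

small = img[::4, ::4]                # shape (2, 2): a strided view, no copy
print(np.shares_memory(small, img))  # True: it still points into img's buffer
print(small.flags['C_CONTIGUOUS'])   # False: the strides skip elements

small = np.ascontiguousarray(small)  # materialise a compact copy
print(small.flags['C_CONTIGUOUS'])   # True
```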

This method of downsampling keeps only one pixel in sixteen and completely drops the information in all the other pixels. If you need higher-quality downsampling, rather than doing it in your code, you are probably better off downsampling your FITS files with ImageMagick. This will also reduce the time it takes to read the files from disk.

To convert all the FITS files in the current directory in place (warning: the originals are overwritten), you could use

mogrify -resize 25% *.fits
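If you would rather stay in numpy, block averaging is one common way to get a higher-quality reduction than plain slicing. A minimal sketch, assuming the image dimensions are exact multiples of the block size (the reshape trick relies on that):

```python
import numpy as np

def downsample_mean(a, n):
    """Reduce a 2-D array by averaging each n-by-n block.

    Assumes both dimensions of `a` are exact multiples of n.
    """
    h, w = a.shape
    # Split each axis into (blocks, within-block) and average
    # over the within-block axes.
    return a.reshape(h // n, n, w // n, n).mean(axis=(1, 3))

img = np.arange(64.0).reshape(8, 8)
quarter = downsample_mean(img, 4)  # shape (2, 2)
```

Unlike the `[::4, ::4]` slice, every source pixel contributes to the result, so noise is averaged down rather than discarded.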
Sven Marnach
  • Sven, that works, thank you. I knew I was missing something, and you pointed out the exact thing, stepping through a two dimensional list. I've seen this before and didn't understand it... As for the ImageMagick method, I didn't even know IM could read FITS files. Will experiment, but probably keep the native Python code. – Karl D Feb 05 '14 at 13:12
  • Or, use Python imaging library to do the resizing. – pv. Feb 05 '14 at 13:33
  • In any case, never ever use Python for loops over Numpy arrays; if there isn't already a way to do it in Numpy, scipy, or related packages then write a routine in Cython or C. But in any case for resampling there are already tons of options. – Iguananaut Feb 06 '14 at 15:53