4

I have a raster file (basically 2D array) with close to a million points. I am trying to extract a circle from the raster (and all the points that lie within the circle). Using ArcGIS is exceedingly slow for this. Can anyone suggest any image processing library that is both easy to learn and powerful and quick enough for something like this?

Thanks!

Eric O. Lebigot
  • 91,433
  • 48
  • 218
  • 260
user308827
  • 21,227
  • 87
  • 254
  • 417

3 Answers3

2

Extracting a subset of points efficiently depends on the exact format you are using. Assuming you store your raster as a numpy array of integers, you can extract points like this:

from numpy import *

def points_in_circle(circle, arr):
    "A generator to return all points whose indices are within given circle."
    i0,j0,r = circle
    def intceil(x):
        return int(ceil(x))
    for i in xrange(intceil(i0-r),intceil(i0+r)):
        ri = sqrt(r**2-(i-i0)**2)
        for j in xrange(intceil(j0-ri),intceil(j0+ri)):
            yield arr[i][j]

points_in_circle will create a generator returning all points. Please note that I used yield instead of return. This function does not actually return point values, but describes how to find all of them. It creates a sequential iterator over values of points within circle. See Python documentation for more details how yield works.

I used the fact that for circle we can explicitly loop only over inner points. For more complex shapes you may loop over the points of the extent of a shape, and then check if a point belongs to it. The trick is not to check every point, only a narrow subset of them.

Now an example of how to use points_in_circle:

# raster dimensions, 10 million points
N, M = 3200, 3200
# circle center and its radius in index space
i0, j0, r = 70, 20, 12.3

raster = fromfunction(lambda i,j: 100+10*i+j, (N, M), dtype=int)
print "raster is ready"
print raster

pts_iterator = points_in_circle((i0,j0,r), raster) # very quick, do not extract points yet
pts = array(list(pts_iterator)) # actually extract all points
print pts.size, "points extracted, sum = ", sum(pts)

On a raster of 10 million integers it is pretty quick.

Please describe file format or put a sample somewhere if you need more specific answer.

sastanin
  • 40,473
  • 13
  • 103
  • 130
  • You might want to remove the call to `in_circle` and replace `intcell(j0+r)` with `intcell(j0+ri)`. Also note that you treat the left and bottom of the circle differently from the right and top: the left and bottom limits should be int(floor(…)), for symmetry reasons. – Eric O. Lebigot May 05 '10 at 20:44
  • Yes, thank you. I fixed `+ri` and removed `in_circle`. Left and bottom (top) should be `int(ceil(...))` though, because we need only inner points. For the right and top (bottom) I use the same `int(ceil(...))` because `xrange` is not inclusive with respect to its second argument. So, to make a symmetric selection, I use `ceil` on both ends. – sastanin May 07 '10 at 00:17
  • Thanks! The speed is very impressive! I am not sure why this will not work for extracting overlapping circles as you mentioned in the comment below. – user308827 May 16 '10 at 20:56
  • Thanks again! I was able to combine it with my code which converts a arcGIS raster to a numpy array. What was taking arcGIS 2 minutes is now accomplished in a second :) – user308827 May 25 '10 at 18:38
  • To avoid edge effects (e.g. negative or out of range indices) you can add the following condition `if (i >= 0 and i < rows) and (j>=0 and j < cols):` one the line before `yield arr[i][j]` in `points_in_circle`. – jatobat Feb 13 '14 at 13:06
2

Numpy allows you to do this, and is extremely fast:

import numpy

all_points = numpy.random.random((1000, 1000))  # Input array
# Size of your array of points all_points:
(image_size_x, image_size_y) = all_points.shape
# Disk definition:
(center_x, center_y) = (500, 500)
radius = 10

x_grid, y_grid = numpy.meshgrid(numpy.arange(image_size_x),
                                numpy.arange(image_size_y))
# Array of booleans with the disk shape
disk = ((x_grid-center_x)**2 + (y_grid-center_y)**2) <= radius**2

# You can now do all sorts of things with the mask "disk":

# For instance, the following array has only 317 points (about pi*radius**2):
points_in_disk = all_points[disk]
# You can also use masked arrays:
points_in_circle2 = numpy.ma.masked_array(all_points, ~disk)
from matplotlib import pyplot
pyplot.imshow(points_in_circle2)
Eric O. Lebigot
  • 91,433
  • 48
  • 218
  • 260
  • 1
    This approach is likely to be much more memory hungry when extracting small shapes from huge rasters. In fact it is almost three times slower than my script on the same input (10M points, small circle) and almost 10 times slower with 100M points (and 2GB RAM). But unlike my solution, it preserves the shape (when using masked arrays), and it may be better when extracting the same shape twice. (BTW, You forgot to define `all_points`). – sastanin May 05 '10 at 17:16
  • @jetxee: Yes, this approach consumes memory, and is slower for small radii. However, the code is very legible, and it executes almost instantly, on a modern machine. – Eric O. Lebigot May 05 '10 at 20:42
1

You need a library that can read your raster. I am not sure how to do that in Python but you could look at geotools (especially with some of the new raster library integration) if you want to program in Java. If you are good with C I would reccomend using something like GDAL.

If you want to look at a desktop tool you could look at extending QGIS with python to do the operation above.

If I remember correctly, the Raster extension to PostGIS may support clipping rasters based upon vectors. This means you would need to create your circles to features in the DB and then import your raster but then you might be able to use SQL to extract your values.

If you are really just a text file with numbers in a grid then I would go with the suggestions above.

TheSteve0
  • 3,530
  • 1
  • 19
  • 25