Variable area threshold for identifying objects - python

Question

I have an array which contains information about the size and location of a series of shapes: where the array is zero there is no shape, and where it is non-zero there is a shape. Different shapes are separated by zeros, so if you plotted every point in the array you would see a map of the various shapes. I hope that makes sense; if not, here is an example array containing 4 different shapes:

np.array([[0, 0, 0, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0],
          [1, 1, 0, 0, 1, 0, 0],
          [1, 1, 0, 0, 0, 1, 1],
          [0, 0, 0, 0, 0, 1, 1],
          [3, 5, 2, 0, 0, 0, 0]])

I need to count and identify these shapes, but I only want to include the ones with an area above a certain threshold. I would like the area threshold to be 1/15 of the area of the largest shape in the field (in the above example, the largest area would be 5).

The question is: how can I find (using Python) the area of the largest shape in the field without individually identifying each shape?

Edit

To clarify what I mean by the 'shapes', the following code plots an image of the array, which shows 4 distinct objects:

import numpy as np
import matplotlib.pyplot as plt

a = np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0, 0]])
ind = np.nonzero(a)  # row and column indices of the non-zero pixels
x = ind[0]
y = ind[1]
plt.imshow(a)
plt.show()

Can you clarify where/how you get 4 shapes from the example data? — Tom Dalton, Aug 08 '14 at 15:32

ali_m · Accepted Answer · 2014-08-11T08:39:26.703

You can use scipy.ndimage.label to find the connected non-zero regions in your array, then use scipy.ndimage.sum to find the area of each region:

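import numpy as np
from scipy import ndimage

# Label connected non-zero regions; label IDs run from 1 to nshapes
labels, nshapes = ndimage.label(a)
# Sum a 0/1 mask over each label so the result is the pixel count (area),
# even if the array holds values other than 1
areas = ndimage.sum((a != 0).astype(int), labels=labels,
                    index=np.arange(1, nshapes + 1))

# argmax is 0-based, but label IDs start at 1
idx = np.argmax(areas)
biggest_shape = labels == (idx + 1)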
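
As a cross-check (an alternative, not what the answer above uses), the area of every label can also be read off with np.bincount, which counts how many pixels carry each label value. This sketch assumes the 0/1 example array from the question's edit and SciPy's default connectivity:

```python
import numpy as np
from scipy import ndimage

# Example array from the question's edit (non-zero pixels form the shapes)
a = np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0, 0]])

# Default structure: only up/down/left/right neighbours are connected
labels, nshapes = ndimage.label(a)

# bincount counts pixels per label value; drop index 0 (the background)
areas = np.bincount(labels.ravel())[1:]

print(nshapes)         # 5
print(np.sort(areas))  # [1 1 3 4 4]
```

With this default connectivity the example splits into 5 regions, and the two largest both have 4 pixels.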

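In your example there happen to be two 'shapes' with the same maximum area (4 pixels each); np.argmax simply picks the first of them. Plotting the input, the labels and the selected shape: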

from matplotlib import pyplot as plt

fig, (ax1, ax2, ax3) = plt.subplots(1, 3)

ax1.imshow(a, cmap=plt.cm.jet)              # original array
ax2.imshow(labels, cmap=plt.cm.jet)         # one colour per labelled region
ax3.imshow(biggest_shape, cmap=plt.cm.jet)  # largest region only
plt.show()
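
Because np.argmax returns only the first maximum, a tie like the one above silently drops the other largest shape. A minimal sketch for keeping every shape that shares the maximum area (the names `label_ids`, `biggest_ids` and `biggest_shapes` are illustrative; the labels are rebuilt so the snippet runs on its own):

```python
import numpy as np
from scipy import ndimage

a = np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0, 0]])

labels, nshapes = ndimage.label(a)
label_ids = np.arange(1, nshapes + 1)
# Pixel count of each region: label_ids[i] has area areas[i]
areas = ndimage.sum(np.ones_like(a), labels=labels, index=label_ids)

# Every label whose area equals the maximum, not just the first one
biggest_ids = label_ids[areas == areas.max()]
# Boolean mask covering all of the tied largest shapes
biggest_shapes = np.isin(labels, biggest_ids)

print(len(biggest_ids))      # 2
print(biggest_shapes.sum())  # 8
```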


Update

The structure argument passed to scipy.ndimage.label determines which neighbouring elements are considered to be 'connected' (see the scipy.ndimage.label documentation). By default only horizontally and vertically adjacent elements are connected. If you want diagonally adjacent elements to be considered connected as well, you can pass a 3x3 array of ones:

labels, nshapes = ndimage.label(a, structure=np.ones((3, 3)))
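
For reference, the default element in 2D is the cross produced by ndimage.generate_binary_structure(2, 1), while generate_binary_structure(2, 2) gives the full 3x3 block, equivalent to np.ones((3, 3)). A small sketch comparing the shape count on the example array under both:

```python
import numpy as np
from scipy import ndimage

a = np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0, 0]])

# Rank 2, connectivity 1: cross shape, no diagonal neighbours (the default)
cross = ndimage.generate_binary_structure(2, 1)
# Rank 2, connectivity 2: full 3x3 block, diagonal neighbours included
square = ndimage.generate_binary_structure(2, 2)

print(cross.astype(int))
# [[0 1 0]
#  [1 1 1]
#  [0 1 0]]

_, n4 = ndimage.label(a, structure=cross)
_, n8 = ndimage.label(a, structure=square)
print(n4, n8)  # 5 4
```

With diagonals included, the single pixel at row 2, column 4 joins the 2x2 block below it, giving the 4 shapes described in the question.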


Thank you, this is really useful. Is there a way I could adapt it to include the diagonals? I want the biggest shape to be the 'orange and yellow' shape combined with area 5. — heliqua, Aug 11 '14 at 07:45
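
Combining this with the threshold from the question (keep only shapes whose area exceeds 1/15 of the largest area), a minimal sketch could look like the following. The function `shapes_above_threshold` and its `fraction` parameter are hypothetical, and 8-connectivity is assumed, as asked for in the comment above:

```python
import numpy as np
from scipy import ndimage


def shapes_above_threshold(arr, fraction=1 / 15, structure=None):
    """Hypothetical helper: return the label array, the IDs of shapes whose
    area exceeds fraction * (largest area), and those shapes' areas."""
    if structure is None:
        # Assumed default: 8-connectivity (diagonal neighbours connect)
        structure = np.ones((3, 3), dtype=int)
    labels, nshapes = ndimage.label(arr, structure=structure)
    if nshapes == 0:
        # No non-zero pixels at all: nothing to keep
        empty = np.array([], dtype=int)
        return labels, empty, empty
    # Pixel count per label; index 0 is the background and is dropped
    areas = np.bincount(labels.ravel())[1:]
    threshold = fraction * areas.max()
    label_ids = np.arange(1, nshapes + 1)
    keep = areas > threshold
    return labels, label_ids[keep], areas[keep]


a = np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1, 1],
              [1, 1, 1, 0, 0, 0, 0]])

labels, kept_ids, kept_areas = shapes_above_threshold(a)
# Largest area is 5, so the threshold is 5/15 and all 4 shapes pass
print(len(kept_ids))  # 4
# A stricter fraction (threshold 2.5) drops the single-pixel shape
print(len(shapes_above_threshold(a, fraction=0.5)[1]))  # 3
# Mask containing only the shapes that passed the threshold
mask = np.isin(labels, kept_ids)
```

Counting with np.bincount avoids a Python loop over shapes. The largest area still comes from labelling, because connected regions cannot be measured without separating them first.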
