
What is the proper algorithm for separating (counting) coffee beans in a binary image? Beans can touch and partially overlap.

[image: coffee beans]
(source: beucher at cmm.ensmp.fr)

I am actually not working with coffee beans, but with coffee beans the problem is easier to describe. This is a sub-problem of my real task: counting all people present, and counting people crossing an imaginary line, in supermarket surveillance video. I've extracted the moving objects into a binary mask, and now I need to separate them somehow.

Two promising algorithms that were mentioned in the comments:

  • Watershed + distance transform + labeling. This is probably the answer to this question as I put it (bean separation).
  • Tracking of moving objects in a video sequence (what is the name of this algorithm?). It can track overlapping objects. This is the more promising algorithm and probably exactly what I need to solve my actual task (separating moving people).
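The first bullet can be sketched in a few lines. This is a hedged illustration using scikit-image and SciPy (not any specific answer below), run on a synthetic mask of two overlapping disks instead of the bean image; the 0.7 seeding threshold is an ad-hoc assumption that would need tuning on real data:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

# Synthetic binary mask: two partially overlapping disks standing in
# for touching beans.
yy, xx = np.mgrid[0:100, 0:100]
mask = ((xx - 35) ** 2 + (yy - 50) ** 2 < 20 ** 2) | \
       ((xx - 65) ** 2 + (yy - 50) ** 2 < 20 ** 2)

# 1. Distance transform: each interior pixel gets its distance to the
#    nearest background pixel, so bean "cores" become bright peaks.
dist = ndi.distance_transform_edt(mask)

# 2. Labeling: threshold the distance map so each bean leaves one blob,
#    then label the blobs to get one marker per bean (0.7 is arbitrary).
markers, n = ndi.label(dist > 0.7 * dist.max())

# 3. Watershed from the markers on the inverted distance map, restricted
#    to the foreground mask.
labels = watershed(-dist, markers, mask=mask)
print(n, labels.max())
```

Counting the resulting labels gives the bean count; on the synthetic two-disk mask both `n` and `labels.max()` come out as the number of objects.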
Leonid Volnitsky
  • This is more an image processing question than an actual programming question - maybe try http://dsp.stackexchange.com ? – Paul R Sep 11 '14 at 13:58
  • The downvotes are in part because you have not tried anything; you are just asking how to do it. – scord Sep 11 '14 at 14:01
  • @scord - I didn't say that I didn't try anything. Where did you get that from? I've tried watershed (not good). – Leonid Volnitsky Sep 11 '14 at 14:05
  • My bad. People will usually downvote if the question doesn't contain some type of code that users can debug/improve. – scord Sep 11 '14 at 14:06
  • @PaulR - Algorithms are part of programming. Computer vision is made of algorithms. This question is properly tagged computer-vision. If this subject is not interesting to you, please ignore it. – Leonid Volnitsky Sep 11 '14 at 14:10
  • Here's a pretty similar question (that didn't even provide an image) and it was massively upvoted: http://stackoverflow.com/questions/5560507/cell-segmentation-and-fluorescence-counting-in-python I think this is a tough problem; maybe you can do something by assuming convexity and looking for contour continuity. – Bull Sep 11 '14 at 14:12
  • If you want the community to help you, you definitely need to show what you have tried and where you want to go. There are cases where people got upvotes without that info, but in general you won't. – Ander Biguri Sep 11 '14 at 14:31
  • It's very different to separate beans or people, so your example isn't that great (I didn't downvote though). You should try to track people; then you won't have overlap problems while counting. – Micka Sep 11 '14 at 14:50
  • @LeonidVolnitsky: OK - I was just trying to be helpful - image processing *theory* questions generally do better on DSP.SE because there are some very good image processing experts there and a high signal-to-noise ratio. However, if it's just a straightforward programming question, such as how to implement algorithm X in OpenCV, then it would be on-topic here. – Paul R Sep 11 '14 at 15:00
  • I believe you are asking a research question, and it seems you haven't done enough research yourself. If you tried any image analysis book, you would know this is a segmentation problem. You should try almost all possible methods in the category. And in my experience with imaging, the analogy between coffee beans and humans never works. – Tae-Sung Shin Sep 11 '14 at 19:11
  • Does this help? http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_watershed/py_watershed.html – Pervez Alam Sep 12 '14 at 07:00
  • @LeonidVolnitsky: How precise does this (the count of beans) need to be? – carlosdc Sep 13 '14 at 04:41
  • For coffee beans, check out this excellent answer to a related question: http://stackoverflow.com/a/14617359/145999 . I doubt, however, that this will work for your actual problem with people; that sounds more complicated. – HugoRune Sep 13 '14 at 08:17
  • @carlosdc - As precise as possible. Nobody will get 100%: if a small man stands behind a big man, he will be invisible. The client asked for 95%, but I will talk with him about this number. – Leonid Volnitsky Sep 13 '14 at 09:01
  • For the particular example of coffee beans, watershed + distance transform is a typical solution. Check my [medial feature detector](http://image.ntua.gr/iva/tools/mfd/), which is made for more difficult problems but should work here. Tracking multiple moving people is an open research problem; check e.g. [here](http://iris.usc.edu/outlines/papers/2010/kuo-huang-nevatia-cvpr10.pdf) for some ideas. There are many ideas and solutions like this but no simple answer. You would need to rewrite your question to get more useful feedback on the actual problem. – iavr Sep 14 '14 at 12:07
  • By the way, you may be interested in the [computer vision](http://area51.stackexchange.com/proposals/66531/computer-vision) SE proposal. – iavr Sep 14 '14 at 12:09
  • Also, since this is no trivial problem, you *should* be able to understand the PDFs that you mention (like the one I pointed to above). Or at least someone on your team should, especially if this is a professional task. – iavr Sep 14 '14 at 12:18
  • @LeonidVolnitsky I ported the code in my answer to C++/OpenCV but I haven't updated the answer yet. Do you prefer Python or C++? – karlphillip Sep 16 '14 at 04:30
  • @LeonidVolnitsky, if this is not your field, would you prefer a solution that uses a specific piece of software, like ImageJ (Fiji)? – CRGreen Sep 20 '14 at 06:33

5 Answers


This approach is a spin-off from mmgp's answer that explains in detail how the watershed algorithm works. Therefore, if you need some explanation on what the code does, please check his answer.

The code can be tweaked to improve the detection rate. Here it is:

import sys
import cv2
import numpy
from scipy.ndimage import label

def segment_on_dt(a, img):
    border = cv2.dilate(img, None, iterations=3)
    border = border - cv2.erode(border, None)
    cv2.imwrite("border.png", border)

    dt = cv2.distanceTransform(img, cv2.DIST_L2, 5)
    dt = ((dt - dt.min()) / (dt.max() - dt.min()) * 255).astype(numpy.uint8)
    _, dt = cv2.threshold(dt, 135, 255, cv2.THRESH_BINARY)
    cv2.imwrite("dt_thres.png", dt)    

[images: border (left), dt (right)]

    lbl, ncc = label(dt)
    lbl = lbl * (255/ncc)      
    # Completing the markers now. 
    lbl[border == 255] = 255

    lbl = lbl.astype(numpy.int32)
    cv2.imwrite("label.png", lbl)

[image: lbl]

    cv2.watershed(a, lbl)

    lbl[lbl == -1] = 0
    lbl = lbl.astype(numpy.uint8)
    return 255 - lbl

# Application entry point
img = cv2.imread("beans.png")
if img is None:
    print("!!! Failed to open input image")
    sys.exit(0)

# Pre-processing.
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    
_, img_bin = cv2.threshold(img_gray, 128, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
cv2.imwrite("img_bin.png", img_bin)

img_bin = cv2.morphologyEx(img_bin, cv2.MORPH_OPEN, numpy.ones((3, 3), dtype=int))
cv2.imwrite("img_bin_morphoEx.png", img_bin)

[images: img_bin before (left) and after (right) the morphology operation]

result = segment_on_dt(img, img_bin)
cv2.imwrite("result.png", result)

result[result != 255] = 0
result = cv2.dilate(result, None)
img[result == 255] = (0, 0, 255)
cv2.imwrite("output.png", img)

[images: watershed result (left), final output (right)]

karlphillip

Below is an approach to find the center of each bean. By analyzing the central position of the segmented objects across sequential frames, it is possible to track them. Keeping visual profiles or analyzing their paths can increase the accuracy of the tracking algorithm in situations where one object crosses another or there is some overlap.

I used Marvin Image Processing Framework and Java.

Finding the centers

I used three basic algorithms: thresholding, morphological erosion, and flood-fill segmentation. The first step is thresholding to remove the background, as shown below.

[image: thresholding result]

The next step is morphological erosion in order to separate the beans. With a small kernel matrix I can separate the small beans but the bigger ones stay merged, as shown below. Filtering by the mass (number of pixels) of each independent segment, it is possible to select just the smaller ones.

[image: erosion with a small kernel]

Using a big kernel matrix, I can separate the bigger ones while the small ones disappear, as shown below.

[image: erosion with a large kernel]

Combining the two results, and removing center points that are too close together and probably belong to the same bean, I get the result presented below.

[image: combined center points]

Even without the exact segment of each bean, the center positions make it possible to count and track them. The centers can also be used to recover each bean's segment.
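The frame-to-frame tracking idea in the paragraph above can be sketched with a minimal nearest-neighbor tracker. This is a hypothetical illustration in plain Python (not Marvin); the `update_tracks` helper and its `max_dist` threshold are invented for the example, and real trackers would add motion models and appearance profiles:

```python
import math

def update_tracks(tracks, centers, max_dist=30.0):
    """tracks: dict id -> (x, y); centers: list of (x, y) detections."""
    unmatched = dict(tracks)              # tracks not yet claimed this frame
    next_id = max(tracks, default=-1) + 1
    for c in centers:
        # nearest still-unmatched existing track
        best = min(unmatched, key=lambda t: math.dist(unmatched[t], c),
                   default=None)
        if best is not None and math.dist(unmatched[best], c) <= max_dist:
            tracks[best] = c              # same object moved; update position
            del unmatched[best]
        else:
            tracks[next_id] = c           # new object entered the scene
            next_id += 1
    return tracks

tracks = update_tracks({}, [(10, 10), (100, 50)])                  # frame 1
tracks = update_tracks(tracks, [(14, 12), (105, 52), (200, 200)])  # frame 2
print(sorted(tracks.items()))
```

Each detected center either extends the closest existing track or opens a new one; counting distinct track ids then counts objects even when they briefly overlap.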

Source code

The source code is in Java, but the image processing algorithms employed in the solution are provided by most frameworks.


EDIT: I edited the source code in order to save the images of each step. The source code can be optimized by removing these debug steps and extracting methods to reuse code. Some objects and lists were created just to demonstrate these steps and can be removed too.
import static marvin.MarvinPluginCollection.floodfillSegmentation;
import static marvin.MarvinPluginCollection.thresholding;

import java.awt.Color;
import java.util.ArrayList;
import java.util.List;

import marvin.image.MarvinColorModelConverter;
import marvin.image.MarvinImage;
import marvin.image.MarvinSegment;
import marvin.io.MarvinImageIO;
import marvin.math.MarvinMath;
import marvin.plugin.MarvinImagePlugin;
import marvin.util.MarvinPluginLoader;

public class CoffeeBeansSeparation {

    private MarvinImagePlugin erosion = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.morphological.erosion.jar");

    public CoffeeBeansSeparation(){

        // 1. Load Image 
        MarvinImage image = MarvinImageIO.loadImage("./res/coffee.png");
        MarvinImage result = image.clone();

        // 2. Threshold
        thresholding(image, 30);

        MarvinImageIO.saveImage(image, "./res/coffee_threshold.png");

        // 3. Segment using erosion and floodfill (kernel size == 8)
        List<MarvinSegment> listSegments = new ArrayList<MarvinSegment>();
        List<MarvinSegment> listSegmentsTmp = new ArrayList<MarvinSegment>();
        MarvinImage binImage = MarvinColorModelConverter.rgbToBinary(image, 127);

        erosion.setAttribute("matrix", MarvinMath.getTrueMatrix(8, 8));
        erosion.process(binImage.clone(), binImage);

        MarvinImageIO.saveImage(binImage, "./res/coffee_bin_8.png");
        MarvinImage binImageRGB = MarvinColorModelConverter.binaryToRgb(binImage);
        MarvinSegment[] segments =  floodfillSegmentation(binImageRGB);

        // 4. Just consider the smaller segments
        for(MarvinSegment s:segments){
            if(s.mass < 300){   
                listSegments.add(s);
            }
        }

        showSegments(listSegments, binImageRGB);
        MarvinImageIO.saveImage(binImageRGB, "./res/coffee_center_8.png");

        // 5. Segment using erosion and floodfill (kernel size == 18)
        listSegments = new ArrayList<MarvinSegment>();
        binImage = MarvinColorModelConverter.rgbToBinary(image, 127);

        erosion.setAttribute("matrix", MarvinMath.getTrueMatrix(18, 18));
        erosion.process(binImage.clone(), binImage);

        MarvinImageIO.saveImage(binImage, "./res/coffee_bin_18.png");
        binImageRGB = MarvinColorModelConverter.binaryToRgb(binImage);
        segments =  floodfillSegmentation(binImageRGB);

        for(MarvinSegment s:segments){
            listSegments.add(s);
            listSegmentsTmp.add(s);
        }

        showSegments(listSegmentsTmp, binImageRGB);
        MarvinImageIO.saveImage(binImageRGB, "./res/coffee_center_18.png");

        // 6. Remove segments that are too near.
        MarvinSegment.segmentMinDistance(listSegments, 10);

        // 7. Show Result
        showSegments(listSegments, result);
        MarvinImageIO.saveImage(result, "./res/coffee_result.png");
    }

    private void showSegments(List<MarvinSegment> segments, MarvinImage image){
        for(MarvinSegment s:segments){
            image.fillRect((s.x1+s.x2)/2, (s.y1+s.y2)/2, 5, 5, Color.red);
        }
    }

    public static void main(String[] args) {
        new CoffeeBeansSeparation();
    }
}
Gabriel Archanjo
  • Wow, it seems so cool. Too bad I don't know Marvin. Consider rewriting your answer with JavaCV. – karlphillip Sep 15 '14 at 19:15
  • Thanks. Marvin started as a simple university project in Brazil, but today it has a significant set of features and is developed and supported by people worldwide. Not knowing Marvin is not a big issue; it is developed to make image processing as easy as possible for Java developers. The source code is very intuitive, so you can learn it or develop new solutions with it. In just a few days you'll be using it like any other Java image processing framework. – Gabriel Archanjo Sep 15 '14 at 20:16
  • +1 Nice. But since it's not a well-known framework, sharing the images produced at every step of your algorithm would make it much easier for people to follow visually what's going on. – karlphillip Sep 15 '14 at 20:56

There are some elegant answers already, but I thought I'd share what I tried because it is a bit different from the other approaches.

After thresholding and computing the distance transform, I propagate the local maxima of the distance-transformed image. By adjusting the extent of maxima propagation I segment the distance-transformed image, then filter these segments by their area, rejecting the smaller ones.

This way I can achieve a reasonably good segmentation of the given image, though it does not clearly define the boundaries. For the given image I get a segment count of 42, using the parameter values in the Matlab code below that control the extent of maxima propagation and the area threshold.

Results:

[images: distance transform / propagated maxima; segmented result / original]

Here's the Matlab code:

clear all;
close all;

im = imread('ex2a.gif');
% threshold: coffee beans are black
bw = im2bw(im, graythresh(im));
% distance transform
di = bwdist(bw);
% mask for coffee beans
mask = double(1-bw);

% propagate the local maxima. depending on the extent of propagation, this
% will transform finer distance image to coarser segments 
se = ones(3);   % 8-neighbors
% this controls the extent of propagation. it's some fraction of the max
% distance of the distance transformed image (50% here)
mx = ceil(max(di(:))*.5);
peaks = di;
for r = 1:mx
    peaks = imdilate(peaks, se);
    peaks = peaks.*mask;
end

% how many different segments/levels we have in the final image
lvls = unique(peaks(:));
lvls(1) = []; % remove first, which is 0 that corresponds to background
% impose a min area constraint for segments. we can adjust this threshold
areaTh = pi*mx*mx*.7;
% number of segments after thresholding by area
nseg = 0;

% construct the final segmented image after thresholding segments by area
z = ones(size(bw));
lblid = 10;  % label id of a segment
for r = 1:length(lvls)
    lvl = peaks == lvls(r); % pixels having a certain value(level)
    props = regionprops(lvl, 'Area', 'PixelIdxList'); % get the area and the pixels
    % threshold area
    area = [props.Area];
    abw = area > areaTh;
    % take the count that passes the imposed area threshold
    nseg = nseg + sum(abw);
    % mark the segments that pass the imposed area threshold with a unique
    % id
    for i = 1:length(abw)
        if (1 == abw(i))
            idx = props(i).PixelIdxList;
            z(idx) = lblid; % assign id to the pixels
            lblid = lblid + 1; % increment id
        end
    end
end

figure,
subplot(1, 2, 1), imshow(di, []), title('distance transformed')
subplot(1, 2, 2), imshow(peaks, []), title('after propagating maxima'), colormap(jet)
figure,
subplot(1, 2, 1), imshow(label2rgb(z)), title('segmented')
subplot(1, 2, 2), imshow(im), title('original')
dhanushka

Here is some code (in Python) that will give you a baseline: count the black pixels and estimate the bean count from the average bean area, correcting for how many circles of average size can be packed into a square of the image's size. This has the virtue of being the simplest possible thing you can do.

If a given method is not on average more accurate than this, then you need a better method. BTW I'm getting around 85% accuracy, so your 95% is not out of the question.

from PIL import Image

im = Image.open('ex2a.gif').convert('RGB')
(w, h) = im.size
print(w, h)
num_pixels = w * h
print(num_pixels)
black_pixels = 0
for i in range(w):
    for j in range(h):
        q = im.getpixel((i, j))
        if q[0] < 10 and q[1] < 10 and q[2] < 10:
            black_pixels = black_pixels + 1
            im.putpixel((i, j), (255, 0, 0))
r = 15
unpackable = (w // (2 * r)) * (h // (2 * r)) * ((2 * r) ** 2 - 3.14 * r ** 2)
print('unpackable:', unpackable)
print('num beans:', round((num_pixels - 2 * unpackable) / 750.0))
im.save('qq.jpg')
carlosdc

Erosion may help. One paper doing that is this one, but sadly I did not find a publicly available copy of it.
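As a rough illustration of the erosion idea (not the paper's method), here is a sketch with SciPy on a synthetic mask of two overlapping disks; the iteration count is an assumption that must be tuned to the object size:

```python
import numpy as np
from scipy import ndimage as ndi

# Two overlapping disks standing in for touching beans.
yy, xx = np.mgrid[0:100, 0:100]
mask = ((xx - 30) ** 2 + (yy - 50) ** 2 < 22 ** 2) | \
       ((xx - 70) ** 2 + (yy - 50) ** 2 < 22 ** 2)

_, n_before = ndi.label(mask)                       # blobs touch: one component
eroded = ndi.binary_erosion(mask, iterations=12)    # shrink until the neck breaks
_, n_after = ndi.label(eroded)                      # now countable separately

print(n_before, n_after)
```

Too few iterations leave blobs merged; too many make small objects vanish, which is exactly the small-kernel/large-kernel trade-off discussed in Gabriel Archanjo's answer above.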

PhilLab