If you are using MATLAB (which I hope you are, since it it awesome for tasks like these), I suggest you look into the built in function bwlabel() and regionprops(). These should be enough to segment out all the characters and get their bounding box information.
Some sample code is given below:
%Read image
Im = imread('im1.jpg');
%Make binary
Im(Im < 128) = 1;
Im(Im >= 128) = 0;
%Segment out all connected regions
ImL = bwlabel(Im);
%Get labels for all distinct regions
labels = unique(ImL);
%Remove label 0, corresponding to background
labels(labels==0) = [];
%Get bounding box for each segmentation
Character = struct('BoundingBox',zeros(1,4));
nrValidDetections = 0;
for i=1:length(labels)
D = regionprops(ImL==labels(i));
if D.Area > 10
nrValidDetections = nrValidDetections + 1;
Character(nrValidDetections).BoundingBox = D.BoundingBox;
end
end
%Visualize results
figure(1);
imagesc(ImL);
xlim([0 200]);
for i=1:nrValidDetections
rectangle('Position',[Character(i).BoundingBox(1) ...
Character(i).BoundingBox(2) ...
Character(i).BoundingBox(3) ...
Character(i).BoundingBox(4)]);
end
The image I read in here are from 0-255, so I have to threshold it to make it binary. As dots above i and j can be a problem, I also threshold on the number of pixels which make up the distinct region.
The result can be seen here:
https://www.sugarsync.com/pf/D775999_6750989_128710