
I have scans from a clip art book; each page was scanned to TIFF, and each TIFF has approximately 18-20 clip art images. How would I automate the selection and extraction of each of these 18-20 images, retain the color depth/ppi, and save each clip art image as its own image file?

Linking to a version of what I'm describing: ideally it would take the example image and dump each clip art image in the file to an individual file. Ideally it would process an image or a whole directory with minimal user interaction. Happy to use the command line, a GUI, whatever; macOS, Linux, and Windows are all fine.

Thanks for any ideas on how to approach this.

I'm probably underthinking this. I wondered if Photoshop actions or a Google Cloud Vision process might work, and I thought of TensorFlow, or some method to identify image boundaries within a page/file and use the coordinates to kick out each clip art image, but I stalled at the start. Surely this is something in the CV arsenal; I think I'm just lacking knowledge of the libraries/modules/existing tools and the vocabulary to get started. I couldn't find anything ImageMagick-related. I don't want to hand-select/copy/paste every page.

Cody
  • This is possible via Photoshop scripting; I did something similar years ago. I did it in two parts. First, selecting the items and isolating them from the background, which can be hit and miss depending on the integrity of your image. Second, looping through each of the selections and saving out each one in turn. – Ghoul Fool Dec 07 '22 at 20:41

In ImageMagick, you can do that using connected-components processing. But it is very dependent on getting a good threshold to separate your objects from the background. Note that JPEG is not a good format here: the background color is not uniform and has compression artifacts, especially near the objects.

What I do is:

  • Convert to gray, threshold, and negate so the objects are white on a black background.

  • Then I do the connected-components processing on the binary image, merging objects that are smaller than 5000 pixels in area into their surroundings. This helps fill holes and throws out smaller objects that arise from noise and compression. I then save the bounding box and the centroid for all objects found.

  • I then loop over each object found, retrieve its bounding box, crop the original input image to it, and save the crop.

  • I also use flood fill and +opaque to make the current object white and everything else black in the binary image.

  • I crop the processed binary image to the same bounding box and put it into the alpha channel.

  • Then I flatten the image over white so that the background becomes white and save the masked result.
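In the connected-components step, the script keeps only the white (object) rows of ImageMagick's verbose listing and hands their bounding box and centroid to the loop. The sketch below isolates that `grep`/`awk` filter so it can be tried without ImageMagick. The listing it parses is a made-up example in the `id: bounding-box centroid area mean-color` layout, not real output from the clip art page, and the function name `objects_from_listing` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a verbose connected-components listing on stdin
# and print "bbox centroid" for every white (object) component, one per line.
objects_from_listing() {
  grep "gray(255)" | awk '{print $2, $3}'
}

# Made-up sample listing (illustrative only): component 0 is the black
# background, components 1 and 2 are white objects.
sample='Objects (id: bounding-box centroid area mean-color):
  0: 1200x1600+0+0 600.1,799.8 1650000 gray(0)
  1: 310x220+40+55 194.6,164.2 52000 gray(255)
  2: 280x300+420+60 559.3,209.7 61000 gray(255)'

# Prints one "WxH+X+Y cx,cy" line per white component.
objects_from_listing <<< "$sample"
```

For each of those lines, the loop uses the first field as the `-crop` geometry and the second as the flood-fill seed point.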


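# ImageMagick 6 syntax; with ImageMagick 7, "magick" can replace "convert".
cd ~/desktop/clipart_separate   # folder that contains clipart.jpeg
OLDIFS=$IFS
IFS=$'\n'
# Threshold to a binary image (objects white on black), run connected components,
# save the labeled image as tmp.png, and keep "bbox centroid" for each white object.
dataArr=($(convert clipart.jpeg \
-colorspace gray \
-threshold 73% \
-negate \
-type bilevel \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-define connected-components:area-threshold=5000 \
-connected-components 8 tmp.png | \
grep "gray(255)" | awk '{print $2, $3}'))
IFS=$OLDIFS
num=${#dataArr[@]}
for ((i=0; i<num; i++)); do
  # Each entry looks like "WxH+X+Y cx,cy".
  bbox=$(echo "${dataArr[$i]}" | cut -d' ' -f1)
  centroid=$(echo "${dataArr[$i]}" | cut -d' ' -f2)
  # Rectangular crop of the original image.
  convert clipart.jpeg -crop "$bbox" +repage "clipart_$i.jpg"
  # Masked crop: flood-fill this object red in tmp.png, make it white and everything
  # else black, use that as the alpha channel, then flatten over white.
  convert -quiet "clipart_$i.jpg" \
  \( tmp.png -fill red -draw "color $centroid floodfill" -alpha off \
  -crop "$bbox" +repage \
  -fill black +opaque red -fill white +opaque black \) \
  -alpha off -compose copy_opacity -composite \
  -compose over -background white -flatten \
  "clipart_masked_$i.jpg"
done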
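The script above handles a single JPEG, while the question asks about a folder of TIFF scans. The sketch below is an adaptation, not the answer's exact script: it wraps the same threshold and connected-components steps in a loop over every `*.tif` in the current folder and writes only the rectangular crops, as TIFF, into a `clips/` subfolder. It assumes ImageMagick 6's `convert` is on the PATH and that 73% still separates the objects; `extract_page` and `clip_name` are hypothetical names. Cropping does not resample pixels, so TIFF output should keep the page's bit depth and resolution (ppi) tags, and it avoids the repeated JPEG compression of the original.

```shell
#!/usr/bin/env bash
# Sketch: batch version of the answer's rectangular crops for a folder of TIFF scans.
# Assumes ImageMagick 6 ("convert"); with ImageMagick 7, "magick" can be used instead.
set -euo pipefail
shopt -s nullglob   # an unmatched *.tif expands to nothing instead of itself

# Hypothetical naming helper: clip_name page.tif 3 -> page_clip03.tif
clip_name() {
  local base=${1%.*}
  printf '%s_clip%02d.tif\n' "$base" "$2"
}

# Hypothetical per-page function: threshold, find white components of at least
# 5000 px, and crop each bounding box out of the original, untouched page.
extract_page() {
  local page=$1 i=0 bbox centroid
  mkdir -p clips
  while read -r bbox centroid; do
    convert "$page" -crop "$bbox" +repage "clips/$(clip_name "$page" "$i")"
    i=$((i + 1))
  done < <(convert "$page" \
             -colorspace gray -threshold 73% -negate -type bilevel \
             -define connected-components:verbose=true \
             -define connected-components:mean-color=true \
             -define connected-components:area-threshold=5000 \
             -connected-components 8 null: |
           grep "gray(255)" | awk '{print $2, $3}')
}

# Process every TIFF in the current directory (does nothing if there are none).
for page in *.tif; do
  extract_page "$page"
done
```

Writing the crops into `clips/` keeps a second run from treating earlier crops as new pages. The masked versions could be added inside the `while` loop with the answer's second `convert` command, in which case the labeled image would need to be saved to a file instead of `null:`.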

[Image: resulting files]

fmw42
  • Thanks for the info and the observations about the quality of the image. If I were to make this a grayscale image with improved levels, or a b/w bitmap (in any case, if I removed the artifacts and sloppiness), would your ImageMagick commands still work? Either way, this is a helpful start. I assumed ImageMagick should make this possible, but with all the AI/ML object-identification developments I kind of assumed it wasn't as relevant anymore. Thank you! – Cody Dec 08 '22 at 23:41
  • @Cody You need a very uniform background color, however you can achieve that. Possible issues are shadowing, the color of your background paper, or JPEG artifacts. It is best if you can scan as PNG or TIFF (not JPEG-compressed), though the files would be much larger. Unfortunately, I do not know how you "pasted" the cutout images onto the sheet. But the script works if you fine-tune the threshold, except for one (or two) of the results. Or you can just take the rectangular crops, which include portions of other objects. In the future, if you space the images further apart, the script would not have this issue. – fmw42 Dec 09 '22 at 01:00
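Because the result hinges on the threshold, one way to fine-tune it is to try several values and see how many objects each one finds. A value that reports roughly the number of clip art images actually on the page (about 18-20 here) is a reasonable starting point: too few suggests neighbours are merging, too many suggests objects are breaking apart. This is a hedged sketch assuming ImageMagick 6's `convert`, a page named `clipart.tif`, and the answer's 5000-pixel area threshold; `count_objects` and `sweep_thresholds` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count white (object) components in a verbose
# connected-components listing read from stdin. grep -c prints 0 on no match.
count_objects() {
  grep -c "gray(255)" || true
}

# Try a range of thresholds on one page and report the object count for each.
sweep_thresholds() {
  local page=$1 t
  for t in 60 65 70 73 75 80 85; do
    printf '%s%%\t%s objects\n' "$t" "$(convert "$page" \
      -colorspace gray -threshold "$t%" -negate -type bilevel \
      -define connected-components:verbose=true \
      -define connected-components:mean-color=true \
      -define connected-components:area-threshold=5000 \
      -connected-components 8 null: | count_objects)"
  done
}

# Only run the sweep if the (assumed) page file exists.
if [[ -f clipart.tif ]]; then
  sweep_thresholds clipart.tif
fi
```

Once a good value is found, it can be substituted for `73%` in the extraction script.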