Is it possible to automatically extract tiles from comics with an existing tool like ImageMagick, or should I code a tool myself?

I have seen answers using ImageMagick (Using imagemagick how can i slice up an image into several separate images?, https://superuser.com/questions/1308928/how-to-automatically-crop-and-cut-an-image-into-several-images-using-imagemagick/1308953#1308953), but in my case the tiles can be of different sizes (the height can change).

The tiles are always stacked one below the other (a single column), and each tile is separated from the next by some space of the same color (the spacing may use a horizontal color gradient of black, grey, or white). It should therefore be possible to detect where a new tile starts, and extract it, by looking for horizontal lines whose pixels all have the same color.
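A minimal sketch of that row-scanning idea, not a tested solution for real comics. It assumes the strip has first been converted to a plain-text (P2) PGM and that every pixel in a separator row has nearly the same gray value. The function name `panel_geometries` and its `tol` argument are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# panel_geometries [tol]
# Hypothetical helper: reads a plain-text (P2) PGM on stdin and prints one
# crop geometry (WxH+0+Y) for each run of consecutive non-uniform rows.
# A row counts as a separator when max - min gray value in that row <= tol.
panel_geometries() {
  local tol="${1:-0}"
  awk -v tol="$tol" '
    BEGIN { n = 0; col = 0; y = 0; inside = 0 }
    { sub(/#.*/, "") }                       # strip PGM comments
    {
      for (i = 1; i <= NF; i++) {
        if (n < 4) {                         # header: P2, width, height, maxval
          if (n == 1) w = $i + 0
          n++
          continue
        }
        v = $i + 0
        if (col == 0) { lo = v; hi = v } else { if (v < lo) lo = v; if (v > hi) hi = v }
        if (++col == w) {                    # one full pixel row has been read
          uniform = (hi - lo <= tol)
          if (!uniform && !inside) { start = y; inside = 1 }
          if (uniform && inside) {
            printf "%dx%d+0+%d\n", w, y - start, start
            inside = 0
          }
          col = 0; y++
        }
      }
    }
    END { if (inside) printf "%dx%d+0+%d\n", w, y - start, start }
  '
}
```

With ImageMagick, something like `convert strip.png -colorspace Gray -depth 8 -compress none pgm:- | panel_geometries 8` should produce the list (`strip.png` and the tolerance 8 are placeholders), and each printed geometry can then be passed to `-crop "$g" +repage`. This only works when complete separator rows exist, so it does not cover characters or bubbles that cross the gap, nor layouts with several columns.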

Ideally, it should also be possible to extract tiles when the comic has 2 or more columns with tiles of different heights (which could be a bit more complicated, since there will not necessarily be full horizontal lines of a single pixel color).

Update: As requested, you can find a quick sample I made below. In some comics, characters and text bubbles extend outside the tile, which makes it impossible to compare pixels along a horizontal line, so I reproduced this on purpose in the sample. I also added another column and tiles of different widths and heights, so that the sample summarizes what can be found in comics.

comics sample

baptx
  • It would be great if you would post an example image. – kavko Sep 07 '20 at 11:00
  • @kavko you can find several samples on https://www.webtoons.com/ – baptx Sep 07 '20 at 12:15
  • Throwing a web link with DIY instructions is rude. Not counting that in fact the link does not help. – Yves Daoust Sep 07 '20 at 12:41
  • @YvesDaoust sorry if it seemed rude, it was not meant to be. I added a link since I don't own the content so I think I am not allowed to upload a sample directly. I don't think this deserves a downvote. – baptx Sep 07 '20 at 12:44
  • From my point of view, kavko's request is still not addressed. – Yves Daoust Sep 07 '20 at 12:45
  • I would have to agree with Yves Daoust. Also the tiles of the comics on the page you posted are not always separated as described in the question and also they cannot be downloaded. – kavko Sep 07 '20 at 13:07
  • @kavko to get a sample by clicking on the previous link, for example with Firefox web browser, you need to select a comic and then an episode. After that, you need to inspect an element with the shortcut Ctrl+Shift+C and click on the first image. Then in the web console, you need to click on the "div" element having the id "_imageList" and right click on it to select "Screenshot Node". The sample mentioned in my question is not available publicly but any other webtoon sample should be fine to solve my problem. – baptx Sep 07 '20 at 13:14
  • I cannot seem to download an example from that web site. It would be much better if you just uploaded one of your images to some free hosting service and put the link here so that it can be downloaded simply. If those examples are simply one tall image and each cartoon is the same size, then ImageMagick can do that easily with -crop WxH +repage, where WxH is the size of any one cartoon. If you want to separate the cartoons and do not know the size, but there is white space between, then you can do that using -connected-components to get the bounding boxes of varying dimensions – fmw42 Sep 07 '20 at 18:51
  • @fmw42 I created a quick sample and updated my question by adding the image (I had to create the sample myself since I don't own the original sample, so I am not allowed to share it publicly due to copyright). I followed these instructions using -connected-components but I did not find a way to slice the tiles: https://imagemagick.org/script/connected-components.php – baptx Sep 08 '20 at 08:54

1 Answer


Here is how to do that in ImageMagick. But I note that your drawings are likely not representative. First, I was expecting a vertical stack of frames, not a random arrangement. Second, parts of your figures overlap in X or Y, so the bounding boxes will overlap. I use connected components to extract the bounding boxes. Then I simply loop over the bounding boxes and crop the image once per box.

Input:

[input image: boxes.png]

Unix syntax:

# Collect the bounding boxes (WxH+X+Y) of the black regions reported by
# connected-components labeling of the thresholded image.
bboxArr=(`convert -quiet boxes.png +repage -threshold 50% \
-morphology open square:3 -type bilevel \
-define connected-components:exclude-header=true \
-define connected-components:verbose=true \
-define connected-components:area-threshold=1500 \
-define connected-components:mean-color=true \
-connected-components 4 null: | grep "gray(0)" | awk '{print $2}'`)
num=${#bboxArr[*]}
# Crop the original image once per bounding box.
for ((i=0; i<num; i++)); do
  bbox="${bboxArr[$i]}"
  echo "$bbox;"
  convert -quiet boxes.png +repage -crop "$bbox" +repage "boxes_$i.png"
done

[5 output images, one per cropped bounding box]

Here is a better example:

[input image: DoomPatrol1.jpg]

# Negate and threshold the page, then collect the bounding boxes (WxH+X+Y)
# of the white (gray(255)) regions.
bboxArr=(`convert -quiet DoomPatrol1.jpg +repage -negate -threshold 25% -type bilevel \
-define connected-components:exclude-header=true \
-define connected-components:verbose=true \
-define connected-components:area-threshold=20000 \
-define connected-components:mean-color=true \
-connected-components 8 null: | grep "gray(255)" | awk '{print $2}'`)
# Number of elements in the array (not the length of its first element).
num=${#bboxArr[*]}
# Crop the original image once per bounding box.
for ((i=0; i<num; i++)); do
  bbox="${bboxArr[$i]}"
  echo "$bbox;"
  convert -quiet DoomPatrol1.jpg +repage -crop "$bbox" +repage "boxes_$i.png"
done

[5 output images, one per cropped bounding box]

fmw42
  • Thanks. Is it possible to sort the boxes in the order they appear when we save the filename with the box number (left to right and then top to bottom)? Do you know why the 2 scripts you made are not working on other comics like https://www.webtoons.com/en/comedy/toaster-dude/ep-1/viewer?title_no=1983&episode_no=1? To download a sample, you can make a screenshot with this Selenium WebDriver script using Node.js and GeckoDriver: https://pastebin.com/DX9w8PSu – baptx Sep 09 '20 at 11:54
  • I got the error `convert: ../../magick/resource.c:1098: RelinquishMagickResource: Assertion \`resource_info.memory >= 0' failed.` when trying with the webtoons.com sample mentioned above but even if I cut the image with GIMP to have a smaller height, the boxes are not extracted correctly. – baptx Sep 09 '20 at 11:55
  • What is your ImageMagick version and platform? The message appears to me to indicate you run out of memory. Check your resources (`convert -list resource`). If necessary, modify your ImageMagick policy.xml file, if you are not on a shared server. – fmw42 Sep 09 '20 at 18:24
  • There was a typo in one of my commands. `num=${#bboxArr}` should have been `num=${#bboxArr[*]}`. I have edited my code in my post above. See if that works now. – fmw42 Sep 09 '20 at 18:26
  • I am using ImageMagick 6.9.10-23 Q16 x86_64 20190101 from the latest Ubuntu 20.04 package. Changing the memory from 256MiB to 512MiB in `/etc/ImageMagick-6/policy.xml` fixed the error. With your new command, not all tiles are extracted from the Toaster Dude link I shared, for example the first tiles and the tiles in the classroom. Sometimes too many images are extracted, like the text bubbles "Well..." and when the girl says "Hello, dude.". – baptx Sep 10 '20 at 08:43
  • Your toaster dude image does not have a consistent background that separates the tiles. So my script above would not work. If you know the size of the tiles, then just use `-crop WxH +repage` to get all the tiles. See https://imagemagick.org/Usage/crop/#crop_tile or use the number of tiles you expect as `-crop 1xN@ +repage`. See https://imagemagick.org/Usage/crop/#crop_equal – fmw42 Sep 10 '20 at 17:32
  • I don't know the size of the tiles since the height can change. I don't know the number of tiles either (I would have to count them manually for each comic, so it would not really be automated). I tried your option `-crop 1xN@ +repage`, replacing N with the number of tiles, but the tiles still have missing parts. So it is not possible to do it with ImageMagick or another existing tool on this example? – baptx Sep 11 '20 at 14:00
  • Not unless each frame has a common border that separates frames uniquely. – fmw42 Sep 11 '20 at 16:24
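
On the ordering question raised in the comments (numbering the boxes left to right, then top to bottom): the geometries in `bboxArr` have the form WxH+X+Y, so they can be sorted on the offsets before the crop loop. The following is only a sketch. The function name `sort_reading_order` and its `row_tol` argument are hypothetical, and grouping rows into fixed-height bands is an assumption that real pages may not satisfy.

```shell
#!/usr/bin/env bash
# sort_reading_order [row_tol]
# Hypothetical helper: reads crop geometries (WxH+X+Y), one per line, and
# prints them top to bottom, then left to right within a row.
# Boxes whose Y offsets fall in the same row_tol-pixel band count as one row.
sort_reading_order() {
  local row_tol="${1:-1}"                   # must be a positive integer
  awk -F'[+]' -v t="$row_tol" '{ printf "%d %d %s\n", int($3 / t), $2, $0 }' |
    sort -k1,1n -k2,2n |
    cut -d' ' -f3
}
```

In the scripts above this could be applied just before the loop, for example `bboxArr=($(printf '%s\n' "${bboxArr[@]}" | sort_reading_order 50))`, where 50 is an arbitrary band height. Two boxes in the same visual row can still fall into different bands if their Y offsets straddle a band boundary, so the value needs tuning per comic.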