Getting this error 'Too many connected components for a page image : ' when using Kraken library in python on an image

Question

I am trying to read a newspaper with OCR using Tesseract. Before passing the image to Tesseract, I use Kraken to segment the actual text lines and draw a line across the sentences so that Tesseract detects them properly. When I pass the image through `kraken.pageseg.segment`, I get an empty list and this output: `Too many connected components for a page image : 5903`. Instead, it should return a list containing the coordinates of the bounding boxes around the sentences.

I looked up the Kraken source code and found this particular error message, but I am unable to understand it. [Source code for error][1]

[1]: https://github.com/mittagessen/kraken/blob/master/kraken/pageseg.py#:~:text=connected%20components%20for%20a-,page,-image%3A%20%7Bccs%7D%27)
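
For context: the message comes from Kraken's legacy box segmenter. As far as the linked source shows, it labels the connected components (groups of touching dark pixels) of the binarized page and, if there are implausibly many for the page size, logs this warning and returns no boxes instead of raising an error. Thousands of tiny components usually point to noise, dithering or halftone speckle rather than text. The sketch below reproduces that kind of check so you can test an image before calling Kraken. The helper names are hypothetical, and the limit of one component per 30 x 30 pixel block is an assumption modeled on the source, so check the exact condition in your installed version.

```python
import numpy as np
from scipy import ndimage


def count_dark_components(page):
    """Count 4-connected groups of dark pixels in a 2-D grayscale or bi-level array."""
    page = np.asarray(page, dtype=float)
    # Split at the midpoint between the darkest and lightest values;
    # pixels at or below it are treated as (dark) foreground.
    midpoint = 0.5 * (page.min() + page.max())
    # ndimage.label uses 4-connectivity by default for 2-D input.
    _, n_components = ndimage.label(page <= midpoint)
    return n_components


def too_many_components(page, block=30):
    """Hypothetical check: flag more than one component per block x block pixels."""
    height, width = np.shape(page)
    limit = height * width / (block * block)
    n_components = count_dark_components(page)
    return n_components > limit, n_components, limit


# Usage (assumes Pillow is installed and 'foo.png' is your scan):
# from PIL import Image
# flagged, n, limit = too_many_components(np.array(Image.open('foo.png').convert('L')))
```

If the count is in the thousands, as with the 5903 reported above, the binarization step is usually the first thing worth inspecting.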

score 2 · Answer 1 · answered May 21 '22 at 13:56

I had the same problem and solved it after looking at the Kraken API quickstart guide.

Try changing how you binarize the image. If you were binarizing with PIL (Pillow), use Kraken's binarization method instead, like this:

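    from PIL import Image
    from kraken import binarization, pageseg

    im = Image.open('foo.png')
    # Binarize with Kraken's nlbin instead of PIL
    bw_im = binarization.nlbin(im)
    # Legacy bounding-box line segmentation on the binarized image
    seg_data = pageseg.segment(bw_im)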
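
One plausible reason the binarization method matters (not stated in this answer): Pillow's `Image.convert('1')` applies Floyd-Steinberg dithering by default when converting from `L` or `RGB`, which turns gray areas such as paper texture, shading or halftone photos into a speckle of isolated dots, each counted as a separate connected component. `nlbin` thresholds instead of dithering. The sketch below is a synthetic illustration on a gradient, not taken from the answer; the threshold of 128 and the helper name `dark_components` are arbitrary choices.

```python
import numpy as np
from PIL import Image
from scipy import ndimage


def dark_components(bilevel):
    """Count 4-connected groups of black pixels in a mode '1' image."""
    black = ~np.array(bilevel, dtype=bool)  # True where the pixel is black
    _, n = ndimage.label(black)
    return n


# A 256x256 grayscale ramp, black at the top fading to white at the bottom.
gray = Image.linear_gradient("L")

# Default conversion to 1-bit: Floyd-Steinberg dithering scatters black dots.
dithered = gray.convert("1")

# Plain global threshold: each pixel becomes pure black or pure white, no dithering.
thresholded = gray.point(lambda p: 255 if p > 128 else 0, mode="1")

print("dithered:", dark_components(dithered))
print("thresholded:", dark_components(thresholded))
```

If you prefer to stay with Pillow, `convert('1', dither=Image.Dither.NONE)` (Pillow 9.1 and later) also disables dithering, but `nlbin` additionally flattens uneven page background, which a single global threshold does not.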

Reference: https://kraken.re/master/api.html

score 0 · Answer 2 · answered Apr 12 '22 at 11:44


Try downgrading the package to version "2.0.1"

    pip install kraken==2.0.1

I had the same problem with newer versions, and downgrading solved it.

answered by Just Guest
