
The paper 'Metric-based No-reference Quality Assessment of Heterogeneous Document Images' discusses how to measure the quality of characters in a document image. I'm having difficulty understanding the white speckle metric on page 7.

Small white speckle measures how much fattened character strokes have shrunk existing white connected components inside characters, or have created new ones by connecting strokes of characters. A histogram of white connected components in a document image is computed, and we have already found the most frequent font size. Then the white speckle is computed by summing up the histogram bins between 1 pixel and 1% of font size squared. The sum is then normalized by dividing by the area under the histogram between 1 and font size squared.
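
Written as a formula, the quoted description reads as follows. The notation is mine, not the paper's: h(s) is the number of white connected components containing exactly s pixels, and f is the most frequent font size in pixels. I assume component sizes are integer pixel counts, so the 1% bound is rounded down.

```latex
\[
  \mathrm{WS} \;=\; \frac{\sum_{s=1}^{\lfloor 0.01\, f^{2} \rfloor} h(s)}{\sum_{s=1}^{f^{2}} h(s)}
\]
```

For f = 32 this gives f^2 = 1024 and 0.01 f^2 = 10.24, so the numerator sums bins 1 through 10 and the denominator sums bins 1 through 1024.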

My questions are:

  1. How is a histogram of white connected components in a document image computed?
  2. How is the white speckle computed by summing up the histogram bins between 1 pixel and 1% of font size squared? Let's say, for example, the most frequent font size is 32. Do I have to sum up the frequencies from histogram bin 1 up to one percent of 32^2 (1024)? Is that right?
  3. Honestly, I don't see the relation between summing up the histogram bins between 1 pixel and 1% of font size squared and the small white speckle measure. Can you help me see the relation?

Thanks.

alyssaeliyah

1 Answer


I didn't read the entire paper, but I think I understand what they are doing.

The document is a binary image composed of black characters on a white background (I don't know exactly how they threshold the image, but I think this binary image is the input). Inside those characters there are some small white areas.

  1. Find the white connected components in the image. Let's say you have N of them in the document. Each connected component has a size: the number of pixels it contains. Using these sizes you can build a histogram that counts the number of components of size 1, 2, ... pixels.
  2. We are looking for speckles inside characters. The paper defines a small speckle as a white connected component whose size is between 1 pixel and 1% of the most frequent character area, where the character area is the font size squared. So you are correct: with a font size of 32 you sum all the bins between 1 and one percent of 32^2. Since 1% of 1024 is 10.24 and sizes are whole pixel counts, that means bins 1 through 10.
  3. A small speckle is a white connected component that is tiny relative to a character, specifically no larger than 1% of the font size squared. Fattened strokes shrink the existing holes inside characters (e.g. in 'e' or 'o'), and where neighboring strokes touch they enclose tiny new white regions, so both effects add white components in this small-size range. To measure how many speckles you have in a document, you sum the histogram bins that fall within this definition, i.e. you count those components. Finally you normalize by the number of white components with sizes between 1 and the font size squared, so the measurement can be compared between different documents.
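
The three steps above can be sketched in Python as follows. This is a minimal sketch, not the paper's code, and it rests on my own assumptions: the page is already binarized into a list of rows with 1 = black ink and 0 = white paper, white components use 4-connectivity, and font_size (the most frequent font size, in pixels) has been estimated elsewhere. The function names white_component_sizes and small_white_speckle are hypothetical.

```python
from collections import Counter, deque


def white_component_sizes(image):
    """Return the pixel count of every 4-connected white component.

    Assumption (not from the paper): image is a list of rows of a
    binarized page, with 1 = black ink and 0 = white paper.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 0 or seen[r][c]:
                continue  # skip ink pixels and already-labelled white pixels
            # Breadth-first flood fill of one white component, counting pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def small_white_speckle(image, font_size):
    """Fraction of white components (size 1..font_size**2) that are small speckles."""
    hist = Counter(white_component_sizes(image))  # histogram: size -> count
    char_area = font_size ** 2                    # e.g. 32**2 = 1024
    small_limit = char_area // 100                # 1% rounded down (10 for font_size 32)
    small = sum(n for size, n in hist.items() if 1 <= size <= small_limit)
    total = sum(n for size, n in hist.items() if 1 <= size <= char_area)
    return small / total if total else 0.0
```

A collections.Counter serves as the histogram (size mapped to number of components). On a real page the white background component is far bigger than font_size**2, so it falls outside the normalization range and does not affect the ratio.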

You may disagree with their assumption and think a small speckle should be defined entirely differently, but this was their definition.

Hope I helped a little bit.

alyssaeliyah
Amitay Nachmani