pdf manipulation - tagging image or figure

Question

I have a source PDF (untagged.pdf) from which I want to create a tagged version (tagged.pdf).

I already have the HTML tag information for all of the content in the source PDF.

There is a figure on page 3. When I parse the PDF programmatically, it is not detected as an image, because it is actually a rectangle with some text, arrows, and another rectangle, like below.

    _____________________         ____________________
   |    Some text inside | ----> |   Some other text  |
   |                     | ----> |            Inside  |
   |_____________________| ----> |____________________|

             Fig 1.x Rectangle 1 to Rectangle 2

Using other techniques, I have detected that this is a figure and found its bounding coordinates. Let's say the bounding box runs from (10, 30) to (100, 60). I want to tag the whole thing as a figure (like below).

   _____________________________________________________________(100, 60)
  |                                                             |
  |      _____________________         ____________________     |
  |     |    Some text inside | ----> |   Some other text  |    |
  |     |                     | ----> |            Inside  |    |
  |     |_____________________| ----> |____________________|    |
  |                                                             |
  |           Fig 1.x Rectangle 1 to Rectangle 2                |
  |_____________________________________________________________|
  (10, 30)
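
To make the target concrete: in the PDF specification, a Figure structure element can carry a Layout attribute with a `BBox` entry, an array of four numbers (left, bottom, right, top) in default user space. The sketch below only normalizes two detected corners into that form and prints the attribute dictionary as PDF syntax. The class name `FigureBBox` and its methods are hypothetical, not part of itextpdf or pdfbox, and it assumes the detected coordinates are already in PDF user space (origin at the bottom-left of the page).

```java
import java.util.Locale;

/** Hypothetical helper: two detected corners -> PDF-style bounding box. */
final class FigureBBox {
    final double llx, lly, urx, ury;

    /** Accepts the two corners in any order and normalizes them. */
    FigureBBox(double x1, double y1, double x2, double y2) {
        this.llx = Math.min(x1, x2);
        this.lly = Math.min(y1, y2);
        this.urx = Math.max(x1, x2);
        this.ury = Math.max(y1, y2);
    }

    /** Renders the box as a PDF array: [left bottom right top]. */
    String toPdfArray() {
        // Locale.ROOT keeps '.' as the decimal separator on every machine.
        return String.format(Locale.ROOT, "[%.2f %.2f %.2f %.2f]", llx, lly, urx, ury);
    }

    /** Renders a Layout attribute dictionary carrying the BBox. */
    String toLayoutAttribute() {
        return "<< /O /Layout /BBox " + toPdfArray() + " >>";
    }

    public static void main(String[] args) {
        // Corners from the example above, deliberately given in reverse order.
        FigureBBox box = new FigureBBox(100, 60, 10, 30);
        System.out.println(box.toLayoutAttribute());
    }
}
```

This covers only the coordinate bookkeeping. Creating the Figure structure element and attaching the attribute to it still has to be done through whatever tagging API the chosen library exposes.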

I want to tag this entire section as an image. I have checked libraries like itextpdf and pdfbox. As far as I can tell, they don't have APIs to tag a figure using coordinates.

In other words, is there any way to tag an element (a group of images) as a figure programmatically?
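
One way to frame the generic, coordinate-driven version of this problem is as a selection step: given the figure's bounding box and the bounding boxes of the individual content items on the page (text runs, rectangles, arrows), pick the items that lie entirely inside the figure, since those are the ones that would need to be associated with the Figure tag. The sketch below shows only that selection. `RegionSelector`, `Box`, and `indicesInside` are hypothetical names, and it assumes the per-item boxes have already been extracted by some parser, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: select content items that fall inside a figure region. */
final class RegionSelector {

    /** Axis-aligned box in PDF user space (lower-left and upper-right corners). */
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        /** True if the other box lies entirely within this one (edges may touch). */
        boolean contains(Box o) {
            return o.llx >= llx && o.lly >= lly && o.urx <= urx && o.ury <= ury;
        }
    }

    /** Returns the indices of the items fully enclosed by the figure box. */
    static List<Integer> indicesInside(Box figure, List<Box> items) {
        List<Integer> inside = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (figure.contains(items.get(i))) {
                inside.add(i);
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        Box figure = new Box(10, 30, 100, 60);
        List<Box> items = new ArrayList<>();
        items.add(new Box(15, 35, 50, 55));     // left rectangle, inside
        items.add(new Box(60, 35, 95, 55));     // right rectangle, inside
        items.add(new Box(5, 35, 20, 55));      // straddles the left edge
        items.add(new Box(200, 200, 300, 300)); // elsewhere on the page
        System.out.println(indicesInside(figure, items));
    }
}
```

Full containment is a deliberate, conservative choice here. An overlap test would also pick up items that only partly intersect the figure, which may or may not be what is wanted for a given document.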

Have you checked whether you can *identify* the image section by using something like [pdf2data](https://pdf2data.online/) from iText? You can try it online without any code. Otherwise, I'd suggest you post the PDF file you're working on so that someone can take a look at it. - André Lemos, Feb 08 '19 at 08:05
I have identified the image bounding box in the PDF. I have to tag it as an image. - SuperNova, Feb 08 '19 at 08:19
Is it possible for you to provide an example PDF so I can see what you are trying to achieve/tag? If you are comfortable with the PDF structure, you can also check [RUPS](https://itextpdf.com/en/products/rups-reading-and-updating-pdf-syntax) to see how your PDF is structured, and then use a similar approach to the one described in [this post](https://itextpdf.com/en/resources/examples/itext-7/tagged-pdf-adding-alt-structure-tree). - André Lemos, Feb 08 '19 at 08:29
Thanks for the reply. It is not about that specific PDF or image. I am trying to build a generic solution in which I can tag an element using its coordinates. - SuperNova, Feb 08 '19 at 08:41
