I have a source PDF (untagged.pdf) from which I want to create a tagged version (tagged.pdf).
I already have the HTML tag information for all of the content in the source PDF.
There is a figure on page 3. When I parse the PDF programmatically, it is not detected as an image, because it is really a rectangle with some text and another rectangle, like below.
 _____________________           ____________________
| Some text inside    |  ---->  | Some other text    |
|                     |  ---->  | Inside             |
|_____________________|  ---->  |____________________|
Fig 1.x Rectangle 1 to Rectangle 2
Using some other techniques, I have detected that this is a figure and found its bounding coordinates. Let's say the bounding coordinates are [10, 30] (bottom-left) and [100, 60] (top-right). I want to tag the whole thing as a figure (like below):
 _____________________________________________________________ (100, 60)
|                                                             |
|   _____________________           ____________________      |
|  | Some text inside    |  ---->  | Some other text    |     |
|  |                     |  ---->  | Inside             |     |
|  |_____________________|  ---->  |____________________|     |
|                                                             |
|  Fig 1.x Rectangle 1 to Rectangle 2                         |
|_____________________________________________________________|
(10, 30)
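To make the intent concrete, here is a minimal sketch of the selection step I have in mind, written in Python only for brevity. It covers just the geometry: given the figure's bounding box, pick out every parsed item whose own box lies fully inside it. The item names and their coordinates are hypothetical, and I am assuming PDF user space with the origin at the bottom-left. Attaching the selected content to a single figure tag is the part I do not know how to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box: (x0, y0) is bottom-left, (x1, y1) is top-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "BBox") -> bool:
        # True only if `other` lies entirely inside this box.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


def items_in_figure(figure: BBox, items: dict) -> list:
    """Return the names of all items fully enclosed by the figure box."""
    return [name for name, box in items.items() if figure.contains(box)]


# Figure region from the example above: [10, 30] to [100, 60].
FIGURE = BBox(10, 30, 100, 60)

# Hypothetical page items as a parser might report them.
ITEMS = {
    "rectangle_1": BBox(15, 40, 45, 55),
    "arrows": BBox(47, 42, 60, 53),
    "rectangle_2": BBox(62, 40, 95, 55),
    "caption": BBox(30, 32, 80, 37),
    "body_paragraph": BBox(10, 70, 100, 90),  # outside the figure region
}

if __name__ == "__main__":
    print(items_in_figure(FIGURE, ITEMS))
```

Everything this returns (both rectangles, the arrows, and the caption) is what I would like to end up under one figure tag, while the body paragraph stays outside it.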
Now I want to tag this entire region as an image. I have checked libraries like itextpdf and pdfbox, but I could not find an API in either of them for tagging a figure by its coordinates.
In other words, is there any way to programmatically tag an element (a group of images) as a figure?