1

I am using Google Cloud - Document AI service. I have custom built some processors for "form data extraction" using the "Custom Entity Extractor" which processes PDF documents. I annotated the dataset and I completed training my model. Now i am able to access the processor using the Python SDK to send input requests and am able to fetch responses.

While parsing the response, under the section: result.document.entities[0].page_anchor.page_refs[0].bounding_poly.normalized_vertices where i get normalized co-ordinate values, that is on a scale from 0-1, which represents the location of the Entity/Value on a given page on PDF.

A sample example of the values are as below:

[x: 0.30874478816986084
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.34131988883018494
x: 0.47531232237815857
y: 0.36359813809394836
x: 0.30874478816986084
y: 0.36359813809394836]

Under the Page dimensions object: result.document.pages[0] object i get the pixel scale values of the page. Example object response looks like:

dimension {
  width: 1681.0
  height: 2379.0
  unit: "pixels"
}

My Expecations:

Now my expectation is to fetch the positions of the entities, by scaling up the normalized co-ordinates. and crop that part of the PDF page, which is converted as Image using pdf2image module.

I am using cv2 module for image processing here.

Holt Skinner
  • 1,692
  • 1
  • 8
  • 21
Rajiv2806
  • 88
  • 4

1 Answers1

1

The Document AI Toolbox SDK for Python has functionality to export images from an Entity bounding box. Currently, it's set to only export detected images (such as a profile photo from a drivers license) but the same code should work to export an image of an entity with text.

https://github.com/googleapis/python-documentai-toolbox/blob/c1843812d988b4a9877b66176be8d103b55b112a/google/cloud/documentai_toolbox/wrappers/entity.py#LL66C5-L90C64

Something like this should work for you

from io import BytesIO
from PIL import Image

page_ref = documentai_entity.page_anchor.page_refs[0]
doc_page = documentai_document.pages[page_ref.page]
image_content = doc_page.image.content

doc_image = Image.open(BytesIO(image_content))
w, h = doc_image.size
vertices = [
  (int(v.x * w + 0.5), int(v.y * h + 0.5)) for v in page_ref.bounding_poly.normalized_vertices
]
(top, left), (bottom, right) = vertices[0], vertices[2]
entity_image = doc_image.crop((top, left, bottom, right))
Holt Skinner
  • 1,692
  • 1
  • 8
  • 21