
I use the Document AI Invoice Processor to process scanned invoices, via the Java client libraries. I recently noticed an inconsistency in the bounding polygons of the extracted entities when the input image is rotated 90° counterclockwise (ccw) from the default reading orientation.

When the service is presented with an image of an invoice that is rotated 90° ccw from the upright (reading) orientation, the returned bounding polygon is not correct. Suppose the invoice-id field is located near the top-right corner of the invoice image when it is in the reading orientation. If (for some reason, e.g. a user misplaces the invoice document on the scanner) I send a 90° ccw rotated image of this invoice to the service, it still detects the invoice-id field (in fact, all fields) very well. However, whereas I would expect the bounding polygon to be somewhere in the top-left corner (because of the rotation), the engine returns a bounding poly that is still in the top-right corner. It very much looks like the returned bounding polygon is relative to the image in its upright (reading) orientation, and not relative to the image I actually uploaded to the service (i.e. the 90° ccw rotated one), as the documentation suggests.

NOTE that I get the bounding polygon by following this reference chain: Document->Entity->PageAnchor->PageRef->BoundingPoly. If I wanted the bounding poly of, say, a paragraph, I would follow this reference chain instead: Document->Page->Paragraph->Layout->BoundingPoly. Note how this second chain goes through a Document.Page.Layout object which (along with the bounding poly) has an 'orientation' property that specifies the orientation of the layout object relative to the page's orientation. Unfortunately, when reaching for the bounding poly of an extracted entity, the reference chain does not include a Layout object. Instead, it goes through a PageRef object which has a bounding poly but NOT an orientation that would allow me to make sense of the returned bounding poly.
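
For reference, here is roughly how I read the two polygons with the Java client (a minimal sketch; it assumes a single page and one page ref per entity, and omits error handling):

import com.google.cloud.documentai.v1.BoundingPoly;
import com.google.cloud.documentai.v1.Document;

// Sketch of the two reference chains (assumes a single page and one page ref per entity).
static void printPolys(Document document) {
  // Entity chain: Document -> Entity -> PageAnchor -> PageRef -> BoundingPoly.
  // PageRef exposes a bounding poly but no orientation.
  for (Document.Entity entity : document.getEntitiesList()) {
    Document.PageAnchor.PageRef pageRef = entity.getPageAnchor().getPageRefs(0);
    BoundingPoly entityPoly = pageRef.getBoundingPoly();
    System.out.println(entity.getType() + ": " + entityPoly.getNormalizedVerticesList());
  }

  // Layout chain: Document -> Page -> Paragraph -> Layout -> BoundingPoly.
  // Layout additionally carries an orientation relative to the page.
  for (Document.Page.Paragraph paragraph : document.getPages(0).getParagraphsList()) {
    Document.Page.Layout layout = paragraph.getLayout();
    System.out.println(layout.getOrientation() + ": "
        + layout.getBoundingPoly().getNormalizedVerticesList());
  }
}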

So, to get to my question: is this a bug? Should the returned entity bounding polygon be relative to the uploaded image, or is the observed behavior correct and should I transform the polygon somehow? And what about the orientation of an extracted entity? Why is it not conveyed in the PageRef object (like it is for Blocks, Lines, Tokens etc. through the Document.Page.Layout object)? Is this something that will be added in the future?

antk

1 Answer


Before each page is processed, its image may be automatically rotated to the estimated natural-reading orientation. This also works if the scan is skewed (pages are deskewed automatically).

In the response:

  • document.page[].image contains the page image (deskewed, or original if no pre-processing was needed). The coordinates are relative to this image.
  • document.page[].transforms indicates the applied transforms (empty if no transforms were applied).

Example:

{ // document.page[0]

  // The image for the page. All coordinates are relative to this image.
  "image": { "content": "…", "mime_type": "image/png", "width": 1335,… },

  // Here, 1 transformation matrix was applied to rotate the page.
  "transforms": [ { "rows": 2, "cols": 3,… } ]
}
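
With the Java client, the same information can be inspected roughly like this (a sketch; the getters mirror the fields shown above, and interpreting the matrix data itself is left out):

import com.google.cloud.documentai.v1.Document;

// Inspect the processed page image and any transforms applied before processing.
static void inspectPage(Document.Page page) {
  Document.Page.Image image = page.getImage();
  System.out.printf("page image: %s, %dx%d%n",
      image.getMimeType(), image.getWidth(), image.getHeight());

  // An empty list means the page was processed as uploaded; otherwise the
  // listed matrices (e.g. a rotation) were applied, and all coordinates in
  // the response refer to the transformed image, not to the uploaded one.
  for (Document.Page.Matrix transform : page.getTransformsList()) {
    System.out.printf("transform applied: %dx%d matrix%n",
        transform.getRows(), transform.getCols());
  }
}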

In a nutshell:

  • The API response is self-sufficient. You shouldn't need to refer to your input document.
  • If you'd like to draw results on a page, use document.page[].image (see the sketch below).
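
For example, with the Java client you could overlay the entity polygons onto the returned page image roughly like this (a sketch, not an official snippet: it assumes the page refs carry normalized vertices and a single page, and uses java.awt for drawing):

import com.google.cloud.documentai.v1.BoundingPoly;
import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.NormalizedVertex;
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Overlay each entity's bounding polygon onto the page image returned by the API.
static void drawEntities(Document document, File output) throws IOException {
  Document.Page page = document.getPages(0);
  BufferedImage image = ImageIO.read(
      new ByteArrayInputStream(page.getImage().getContent().toByteArray()));
  Graphics2D g = image.createGraphics();
  g.setColor(Color.RED);
  g.setStroke(new BasicStroke(3));

  for (Document.Entity entity : document.getEntitiesList()) {
    BoundingPoly poly = entity.getPageAnchor().getPageRefs(0).getBoundingPoly();
    int n = poly.getNormalizedVerticesCount();
    int[] xs = new int[n];
    int[] ys = new int[n];
    for (int i = 0; i < n; i++) {
      NormalizedVertex v = poly.getNormalizedVertices(i);
      xs[i] = Math.round(v.getX() * image.getWidth());   // scale [0, 1] to pixels
      ys[i] = Math.round(v.getY() * image.getHeight());
    }
    g.drawPolygon(xs, ys, n);
  }
  g.dispose();
  ImageIO.write(image, "png", output);
}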

I hope this answers your questions.