
From the Document AI docs, my understanding is that the best fit for extracting information from a report such as a medical test result is the Form Parser processor. It does a good job for fields that have exactly one value per label, like patient name or patient age. However, I am trying to get the table of test results as a map of key-value pairs, where the key is the test name and the value is the result.

With a custom processor, I tried defining a label with a property that can occur multiple times, but that does not preserve the link between each testName and its testValue.

The report looks like the following: [screenshot of the report not preserved]

The desired result would be something like:

{
  "name": "Jon Doe",
  "age": 76,
  "tests": [
    {
      "testName": "CRP",
      "testValue": 51
    },
    {
      "testName": "Creatinine",
      "testValue": 0.8
    }
  ]
}

I think it would be something similar to the table handling described here: https://cloud.google.com/document-ai/docs/handle-response

Neil

1 Answer


The Form Parser processor can parse tables when it detects them in the document. This sample code shows how formFields and tables can be extracted:

https://cloud.google.com/document-ai/docs/handle-response#forms_and_tables
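As a minimal sketch of that approach, assuming the response has been converted to a plain dict with `json.loads(documentai.Document.to_json(document))` (so keys are camelCase, such as `formFields` and `bodyRows`), the code below walks the form fields and table body rows and assembles the structure asked for in the question. The helpers `anchor_text`, `to_report` and `toy_layout`, the assumption that column 0 holds the test name and column 1 the result, and the toy document are all hypothetical illustrations, not part of the Document AI API.

```python
import json
from typing import Any


def anchor_text(doc: dict[str, Any], layout: dict[str, Any]) -> str:
    """Return the document text that a layout's textAnchor points to."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # int64 offsets arrive as JSON strings; startIndex may be omitted when it is 0.
    return "".join(
        doc["text"][int(s.get("startIndex", 0)):int(s["endIndex"])] for s in segments
    ).strip()


def to_report(doc: dict[str, Any]) -> dict[str, Any]:
    """Collect form fields as top-level keys and table body rows as tests."""
    report: dict[str, Any] = {"tests": []}
    for page in doc.get("pages", []):
        for field in page.get("formFields", []):
            # Normalize a label such as "Name:" to the key "name".
            key = anchor_text(doc, field["fieldName"]).rstrip(":").strip().lower()
            report[key] = anchor_text(doc, field["fieldValue"])
        for table in page.get("tables", []):
            for row in table.get("bodyRows", []):
                cells = [anchor_text(doc, c["layout"]) for c in row.get("cells", [])]
                # Hypothetical layout: column 0 = test name, column 1 = result.
                if len(cells) >= 2:
                    report["tests"].append({"testName": cells[0], "testValue": cells[1]})
    return report


def toy_layout(start: int, end: int) -> dict[str, Any]:
    """Build a toy Layout covering text[start:end]."""
    return {"textAnchor": {"textSegments": [{"startIndex": str(start), "endIndex": str(end)}]}}


# Hand-built toy document shaped like Form Parser JSON; not real processor output.
sample = {
    "text": "Name: Jon Doe\nAge: 76\nCRP 51\nCreatinine 0.8\n",
    "pages": [{
        "formFields": [
            {"fieldName": toy_layout(0, 5), "fieldValue": toy_layout(6, 13)},
            {"fieldName": toy_layout(14, 18), "fieldValue": toy_layout(19, 21)},
        ],
        "tables": [{"bodyRows": [
            {"cells": [{"layout": toy_layout(22, 25)}, {"layout": toy_layout(26, 28)}]},
            {"cells": [{"layout": toy_layout(29, 39)}, {"layout": toy_layout(40, 43)}]},
        ]}],
    }],
}

print(json.dumps(to_report(sample), indent=2))
```

If the Form Parser does not detect the table (for example, a borderless one), `tables` is empty and `tests` comes back empty while the form fields are still collected. All values stay strings; converting `age` or `testValue` to numbers is left to the caller.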

This Form Parser codelab also shows a few more examples, such as transforming the formFields and tables into a pandas DataFrame:

https://codelabs.developers.google.com/codelabs/docai-form-parser-v1-python
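Along the same lines as the codelab, a sketch that keys each body row by the table's header cells yields a list of records; that shape is accepted by `pandas.DataFrame(records)` if pandas is installed. It assumes the response was converted to a dict with `json.loads(documentai.Document.to_json(document))`; the helper names and toy document are hypothetical, and merged cells (`rowSpan`/`colSpan`) are ignored.

```python
from typing import Any


def anchor_text(doc: dict[str, Any], layout: dict[str, Any]) -> str:
    """Return the document text that a layout's textAnchor points to."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # int64 offsets arrive as JSON strings; startIndex may be omitted when it is 0.
    return "".join(
        doc["text"][int(s.get("startIndex", 0)):int(s["endIndex"])] for s in segments
    ).strip()


def cell_texts(doc: dict[str, Any], row: dict[str, Any]) -> list[str]:
    """Return the text of every cell in one table row."""
    return [anchor_text(doc, c["layout"]) for c in row.get("cells", [])]


def table_records(doc: dict[str, Any], table: dict[str, Any]) -> list[dict[str, str]]:
    """Turn one table into a list of {column header: cell text} records."""
    header_rows = table.get("headerRows", [])
    headers = cell_texts(doc, header_rows[0]) if header_rows else []
    records = []
    for row in table.get("bodyRows", []):
        cells = cell_texts(doc, row)
        # Columns without a header cell get positional names such as "col2".
        keys = headers + [f"col{i}" for i in range(len(headers), len(cells))]
        records.append(dict(zip(keys, cells)))
    return records


def toy_layout(start: int, end: int) -> dict[str, Any]:
    """Build a toy Layout covering text[start:end]."""
    return {"textAnchor": {"textSegments": [{"startIndex": str(start), "endIndex": str(end)}]}}


def toy_row(*spans: tuple[int, int]) -> dict[str, Any]:
    """Build a toy table row with one cell per (start, end) span."""
    return {"cells": [{"layout": toy_layout(s, e)} for s, e in spans]}


# Hand-built toy document shaped like Form Parser JSON; not real processor output.
sample = {
    "text": "Test Result\nCRP 51\nCreatinine 0.8\n",
    "pages": [{"tables": [{
        "headerRows": [toy_row((0, 4), (5, 11))],
        "bodyRows": [toy_row((12, 15), (16, 18)), toy_row((19, 29), (30, 33))],
    }]}],
}

for table in sample["pages"][0]["tables"]:
    print(table_records(sample, table))
```

Keying by header text rather than by position makes the records independent of column order, at the cost of depending on the header row being detected correctly.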

You can also create a Custom Document Extractor processor, which trains a custom model for your specific document structure, but you will have to label example documents and train a new processor version.

Note that this creates an entity extraction processor, which works differently from the Form Parser (and doesn't currently extract form fields and tables in the same way).

You'll need to label each entity individually, train the processor, and then use this sample code to get the entity information from the processing response:

https://cloud.google.com/document-ai/docs/handle-response#entities_nested_entities_and_normalized_values
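One way to keep each test name paired with its value (the problem described in the question) is to use nested entities, if your processor's schema supports them: a parent label whose child properties hold the name and the value, so every test comes back as a single entity. The sketch below assumes a hypothetical schema with a parent label `test`, children `testName` and `testValue`, and top-level `name` and `age` labels, with the response converted to a dict via `json.loads(documentai.Document.to_json(document))`. The function name and toy data are illustrative only.

```python
import json
from typing import Any


def entities_to_report(doc: dict[str, Any]) -> dict[str, Any]:
    """Map top-level entities to keys and each parent "test" entity to one dict."""
    report: dict[str, Any] = {"tests": []}
    for entity in doc.get("entities", []):
        if entity.get("type") == "test":
            # Child properties keep testName and testValue together for one test.
            test = {p["type"]: p.get("mentionText", "") for p in entity.get("properties", [])}
            report["tests"].append(test)
        else:
            report[entity["type"]] = entity.get("mentionText", "")
    return report


# Hand-built toy response using hypothetical labels; not real processor output.
sample = {
    "entities": [
        {"type": "name", "mentionText": "Jon Doe"},
        {"type": "age", "mentionText": "76"},
        {"type": "test", "properties": [
            {"type": "testName", "mentionText": "CRP"},
            {"type": "testValue", "mentionText": "51"},
        ]},
        {"type": "test", "properties": [
            {"type": "testName", "mentionText": "Creatinine"},
            {"type": "testValue", "mentionText": "0.8"},
        ]},
    ]
}

print(json.dumps(entities_to_report(sample), indent=2))
```

Because the pairing lives in the entity hierarchy rather than in page coordinates, this does not depend on the table having visible borders; it does depend on the training documents being labeled with the same parent/child structure.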

Holt Skinner
  • Do you mean the document in my question can be handled by a Custom Document Extractor, but the table itself cannot be parsed? (In my case, the table has no borders.) Is there an example of how to train on a tabular structure? – Neil Feb 03 '23 at 13:02
  • You can parse data in a tabular structure, but you have to label the entities in your training data, and the processing output will be entities rather than a table. https://cloud.google.com/document-ai/docs/handle-response#entities_nested_entities_and_normalized_values This is the main documentation on labeling documents: https://cloud.google.com/document-ai/docs/workbench/label-documents – Holt Skinner Feb 08 '23 at 18:09