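import json

# log.json holds the Document AI response saved as JSON
with open('log.json') as ifp:
    response = json.load(ifp)

# Print the rowSpan of every body cell in the second table on the first page
for bodyRow in response['document']['pages'][0]['tables'][1]['bodyRows']:
    for cell in bodyRow['cells']:
        print(f'rowSpan is {cell["rowSpan"]}')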

log.json is the OCR and parsed output for the table.

I wrote this Python script to parse the table using the "Form Parser" processor from Google's documentai_v1beta3. I want to find the rowSpan of every cell.

Some cells span 2 or 3 rows, but the result is always 1. Is this a bug in documentai_v1beta3?

Table screenshot:

table screenshot
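One way to narrow this down is to tally the span values over every table rather than only `tables[1]`, and to include header rows as well. The snippet below is a minimal sketch, assuming the JSON uses the camelCase keys `headerRows`, `bodyRows`, `cells`, `rowSpan` and `colSpan`. The `sample` dictionary is a made-up stand-in for the contents of log.json, and `span_counts` is a hypothetical helper name, not part of any Google library.

```python
from collections import Counter


def span_counts(response):
    """Tally (rowSpan, colSpan) pairs over every cell of every table.

    Missing keys are recorded as None so they stay visible in the tally
    instead of being silently treated as 1.
    """
    counts = Counter()
    for page in response.get("document", {}).get("pages", []):
        for table in page.get("tables", []):
            # Header and body rows are stored separately, so check both.
            rows = table.get("headerRows", []) + table.get("bodyRows", [])
            for row in rows:
                for cell in row.get("cells", []):
                    counts[(cell.get("rowSpan"), cell.get("colSpan"))] += 1
    return counts


# Hypothetical data shaped like the response described in the question.
sample = {
    "document": {
        "pages": [
            {
                "tables": [
                    {
                        "headerRows": [{"cells": [{"rowSpan": 1, "colSpan": 1}]}],
                        "bodyRows": [
                            {"cells": [{"rowSpan": 2, "colSpan": 1}, {"rowSpan": 1, "colSpan": 1}]},
                            {"cells": [{"colSpan": 1}]},
                        ],
                    }
                ]
            }
        ]
    }
}

for spans, n in span_counts(sample).items():
    print(spans, n)
```

If every pair in the real file comes out as `(1, 1)`, the spans are absent from the service output itself and the parsing script is not at fault.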

double-beep
dio lee
  • Include a sample of the JSON data in your question. – John Hanley Dec 10 '20 at 03:15
  • added a link to the JSON – dio lee Dec 10 '20 at 04:33
  • TBH, I am not sure what you are trying to achieve exactly. The [Form Parser](https://cloud.google.com/document-ai/docs/form-parser#documentai_batch_process_document-nodejs) is used to send a processing request for a form document (PDF, TIFF, GIF). Is your document a PDF, TIFF or GIF? Also, most of the fields you mention are not present in your JSON. – Ksign Dec 14 '20 at 14:51
  • It's a PDF. The JSON is the output file of documentai_v1beta3. You can see some row spans in "table screenshot". – dio lee Dec 15 '20 at 00:32
  • Yes, I agree. What I don't understand is how you will catch anything in your `log.json` with the code you shared if the 'document' and 'pages' keys exist but the rest of them ('tables', 'bodyRows', 'cells', 'rowSpan') do not. Did I miss something? – Ksign Dec 16 '20 at 10:06
  • The 'tables' keys are under the 'pages' keys; 'bodyRows', 'cells' and 'rowSpan' are under 'tables', but deeper. It's a big JSON file. – dio lee Dec 18 '20 at 02:53
  • Hello, I've been struggling with this issue for a while and I would like to shed some light on it, so I filed this [public issue tracker](https://issuetracker.google.com/177302943). There is missing input information in order to reproduce it so I would encourage you to check it and provide there the necessary information. – Kim Jan 12 '21 at 11:34

1 Answer

I recommend referring to this documentation about handling the processing response for the Form Parser which includes Sample Code and information about the Document object response.
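As a rough illustration of handling the Document object in JSON form, the sketch below pairs each cell's `rowSpan` with the text it covers, which makes it easier to match the output against the table by eye. It assumes each cell carries a `layout.textAnchor.textSegments` list whose `startIndex` and `endIndex` point into the top-level `text` field, that the indices may arrive as strings, and that `startIndex` may be omitted when it is 0. The helper names and the `doc` sample are hypothetical.

```python
def anchor_text(full_text, layout):
    """Join the slices of full_text referenced by a layout's text anchor."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        # Indices may be JSON strings; a missing startIndex is read as 0.
        full_text[int(seg.get("startIndex", 0)):int(seg["endIndex"])]
        for seg in segments
    )


def table_cells(document, page_index=0, table_index=0):
    """Return (row kind, row, column, rowSpan, text) for each cell of one table."""
    table = document["pages"][page_index]["tables"][table_index]
    full_text = document.get("text", "")
    cells = []
    for kind in ("headerRows", "bodyRows"):
        for r, row in enumerate(table.get(kind, [])):
            for c, cell in enumerate(row.get("cells", [])):
                text = anchor_text(full_text, cell.get("layout", {})).strip()
                cells.append((kind, r, c, cell.get("rowSpan"), text))
    return cells


def make_cell(start, end):
    """Build a hypothetical cell; start=None mimics an omitted startIndex."""
    segment = {"endIndex": str(end)}
    if start is not None:
        segment["startIndex"] = str(start)
    return {"rowSpan": 1, "layout": {"textAnchor": {"textSegments": [segment]}}}


# Hypothetical document shaped like the "document" value in log.json.
doc = {
    "text": "Name\nAlice\nBob\n",
    "pages": [
        {
            "tables": [
                {
                    "headerRows": [{"cells": [make_cell(None, 5)]}],
                    "bodyRows": [
                        {"cells": [make_cell(5, 11)]},
                        {"cells": [make_cell(11, 15)]},
                    ],
                }
            ]
        }
    ],
}

for entry in table_cells(doc):
    print(entry)
```

With the file from the question, the call would look like `table_cells(response['document'], 0, 1)`.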

double-beep
Holt Skinner