
I'm running into this issue because I only want specific fields from the default JSON that Google Cloud Document AI returns. The fields I want to get using the field mask are "text" and, inside "pages", only "tables" and "formFields". For text and tables I haven't had any issues passing the field_mask param in the request like this:

    from google.cloud import documentai

    # Configure the process request; field_mask limits which Document fields are returned
    request = documentai.ProcessRequest(
        name=resource_name, raw_document=raw_document, field_mask="text,pages.tables"
    )

The problem only appears when I try to access the fields inside "formFields", because I only want the fields marked in this image of the JSON:

[Image: screenshot of the JSON response with the wanted fields marked]

I tried using field_mask="pages.formFields.fieldName.textAnchor.content,pages.formFields.fieldValue.textAnchor.content". When I send the request with Postman, it stays on the "Sending request" message for a long time and never returns anything.

I want to stress that this problem only happens when I try to access a field inside "formFields". Without that, the request succeeds, so I don't think the problem is in how I make the request. I think I'm just not using the correct syntax, and I can't find any information or documentation about this. Thank you, and let me know if you need more information or code.

Holt Skinner
  • Here's Google's documentation for [FieldMask](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#google.protobuf.FieldMask). The only reference to repeated fields (like `pages`, `formFields`) is "A repeated field is not allowed except at the last position of a field mask." There appears (!?) to be no syntax to support retrieving child values (often refining the parent to be e.g. `[]` or `[*]`) or specific values e.g. `[0]`. Please try those variants but I think you may have to retrieve the entire repeated element (i.e. `pages`) and filter out-of-band. – DazWilkin Sep 02 '22 at 16:08
  • As pointed out by @DazWilkin, could you try retrieving a repeated field as the last position of a field mask? – Sakshi Gatyan Sep 03 '22 at 07:33
  • I tried using [ ] and it doesn't work. The solution we implemented is to retrieve the entire element (like "pages") and then filter it in Python to get the fields we need. It's a valid solution for us, but I don't know why it can't be done more easily, directly in the field_mask. Thanks for the information, @DazWilkin and Sakshi Gatyan – Developer Team The Cloud Gate Sep 05 '22 at 07:54

1 Answer


Google's documentation on FieldMask states:

"A repeated field is not allowed except at the last position of a field mask."

It looks like child values such as "content" (in your JSON) cannot be retrieved directly with a field mask, because they sit below the repeated fields "pages" and "formFields". However, you can filter the response in client code (for example, with the client library) to retrieve these child values.

Sakshi Gatyan
  • The Documentation has been updated to show how to use a `FieldMask` in requests https://cloud.google.com/document-ai/docs/send-request – Holt Skinner Oct 28 '22 at 17:13