
I am working with the Form Parser in Google Document AI.

When I send this request:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://eu-documentai.googleapis.com/v1beta3/projects/<project id>/locations/eu/processors/<processor id>:process"

I get back a document with no entities, with this structure:

{                                                                               
  "document": {                                                                 
    "uri": "",                                                                  
    "mimeType": "application/pdf",                                              
    "text": "Pascal Carrié\nincwo SAS\nx2289475d\nc/ santa isabel, 12, 4D\nN° TVA FR33494952401\n28012 Madrid\n16 rue de La Comète\nIban :\nES76 3023 0047 4866 6328 6612\n75007 Paris\nrib:\nBCOEESMM023\nFRANCE\nFactura nº 2020/02\nFecha\n11/2/20\nConcepto\nPrecio\nCuandidad\nIVA\nImporte\ndéveloppement backend\n4250\n1\n0%\n4,250.00 €\nCondicións de pago : A la recepción de la factura\nBase Imponible\n4,250.00 €\nTotal IVA\n0.00 €\nTOTAL\n4,250.00 €\nForma de pago\nContado\n", 
    "pages": [                                                                  
      {                                                                         
        "pageNumber": 1,                                                        
        "dimension": {                                                          
          "width": 2378,                                                        
          "height": 1681,                                                       
          "unit": "pixels"                                                      
        },                                                                      
        "layout": {                                                             
          "textAnchor": {                                                       
            "textSegments": [                                                   
              {                                                                 
                "endIndex": "431"                                               
              }                                                                 
            ]                                                                   
          },                                                                    
          "boundingPoly": {                                                     
            "vertices": [                                                       
              {},                                                               
              {               ...

No analysis at all. What am I doing wrong?

When I upload the same document to the online demo, it works fine.

I don't think it's Base64-related; I encoded my document and obtained a string as described in the documentation.

Rohit Gupta
pascal

2 Answers


You need to loop through the formFields array of each page (pages->formFields). Each item has a fieldName and a fieldValue, and each of those carries textAnchor->textSegments with a startIndex and an endIndex. With these indices you can extract the matching substring of the "text" property.

Basic sample in PHP:

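// Assumes response.json holds the Document object itself
// (the value of "document" in the API response).
$json = json_decode(file_get_contents('d:\tmp\response.json'));
$text = $json->text;
$fields = [];

// Return the part of $text covered by the first text segment of a layout.
// startIndex is left out of the JSON when it is 0, so default it.
// mb_substr (mbstring extension) counts characters rather than bytes,
// which replaces the utf8_decode + substr workaround.
function anchorText($layout, $text) {
    $seg = $layout->textAnchor->textSegments[0] ?? null;
    if ($seg === null) {
        return '';
    }
    $from = (int) ($seg->startIndex ?? 0);
    $to = (int) $seg->endIndex;
    return mb_substr($text, $from, $to - $from, 'UTF-8');
}

foreach ($json->pages as $page) {
    foreach ($page->formFields ?? [] as $field) {
        // trim() drops the trailing newline that the anchors usually include.
        $name = trim(anchorText($field->fieldName ?? null, $text));
        $value = trim(anchorText($field->fieldValue ?? null, $text));
        $fields[] = [$name => $value];
    }
}
print_r($fields);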
  • Thanks. The results I obtain are poor. I wrote the implementation in Ruby, so perhaps my implementation is not good. But anyway, I was expecting the entities to be returned by Google (as they claim in the documentation). – pascal Jan 04 '21 at 10:40

Here is the updated documentation for how to Handle the Processing Response from the Document AI Form Parser.

You can also check out these Codelabs showing how to use the Form Parser in Python and Node.js.
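
As a rough illustration of the same idea in Python, here is a minimal sketch that works on the raw JSON response without the client library. The helper names (anchor_text, extract_form_fields) and the sample data are hypothetical, not part of any Google API. It assumes the indices count characters, that a missing startIndex means 0 (as in the response shown in the question), and that the response may or may not be wrapped in a "document" key.

```python
def anchor_text(layout, full_text):
    """Join the text covered by every segment of a layout's textAnchor."""
    segments = (layout or {}).get("textAnchor", {}).get("textSegments", [])
    parts = []
    for seg in segments:
        # Assumption: startIndex is omitted from the JSON when it is 0.
        start = int(seg.get("startIndex", 0))
        end = int(seg["endIndex"])
        parts.append(full_text[start:end])
    # Anchors usually include the trailing newline, so strip it.
    return "".join(parts).strip()


def extract_form_fields(response):
    """Return (name, value) pairs for every form field on every page."""
    # Accept either the full API response or the bare Document object.
    document = response.get("document", response)
    text = document.get("text", "")
    fields = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name = anchor_text(field.get("fieldName"), text)
            value = anchor_text(field.get("fieldValue"), text)
            fields.append((name, value))
    return fields


# Hypothetical, hand-made response fragment for illustration only.
sample = {
    "document": {
        "text": "Fecha\n11/2/20\nTOTAL\n4,250.00 €\n",
        "pages": [
            {
                "formFields": [
                    {
                        "fieldName": {"textAnchor": {"textSegments": [
                            {"endIndex": "6"}]}},
                        "fieldValue": {"textAnchor": {"textSegments": [
                            {"startIndex": "6", "endIndex": "14"}]}},
                    },
                    {
                        "fieldName": {"textAnchor": {"textSegments": [
                            {"startIndex": "14", "endIndex": "20"}]}},
                        "fieldValue": {"textAnchor": {"textSegments": [
                            {"startIndex": "20", "endIndex": "31"}]}},
                    },
                ]
            }
        ],
    }
}

print(extract_form_fields(sample))
```

If a page has no formFields key at all, the function simply returns no pairs for that page, which makes it easy to tell an empty result apart from a parsing error.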

Holt Skinner