Questions tagged [cloud-document-ai]

This tag is for the Document AI product within Google Cloud Platform.

200 questions
0
votes
1 answer

Can Google classify document types such as W-2 forms and W-9 forms?

I need to determine the document type first, as we have a bunch of documents. Only after that can I use a specialized processor. I've tried the Custom Document Classifier in Workbench and trained the model for W-2 and W-9 forms. However, I'm not…
0
votes
1 answer

Send a batch process documents request (Error processing files: Error: 3 INVALID_ARGUMENT: Request contains an invalid argument.)

I need to batch process documents. Here is my code: const { DocumentProcessorServiceClient } = require("@google-cloud/documentai").v1; const client = new DocumentProcessorServiceClient(); const name =…
0
votes
1 answer

How can I preserve the spatial integrity of text while using the OCR tool from Google's Cloud Vision or Document AI?

I am using the OCR tool provided by Google's Document AI to extract text from images such as the one given below: My goal is to create a dataframe with all the given metrics (such as Rate, RR, PR) as columns, filled in with the values that follow them.…
0
votes
1 answer

Document AI - How to use a specialized parser with the C# client library?

When I create the W2 parser in the console, it provides a key-value pair of form details. How can I achieve the same in code? Is there any documentation available for C# or any other language? I've tried the ProcessDocument request using the C# client…
0
votes
1 answer

How to configure the page number in the GCP Document AI Toolbox converter?

I am trying to include the page number in the configuration JSON. I tried several approaches, but none of them worked. Looking at the converter code on the GitHub page, I saw a lot of mentions of "page_number", so I think it's possible. Also, there's some…
0
votes
0 answers

Document AI HITL discards labelling

I customized a Custom Document Extractor from purchase_order_parser. I trained it on more than 50 documents. The test score was 51.6% and the F1 score was 0.193. So I enabled HITL and found that Document AI had discarded most of my training labels, and I also found I can't…
0
votes
1 answer

Document AI form parsing on documents with different format

We have a client who wishes to automatically extract information from different PDF files to fill in their form. Those documents all differ in format; for example, the client name can sometimes be found at the top of the first…
prime
  • 25
  • 4
0
votes
1 answer

How to use the converter from GCP Document AI

I am trying to use the converter from Document AI to convert some JSON files to the Document AI JSON format. Using the function described in this…
0
votes
1 answer

How can I authenticate to DocumentAI in GCP?

I created a service account with the roles: Document AI Administrator and Service Account Key Admin. However, when I try to fetch an access token using the googleauth (1.7.0) Ruby gem, I get the following error: Signet::AuthorizationError (Authorization…
Kamilski81
  • 14,409
  • 33
  • 108
  • 161
0
votes
1 answer

Document AI to perform automatic searches over a large amount of data from PDF files

I need to add a feature to my app that lets my clients extract text from images of text, parse it into usable data such as JSON, and store it so they can then search the data more effectively. Those image-texts are big PDF files (~150-500…
prime
  • 25
  • 4
0
votes
1 answer

How to train and test a Custom Classifier processor in Document AI using Python

I want to train and test a Custom Document Classifier using Python code, and I found this train Processor. I started implementing it using this documentation, but I get an error when I call the function train_processor_version_sample(497857003374,…
0
votes
2 answers

How to process a single GCS-stored file on Document AI with the Python client?

I have been testing out the Google Document AI Python client, but I couldn't get the process_document() function working when trying to process one single document stored on Google Cloud Storage. What I currently have: A working quick start…
mimocha
  • 1,041
  • 8
  • 18
0
votes
1 answer

Set processOptions for a Document AI OCR API request

We were advised to enable a flag on our OCR processor for better results because we are facing some problems (like "I" parsed as "1"). To test the flag I want to use Postman, but adding the options to my request results in an error. My request: { …
N4go
  • 13
  • 4
0
votes
1 answer

Google Document AI Python Query Throws "ValueError: Unknown field for ProcessRequest: document_type"... base64 encoding throws another error

I'm running the sample query for Python using an OCR Google Document AI processor. The only difference between my query and this sample query: process_document_sample( project_id="99999FAKE", location="us", processor_id="99999FAKE", …
Hack-R
  • 22,422
  • 14
  • 75
  • 131
0
votes
1 answer

Can we pass table column info to help FormParser determine header_row contents?

Suppose I have a PDF file containing the following table info: Trainer: Giannis Pokedex: Incomplete Name Type Weight Height Color Pikachu Electric 6.0 kg 0.4 m Yellow Bulbasaur Grass/Poison 6.9 kg 0.7 m Green Charizard Fire/Flying 90.5…
inpap
  • 365
  • 3
  • 12