This tag is for the Document AI product within Google Cloud Platform.
Questions tagged [cloud-document-ai]
200 questions
0
votes
1 answer
Can Google classify document types such as W-2 forms and W-9 forms?
I need to determine the document type first, as we have a bunch of documents. Only after that can I use a specialized processor.
I've tried the Custom Document Classifier in Workbench and trained the model for W-2 and W-9 forms. However, I'm not…

Divyasri D
- 11
0
votes
1 answer
Send a batch process documents request (Error processing files: Error: 3 INVALID_ARGUMENT: Request contains an invalid argument.)
I need to batch process documents. Here is my code:
const { DocumentProcessorServiceClient } = require("@google-cloud/documentai").v1;
const client = new DocumentProcessorServiceClient()
const name =…

DMOON
- 1
0
votes
1 answer
How can I preserve the text's spatial integrity while using the OCR tool from Google's Cloud Vision or Document AI?
I am using the OCR tool provided by Google's Document AI to extract text from images such as the one given below:
My goal is to create a dataframe whose columns are the given metrics (such as Rate, RR, and PR), filled in with the values that follow them…

Raaghav Rammohan
- 35
- 3
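One common way to keep the layout when turning OCR output into a table is to group tokens into rows by their vertical position and then order each row left to right. The following is a minimal sketch under stated assumptions, not the asker's code or a Document AI API: the `(text, x, y)` token shape, the `group_tokens_into_rows` helper, and the numeric sample values are all hypothetical, and real tokens would first have to be read from the bounding boxes in the OCR response.

```python
from typing import List, Tuple

# Hypothetical token shape (an assumption, not a Document AI type):
# (text, x, y), where x and y are the normalized top-left corner of the
# token's bounding box, each in the range 0..1.
Token = Tuple[str, float, float]


def group_tokens_into_rows(tokens: List[Token], y_tolerance: float = 0.01) -> List[List[str]]:
    """Group tokens into rows by similar y, then sort each row by x."""
    rows = []  # each entry is [anchor_y, [(x, text), ...]]
    for text, x, y in sorted(tokens, key=lambda t: t[2]):
        # A token joins the current row if it is vertically close to the
        # row's first token; otherwise it starts a new row.
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Within each row, order the cells left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Made-up sample data: the header names come from the question, the values do not.
tokens = [
    ("RR", 0.40, 0.101), ("Rate", 0.10, 0.100), ("PR", 0.70, 0.102),
    ("16", 0.40, 0.201), ("72", 0.10, 0.200), ("160", 0.70, 0.199),
]
rows = group_tokens_into_rows(tokens)
```

The first row can then serve as column names and the remaining rows as values when building a dataframe. The tolerance is a tuning knob: too small splits a slightly skewed line into several rows, too large merges adjacent lines.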
0
votes
1 answer
Document AI - How to use a specialized parser with the C# client library?
When I create the W2 parser in the console, it provides a key-value pair of form details. How can I achieve the same in code? Is there any documentation available for C# or any other language?
I've tried the ProcessDocument request using the C# client…

Divyasri D
- 11
0
votes
1 answer
How to configure the page number in the GCP Document AI Toolbox converter?
I am trying to include the page number in the configuration JSON. I tried several approaches, but none worked. Looking at the converter code on the GitHub page, I saw many mentions of "page_number", so I think it's possible.
Also, there's some…

Augusto Firmo
- 1
- 1
0
votes
0 answers
Document AI HITL discards labelling
I customized a Custom Document Extractor from purchase_order_parser.
I trained it on more than 50 documents. The test score was 51.6% and the F1 score 0.193.
So I enabled HITL and found that Document AI had discarded most of my training labels, and also found I can't…
0
votes
1 answer
Document AI form parsing on documents with different format
We have a client who wants to automatically extract information from different PDF files to fill in a form. The documents all differ in format; for example, the client name can sometimes be found at the top of the first…

prime
- 25
- 4
0
votes
1 answer
How to use the converter from GCP Document AI
I am trying to use the converter from Document AI to convert some JSONs to the Document AI JSON format. Using the function described in this…

Augusto Firmo
- 1
- 1
0
votes
1 answer
How can I authenticate to DocumentAI in GCP?
I created a service account with the roles: Document AI Administrator and Service Account Key Admin.
However, when I try to fetch an access token using the googleauth (1.7.0) Ruby gem, I get the following error:
Signet::AuthorizationError (Authorization…

Kamilski81
- 14,409
- 33
- 108
- 161
0
votes
1 answer
Document AI to perform automatic research in large amount of data from pdf files
I need to add a feature to my app that lets my clients extract text from images of text, parse it into usable data such as JSON, and store it so that they can then perform better data research.
Those images of text are big PDF files (~150-500…

prime
- 25
- 4
0
votes
1 answer
How to Train and Test Custom Classifier Processor Of Document AI using Python
I want to train and test a custom document classifier using Python code, and I found this train Processor. I started implementing it using this Documentation, but I get an error when I call the function
train_processor_version_sample(497857003374,…

Nitin Saini
- 507
- 2
- 10
- 26
0
votes
2 answers
How to process a single GCS-stored file on Document AI with the Python client?
I have been testing out the Google Document AI Python client, but I couldn't get the process_document() function working when trying to process one single document stored on Google Cloud Storage.
What I currently have:
A working quick start…

mimocha
- 1,041
- 8
- 18
0
votes
1 answer
Set processOptions for doc ai ocr api request
We were advised to enable a flag on our OCR processor for better results, because we are seeing problems such as "I" being parsed as "1". To test the flag I want to use Postman, but adding the options to my request results in an error.
My Request:
{
…

N4go
- 13
- 4
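For reference, the request body for an online process call can be assembled and inspected locally before pasting it into Postman. This is only a sketch under assumptions: the excerpt does not show which flag was recommended, so `enableNativePdfParsing` is used purely as a stand-in, `build_process_request` is a hypothetical helper, and the field nesting (`rawDocument`, `processOptions.ocrConfig`) follows the v1 REST request shape as I understand it and should be checked against the API reference for the version in use.

```python
import base64
import json


def build_process_request(file_bytes: bytes, mime_type: str, ocr_config: dict) -> str:
    """Build a JSON body for a process request (hypothetical helper)."""
    body = {
        "rawDocument": {
            # The file content has to be sent as a base64 string, not raw bytes.
            "content": base64.b64encode(file_bytes).decode("ascii"),
            "mimeType": mime_type,
        },
        # OCR options are nested under processOptions, not placed at the top level.
        "processOptions": {"ocrConfig": ocr_config},
    }
    return json.dumps(body, indent=2)


# Stand-in flag: the flag from the question is not shown in the excerpt.
payload = build_process_request(
    b"%PDF-1.4 sample",
    "application/pdf",
    {"enableNativePdfParsing": True},
)
```

Printing `payload` gives a body that can be compared field by field with the one that fails, which helps separate a misplaced or misspelled option from an encoding problem.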
0
votes
1 answer
Google Document AI Python Query Throws "ValueError: Unknown field for ProcessRequest: document_type"... base64 encoding throws another error
I'm running the sample query for Python using an OCR Google Document AI processor. The only difference between my query and this sample query:
process_document_sample(
project_id="99999FAKE",
location="us",
processor_id="99999FAKE",
…

Hack-R
- 22,422
- 14
- 75
- 131
0
votes
1 answer
Can we pass table column info to help FormParser determine header_row contents?
Suppose I have a pdf file containing the following table info
Trainer: Giannis
Pokedex: Incomplete
Name | Type | Weight | Height | Color
Pikachu | Electric | 6.0 kg | 0.4 m | Yellow
Bulbasaur | Grass/Poison | 6.9 kg | 0.7 m | Green
Charizard | Fire/Flying | 90.5…

inpap
- 365
- 3
- 12
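If the column names are known in advance, one possible workaround is to apply them after parsing rather than inside the parser. The sketch below is a post-processing idea only, not a Form Parser feature: `rows_from_flat_cells` is a hypothetical helper, and it assumes the cell texts have already been extracted in reading order. Only the two complete rows from the question are used as sample data.

```python
from typing import Dict, List


def rows_from_flat_cells(columns: List[str], cells: List[str]) -> List[Dict[str, str]]:
    """Chunk a flat, reading-order list of cell texts into rows keyed by column."""
    width = len(columns)
    # Drop the header cells if the parser returned them as ordinary cells.
    if cells[:width] == columns:
        cells = cells[width:]
    # Step through the cells one full row at a time; an incomplete
    # trailing row is ignored rather than padded.
    return [
        dict(zip(columns, cells[i:i + width]))
        for i in range(0, len(cells) - width + 1, width)
    ]


columns = ["Name", "Type", "Weight", "Height", "Color"]
cells = [
    "Name", "Type", "Weight", "Height", "Color",
    "Pikachu", "Electric", "6.0 kg", "0.4 m", "Yellow",
    "Bulbasaur", "Grass/Poison", "6.9 kg", "0.7 m", "Green",
]
rows = rows_from_flat_cells(columns, cells)
```

This only works when every row has a value in every column; a table with empty cells would need position information to decide which column each value belongs to.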