Questions tagged [cloud-document-ai]

This tag is for the Document AI product within Google Cloud Platform.

200 questions
1 vote, 1 answer

Google Document AI c# mime Unsupported input file format

I am trying to upload a PDF for processing to Google's Document AI service, using Google's Google.Cloud.DocumentAI.V1 library for C#. I looked at the GitHub repository and the docs, but found little information. The PDF is on the local drive. I converted the PDF to a byte array then…
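For reference, the REST form of an online processing request carries the file as base64 text in `rawDocument.content` together with an explicit `rawDocument.mimeType`. A missing or mismatched MIME type is one possible cause of an unsupported-format error, though the truncated excerpt does not confirm that this is what happened here. A minimal Python sketch of building such a body follows; the helper name `build_process_request` and the placeholder bytes are hypothetical, and no network call is made.

```python
import base64
import json


def build_process_request(pdf_bytes: bytes, mime_type: str = "application/pdf") -> str:
    """Return a JSON body for a Document AI `processors.process` REST call.

    Hypothetical helper for illustration: it only builds the body and
    sends nothing over the network.
    """
    return json.dumps(
        {
            "rawDocument": {
                # The REST API expects the raw bytes encoded as base64 text.
                "content": base64.b64encode(pdf_bytes).decode("ascii"),
                # The MIME type should describe the bytes actually sent.
                "mimeType": mime_type,
            }
        }
    )


# Placeholder bytes standing in for a real PDF read from the local drive.
body = build_process_request(b"%PDF-1.4 placeholder")
```

The same two fields exist on the `RawDocument` message in the client libraries, so the check to make in any language is that the content is the unmodified file and the MIME type matches it.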
1 vote, 1 answer

What are the quotas and limits for the Intelligent Document Quality processor in Google Cloud?

I would like to know the maximum page limit for processing a document with the Document Quality processor in GCP. When I tried a document with 6 pages, it threw an error saying that the page count should be 5 but was 6. Need to…
Sailakshmi
1 vote, 2 answers

Permission 'documentai.processors.processOnline' denied on resource (or it may not exist)

I am trying to send a POST request to the Cloud Document AI API using Postman. I have tried sending a POST request with the API key included, along with providing an OAuth access token as the OAuth 2.0 Authorization (generated using gcloud auth…
Connor
1 vote, 0 answers

Set default column count

I've got a large number of 3- and 6-column journal and newspaper pages that I want to OCR. I want to automate recognition of columns. I've used Tesseract (see a previous question) and Google Cloud Document AI (using the R package daiR) without great…
1 vote, 1 answer

How to serialize/deserialize a protobuf response from the Google Document AI API?

I'm working with a google API to process documents from upload. What I'm trying to achieve is saving the protobuf in the response as a .proto file so I could work with it later. I can do response._pb.SerializeToString(), however, I couldn't figure…
1 vote, 2 answers

Is there a way to parse the Document AI OCR response into pdf format?

I am passing scanned PDFs into the Google Cloud Document AI OCR. The JSON response (or the Document object returned when using the Python API) contains the content of the PDF in a structured format, as described here. I would like to be able to…
1 vote, 1 answer

google.api_core.exceptions.InternalServerError: 500 Failed to process all the documents

I am getting this error when trying to implement Document OCR from Google Cloud in Python as explained here: https://cloud.google.com/document-ai/docs/ocr#documentai_process_document-python. When I run operation.result(timeout=None) I get this…
1 vote, 2 answers

API key with Google Document AI

I am using the Form Parser of Google Document AI. The only way to authenticate that I have found is through the gcloud command-line interface ("Authorization: Bearer "$(gcloud auth application-default print-access-token)). Our application uses Google Vision…
pascal
1 vote, 2 answers

How do you scale Google Cloud Document AI processing?

From https://cloud.google.com/document-ai/docs/process-forms, I can see some examples of processing single files. But in most cases, companies have buckets of documents. In that case, how do you scale Document AI processing? Do you use the…
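One documented route for whole buckets is batch (asynchronous) processing, which reads inputs from Cloud Storage and writes results back to Cloud Storage. As a rough sketch, the REST body for a `processors.batchProcess` call can point at a bucket prefix rather than a single file. The helper name and both `gs://` paths below are hypothetical placeholders, and the snippet only builds the JSON without sending it.

```python
import json


def build_batch_request(input_prefix: str, output_uri: str) -> str:
    """Return a JSON body for a Document AI `processors.batchProcess` REST call.

    Hypothetical helper for illustration; it performs no network request.
    """
    return json.dumps(
        {
            # Process every supported file found under this Cloud Storage prefix.
            "inputDocuments": {"gcsPrefix": {"gcsUriPrefix": input_prefix}},
            # Results are written as JSON files under this Cloud Storage location.
            "documentOutputConfig": {"gcsOutputConfig": {"gcsUri": output_uri}},
        }
    )


# Placeholder bucket paths, not taken from the question.
batch_body = build_batch_request("gs://example-input/invoices/", "gs://example-output/results/")
```

The call returns a long-running operation, so a caller would poll it for completion and then read the output files. Per-request file limits and quotas still apply, so very large buckets may need several such requests.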
1 vote, 2 answers

TableBoundHints in Google Cloud Document AI not working

I am trying to give Document AI a hint so that it extracts tables only from a specific area, but it is not working. TableBoundHint tableBoundHints = TableBoundHint.newBuilder() .setBoundingBox(BoundingPoly.newBuilder() // top…
1 vote, 1 answer

Get coordinates for Entity from Google AI JSON

I am using Document AI to read invoices by calling the endpoint and then parsing the response JSON to get entity information. One example of entity JSON data: Do you know how to get the coordinates for each entity? Thanks a lot and any input…
laventy
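In the Document JSON, an entity's location is normally carried under `pageAnchor.pageRefs[].boundingPoly.normalizedVertices`, with coordinates expressed as fractions of the page width and height. The sketch below walks that structure with plain dictionaries. The function name and the sample entity are hypothetical, since the JSON from the question is not shown.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float]


def entity_vertices(entity: Dict) -> List[Tuple[int, List[Vertex]]]:
    """Return (page index, normalized vertices) for each page reference of an entity."""
    results = []
    for ref in entity.get("pageAnchor", {}).get("pageRefs", []):
        # `page` may arrive as a string in JSON and may be omitted for page 0.
        page = int(ref.get("page", 0))
        vertices = ref.get("boundingPoly", {}).get("normalizedVertices", [])
        # Zero-valued coordinates can be omitted from the JSON, hence the defaults.
        results.append((page, [(v.get("x", 0.0), v.get("y", 0.0)) for v in vertices]))
    return results


# Hypothetical entity, shaped like the fields described above.
sample_entity = {
    "type": "total_amount",
    "mentionText": "42.00",
    "pageAnchor": {
        "pageRefs": [
            {
                "page": "1",
                "boundingPoly": {
                    "normalizedVertices": [
                        {"x": 0.1, "y": 0.2},
                        {"x": 0.3, "y": 0.2},
                        {"x": 0.3, "y": 0.25},
                        {"x": 0.1, "y": 0.25},
                    ]
                },
            }
        ]
    },
}

coordinates = entity_vertices(sample_entity)
```

To convert to pixels, multiply each `x` by the page width and each `y` by the page height reported in that page's `dimension` field.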
0 votes, 0 answers

Which Document AI processors support Barcode decoding?

I want to decode some ITF and EAN barcodes from images. I saw that Invoice Parser supports them, but the price is too high. Do any of the general processors, like the Document OCR processor, support barcodes? I could not find the info in the Processors…
Radu Bogdan
0 votes, 0 answers

How to Convert Google Document AI Output to Preserve Layout Text?

I've been using Google Document AI for text extraction from scanned documents, and it's been working well in terms of extracting text. However, I'm facing an issue when it comes to preserving the layout of the text. In AWS, there's a tool called…
Raad Altaie
0 votes, 0 answers

Custom Document Extractor Processor ID Not Found

I am trying to process a PDF document in Python using the Custom Document Extractor from Document AI. I tried using a managed version, which can be found in the Deploy & use tab on the console. I used Google's sample code and inputted the managed version's…
0 votes, 1 answer

Unable to parse 1040 for the year 2020

I am encountering issues while attempting to parse the 1040 form for the year 2020 using a dummy data PDF file. I got this exception: Status(StatusCode="InvalidArgument", Detail="Unable to find a document of type: '1040_' (excluding {1040_2020}).") But…