Questions tagged [cloud-document-ai]

This tag is for the Document AI product within Google Cloud Platform.

200 questions
1
vote
1 answer

Paragraph numbering in Document AI OCR

Could you possibly help me? I have a PDF in Hebrew with numbered paragraphs. After processing this PDF with the Google Document AI OCR API, I receive text in which the paragraph numbering always comes before the actual text: this is an example of…
1
vote
1 answer

How to get data in an organised way from a PDF in Document AI [cloud-document-ai]

I created the following schema. I am getting all the data in the main object, at the root position. As can be seen, the data belongs to two different people and should be separated per person. Is it possible to get data in…
1
vote
1 answer

Cloud Document AI

I was learning about Google Document AI and I need it for invoice OCR. I used the given Java code to extract text from my document, but I want to get the JSON output. And what if Document AI doesn't provide the…
1
vote
1 answer

Document AI batch process operation different payload returned

I'm working on a problem to split documents using Document AI. I'm following the official Document AI GitHub repo for batch processing. The batch process function returns a long-running operation. The operation is then polled and…
1
vote
1 answer

Multiple invoices in one file

In my use case, when performing batch processing of invoices, there's a chance that multiple invoices might be included in the same file. It appears that each file processed is treated as a single invoice. Is there a way to get an output with the…
1
vote
1 answer

Remove documents queued by Human-in-the-Loop

After setting up Human-in-the-Loop and label filters, I noticed I configured the confidence levels too high as lots of correctly processed documents were marked. I've since lowered the confidence thresholds and now fewer documents get marked, but…
dndr
1
vote
1 answer

How can I ensure that the GCP Document AI model outputs JSON with the same name as the input file?

I am using Python to BatchProcess PDFs through GCP Document AI ("DocAI"). The PDFs have long file names such as 71.169892_01-2022.10.15-21275188-1111.pdf. Often the only difference between the filenames is the last four digits before .pdf (such as…
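Since the excerpt is truncated, the following is only a minimal sketch of one way to tie outputs back to inputs. It assumes the JSON form of the batch operation's metadata, where each entry in `individualProcessStatuses` pairs an `inputGcsSource` with an `outputGcsDestination`. The bucket names, output prefixes, and second file name in the sample are hypothetical.

```python
def outputs_by_input(metadata):
    """Map each input GCS URI to the output GCS prefix reported for it.

    `metadata` is assumed to be the operation metadata as a plain dict
    (JSON field names), not a client-library object.
    """
    mapping = {}
    for status in metadata.get("individualProcessStatuses", []):
        mapping[status["inputGcsSource"]] = status.get("outputGcsDestination")
    return mapping


# Hypothetical metadata: bucket names and output prefixes are made up.
sample_metadata = {
    "state": "SUCCEEDED",
    "individualProcessStatuses": [
        {
            "inputGcsSource": "gs://in-bucket/71.169892_01-2022.10.15-21275188-1111.pdf",
            "outputGcsDestination": "gs://out-bucket/results/123/0",
        },
        {
            "inputGcsSource": "gs://in-bucket/example-b.pdf",
            "outputGcsDestination": "gs://out-bucket/results/123/1",
        },
    ],
}

mapping = outputs_by_input(sample_metadata)
```

With this mapping, the output files under each prefix can be copied or renamed after the input file's own name, so similar-looking file names no longer need to be told apart by the generated output name.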
1
vote
2 answers

Can Google Document AI classify documents?

I'm playing with Google Document AI and when I read some documentation from Google and other sources I often see a statement that Document AI can classify documents, not only extract the data by labels. However, I don't see how I can achieve…
1
vote
1 answer

Document AI doesn't recognize the parent label area correctly, only on a per-line basis

I have an issue with Document AI. When I try to create a parent label with child labels in it, it does not recognize the whole area of the parent label correctly. It only recognizes it on a per-line basis, with a separate label for each line. Can it be done somehow…
Solo
1
vote
1 answer

How to process the JSON response of the Google Document AI OCR API into a proper structure?

I want to make a properly structured txt file out of a scanned PDF file using the Google Document AI OCR response, but I get a JSON response from the document: an OCR response which contains all the text of the file in one string, and the X,Y coordinates of the PDF file image…
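One possible approach, sketched here under assumptions, is to slice the single `text` string using the `textAnchor.textSegments` offsets that each layout element carries in the Document JSON. The helper names `anchor_text` and `paragraphs_by_page` and the sample document are hypothetical. In the JSON form, indices arrive as strings and `startIndex` is omitted when it is 0.

```python
def anchor_text(full_text, layout):
    """Return the part of full_text referenced by a layout's textAnchor."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for seg in segments:
        # JSON encodes int64 offsets as strings; startIndex is omitted when 0.
        start = int(seg.get("startIndex", 0))
        end = int(seg["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts)


def paragraphs_by_page(document):
    """Map each page number to the list of its paragraph strings."""
    full_text = document.get("text", "")
    result = {}
    for page in document.get("pages", []):
        result[page.get("pageNumber")] = [
            anchor_text(full_text, para["layout"]).strip()
            for para in page.get("paragraphs", [])
        ]
    return result


# Hypothetical, heavily reduced Document JSON for illustration only.
sample = {
    "text": "Invoice 42\nTotal: 10 EUR\n",
    "pages": [{
        "pageNumber": 1,
        "paragraphs": [
            {"layout": {"textAnchor": {"textSegments": [{"endIndex": "11"}]}}},
            {"layout": {"textAnchor": {"textSegments": [
                {"startIndex": "11", "endIndex": "25"}]}}},
        ],
    }],
}

structured = paragraphs_by_page(sample)
```

The same slicing applies to other layout elements such as lines or blocks, so the resulting per-page lists can be written to a txt file in whatever order is needed.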
1
vote
1 answer

How can I grant permission to a specific Document AI processor?

It seems that in Document AI, permissions can only be granted at the project level. How can I grant permissions at a lower level, such as a specific processor?
rainbow
1
vote
0 answers

Google Document AI - Inconsistent Long Running Operation's metadata JSON representation

While checking the status of a Document AI long-running operation (Form processor), the JSON representation of decodedOperation.metadata seems to vary during execution. I suspect that the operation response does not resolve straight away despite…
1
vote
1 answer

Document AI OCR processor returning error 3 Unsupported input file format randomly

I am using Google Cloud Document AI for the OCR processor and am randomly running into a code 3 'Unsupported input file format.' error. I can submit the same file 5 times and this error will come up maybe 1-2 out of the 5 times. The other times, the…
Adam B
1
vote
1 answer

Trying to get specific fields using field_mask in a Google Cloud Document AI API request (Python)

I'm having this issue because I only want specific fields from the default JSON that Google Cloud Document AI returns. The fields I want to get using the field mask are "text" and, inside "pages", only tables and formFields. For text…
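As a rough illustration of the mask described above, the sketch below builds a request body for the REST `process` method, assuming its `fieldMask` field accepts a comma-separated list of paths. The helper name `build_process_body` and the placeholder PDF bytes are hypothetical, and nothing is sent over the network.

```python
import base64
import json


def build_process_body(pdf_bytes, field_paths):
    """Build a JSON-serialisable body for a Document AI process request."""
    return {
        "rawDocument": {
            # Raw bytes must be base64-encoded in the JSON body.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        # Comma-separated field paths to keep in the returned Document.
        "fieldMask": ",".join(field_paths),
    }


body = build_process_body(
    b"%PDF-1.4 placeholder",  # placeholder bytes, not a real PDF
    ["text", "pages.tables", "pages.formFields"],
)
payload = json.dumps(body)
```

Listing `pages.tables` and `pages.formFields` rather than `pages` is what should keep the other page-level fields out of the response.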
1
vote
1 answer

Google Document AI does not return textStyle and font information for any document

I am using Document AI services to OCR scanned and machine-generated PDF documents. I have tested with 10 different documents but none of them returned with textStyle properties (it is always empty). Just wanted to make sure if that feature is…