This tag is for the Document AI product within Google Cloud Platform.
Questions tagged [cloud-document-ai]
200 questions
1
vote
1 answer
Paragraph numbering in Document AI OCR
Could you possibly help me?
I have a PDF in Hebrew with numbered paragraphs. After processing this PDF with the Google Document AI OCR API, I receive text in which the paragraph numbering always goes before the actual text: this is an example of…

Julia Grobman
1
vote
1 answer
How to get data in an organised way from a PDF in Document AI [cloud-document-ai]
I created the following schema.
Schema
I am getting all the data in the main object, at the root position. As can be seen, the data belongs to two different people, so it should be separated per person.
Document
Is it possible to get data in…

Abhimanyu Goswami
1
vote
1 answer
Cloud Document AI
I was learning about Google Document AI and I need it for invoice OCR. I've used the given Java code and tried to get the extracted text from my document. But I want to get the JSON output, and what if Document AI doesn't provide the…

Dushyant Aneja
1
vote
1 answer
Document AI batch process operation different payload returned
I'm working on a problem to split documents using Document AI. For this, I'm following the official GitHub repo for Document AI batch processing.
The batch process function returns a long running operation.
The operation is then polled and…

Poojesshwaran V
1
vote
1 answer
Multiple invoices in one file
In my use case, when performing batch processing of invoices, there's a chance that multiple invoices might be included in the same file.
It appears that each file processed is treated as a single invoice. Is there a way to get an output with the…

user2132770
1
vote
1 answer
Remove documents queued by Human-in-the-Loop
After setting up Human-in-the-Loop and label filters, I noticed I had configured the confidence levels too high, as lots of correctly processed documents were marked.
I've since lowered the confidence thresholds and now fewer documents get marked, but…

dndr
1
vote
1 answer
How can I ensure that a GCP Document AI model outputs JSON with the same name as the input file?
I am using Python to BatchProcess PDFs through GCP Document AI ("DocAI"). The PDFs have long file names such as 71.169892_01-2022.10.15-21275188-1111.pdf. Often the only difference between the filenames is the last four digits before .pdf (such as…

imihailov
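For readers with the same naming problem, here is a minimal sketch of one possible approach, not a confirmed answer to the question above. It assumes you have the finished batch operation's metadata as a JSON-style dict whose `individualProcessStatuses` entries carry `inputGcsSource` and `outputGcsDestination`, and it only computes a proposed name per output location. The helper name `output_name_map` and the bucket paths are hypothetical, and the actual renaming or copying in Cloud Storage is left out.

```python
import posixpath


def output_name_map(metadata):
    """Map each output location to a JSON file name derived from its input file.

    `metadata` is assumed to be the batch operation metadata as a plain dict.
    """
    mapping = {}
    for status in metadata.get("individualProcessStatuses", []):
        source = status.get("inputGcsSource", "")
        destination = status.get("outputGcsDestination", "")
        # Strip the directory part and the final extension only, so names that
        # contain several dots keep everything before ".pdf".
        stem = posixpath.splitext(posixpath.basename(source))[0]
        mapping[destination] = stem + ".json"
    return mapping


# Hypothetical metadata; the bucket and folder names are made up.
example_metadata = {
    "individualProcessStatuses": [
        {
            "inputGcsSource": "gs://my-bucket/in/71.169892_01-2022.10.15-21275188-1111.pdf",
            "outputGcsDestination": "gs://my-bucket/out/123/0",
        }
    ]
}
name_map = output_name_map(example_metadata)
```

Keying the map on the output location rather than on the file name avoids collisions between inputs that differ only in their last few characters.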
1
vote
2 answers
Can Google Document AI classify documents?
I'm playing with Google Document AI and when I read some documentation from Google and other sources I often see a statement that Document AI can classify documents, not only extract the data by labels. However, I don't see how I can achieve…

Vladimir Mischenko
1
vote
1 answer
Document AI doesn't recognize the parent label area correctly, only on a per-line basis
I have an issue with Document AI. When I try to create a parent label with child labels in it, it does not recognize the whole area of the parent label correctly, and only recognizes it on a per-line basis, with a separate label for each line.
Can it be done somehow…

Solo
1
vote
1 answer
How to process the JSON response of the Google Document AI OCR API into a proper structure?
I want to make a properly structured txt file out of a scanned PDF file from the Google Document AI OCR response, but I get a JSON response from the document: an OCR response which contains all the text of the file in one string and the X,Y coordinates of the PDF file image…

Raviprasad sathe
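As a rough illustration of the kind of post-processing this question is about, the sketch below rebuilds per-page lines from an OCR response held as a plain dict. It assumes the response follows the Document JSON layout, where each line under `pages[].lines[]` has a `layout.textAnchor.textSegments` list whose `startIndex` and `endIndex` point into the single top-level `text` string; the indices arrive as strings and `startIndex` may be absent when it is 0. The helper names `anchor_text` and `page_lines` are made up for this example.

```python
def anchor_text(full_text, layout):
    """Return the slice(s) of full_text referenced by a layout's text anchor."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for segment in segments:
        start = int(segment.get("startIndex", 0))  # omitted in JSON when 0
        end = int(segment["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts)


def page_lines(document):
    """Return one list of line strings per page, in response order."""
    full_text = document.get("text", "")
    pages = []
    for page in document.get("pages", []):
        lines = [
            anchor_text(full_text, line.get("layout", {})).rstrip("\n")
            for line in page.get("lines", [])
        ]
        pages.append(lines)
    return pages


# Tiny hand-written response, only to show the shape that is assumed.
sample_document = {
    "text": "Invoice 42\nTotal: 10\n",
    "pages": [
        {
            "lines": [
                {"layout": {"textAnchor": {"textSegments": [{"endIndex": "11"}]}}},
                {"layout": {"textAnchor": {"textSegments": [
                    {"startIndex": "11", "endIndex": "21"}]}}},
            ]
        }
    ],
}
lines_per_page = page_lines(sample_document)
```

Writing `"\n".join(lines)` for each page then gives a plain txt file. The same `anchor_text` helper should work for `paragraphs` or `blocks` if those are present in the response.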
1
vote
1 answer
How can I grant permission on a specific Document AI processor?
It seems that in Document AI, permissions can only be granted at the project level. How can I grant permission at a lower level, such as a specific processor?

rainbow
1
vote
0 answers
Google Document AI - Inconsistent Long Running Operation's metadata JSON representation
While checking the status of a Document AI Long Running Operation (Form processor), the JSON representation of decodedOperation.metadata seems to vary during the execution.
I suspect that the operation response does not resolve straight away despite…

redvivi
1
vote
1 answer
Document AI OCR processor returning error 3 Unsupported input file format randomly
I am using Google Cloud Document AI for the OCR processor and am randomly running into a code 3 'Unsupported input file format.' error.
I can submit the same file 5 times and this error will come up maybe 1-2 out of the 5 times. The other times, the…

Adam B
1
vote
1 answer
Trying to get specific fields using field_mask in a Google Cloud Document AI API request in Python
I'm having this issue because I only want specific fields from the default JSON that Google Cloud Document AI returns. The fields I want to get using the field mask are: "text" and, inside "pages", only tables and formFields. For text…
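One alternative worth noting, which is not the API's own field mask: if the full JSON has already been fetched, the unwanted fields can be pruned on the client. The sketch below is a generic dict filter written for this example. The helper name `prune` and the sample document are hypothetical, and it does nothing to reduce the size of the response sent by the service.

```python
def prune(value, paths):
    """Keep only the given dotted paths in a nested dict/list structure."""
    if isinstance(value, list):
        # Apply the same paths to every element of a repeated field.
        return [prune(item, paths) for item in value]
    if not isinstance(value, dict):
        return value

    # Group the remaining path suffixes by their first segment.
    wanted = {}
    for path in paths:
        head, _, rest = path.partition(".")
        wanted.setdefault(head, []).append(rest)

    result = {}
    for key, rests in wanted.items():
        if key not in value:
            continue
        if "" in rests:
            result[key] = value[key]  # the whole subtree was requested
        else:
            result[key] = prune(value[key], rests)
    return result


# Hypothetical, heavily shortened response.
full_response = {
    "uri": "",
    "text": "abc",
    "pages": [
        {"pageNumber": 1, "tables": [{"headerRows": []}], "formFields": [], "lines": []}
    ],
}
trimmed = prune(full_response, ["text", "pages.tables", "pages.formFields"])
```

The paths use the camelCase keys of the JSON output, matching the field names quoted in the question.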
1
vote
1 answer
Google Document AI does not return textStyle and font information for any document
I am using Document AI services to OCR scanned and machine-generated PDF documents. I have tested with 10 different documents, but none of them returned textStyle properties (the field is always empty).
Just wanted to make sure if that feature is…

Ankit A