I am passing scanned PDFs into the Google Cloud Document AI OCR. The JSON response (or the Document object returned when using the Python API) contains the content of the PDF in a structured format, as described here. I would like to be able to output a PDF file as well (or XML if that's easier). Is there such a functionality? Any hints on possible implementations are appreciated.
Note: the PDFs are already OCRed by another tool prior to my tasks, but the quality is not as good as the Document AI OCR.
Thank you