I've been using Google Document AI to extract text from scanned documents, and the text itself comes out well. However, I'm having trouble preserving the layout of that text.
In AWS, there's a "pretty print" tool that helps maintain the layout of extracted text. Tesseract, for its part, can preserve interword spaces with the config='-c preserve_interword_spaces=1' option.
Is there a similar option or method in Google Document AI that preserves the layout and spacing of the extracted text? If anyone has experience with this, I'd greatly appreciate your guidance.
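To make the goal concrete, here is a rough sketch of the kind of layout-preserving output I mean, rebuilt by hand from token positions rather than from any built-in Document AI option. It assumes each token's text and the top-left corner of its normalized bounding box (0.0 to 1.0) have already been pulled out of the response; the `Token` class, `render_layout` function, and the `columns`/`line_height` parameters are hypothetical, not part of the Document AI client library.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical simplified token: its text plus the top-left corner of its
    bounding box in normalized page coordinates (0.0 to 1.0)."""
    text: str
    x: float
    y: float


def render_layout(tokens, columns=80, line_height=0.02):
    """Place tokens on a fixed-width character grid so horizontal gaps survive."""
    # Group tokens into lines by quantizing their vertical position.
    rows = {}
    for tok in tokens:
        row = round(tok.y / line_height)
        rows.setdefault(row, []).append(tok)

    lines = []
    for row in sorted(rows):
        buf = ""
        # Within a line, place tokens left to right by their x coordinate.
        for tok in sorted(rows[row], key=lambda t: t.x):
            col = int(tok.x * columns)
            # If the target column is already occupied, fall back to a single space.
            if buf and col <= len(buf):
                col = len(buf) + 1
            # Pad with spaces up to the token's column, then append the token.
            buf = buf.ljust(col) + tok.text
        lines.append(buf)
    return "\n".join(lines)
```

For example, two label/value pairs whose values start halfway across the page keep their column alignment with `columns=20`. The grid approach trades precision for simplicity: the result depends on the chosen `columns` and `line_height`, and vertical gaps between lines are not reproduced.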
Thanks in advance!