Could you please help me? I have a PDF in Hebrew that contains numbered paragraphs. After processing this PDF with the Google Document AI OCR API, the text I receive always has the paragraph numbers placed before the actual paragraph text. [Image: an example of paragraph numbering placed before the paragraph text] Is it possible to solve this problem?
I tried examining the line and token layout in the JSON returned by Document AI, but the layout shows the same problem: the numbers are not in the correct place.
`# documents: output of the Document AI API (an iterable of Document objects)
for document in documents:
    for page in document.pages:
        # Only inspect the first 10 pages
        if page.page_number > 10:
            continue
        for line in page.lines:
            # A text anchor may span several segments, so join them all
            segments = line.layout.text_anchor.text_segments
            line_text = "".join(
                document.text[int(seg.start_index):int(seg.end_index)]
                for seg in segments
            )
            print(line_text)
`
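To make the layout check concrete, here is a minimal sketch of how the tokens of one line could be compared by horizontal position rather than by text order. It assumes each token has already been reduced to a `(text, x)` pair, where `x` is a normalized horizontal coordinate taken from something like `token.layout.bounding_poly.normalized_vertices`. The helper name `order_tokens_rtl` and the sample values are hypothetical and only for illustration.

```python
from typing import List, Tuple


def order_tokens_rtl(tokens: List[Tuple[str, float]]) -> List[str]:
    """Return token texts sorted right to left by their x coordinate.

    `tokens` holds (text, x) pairs; a larger x means further right on the page.
    """
    ordered = sorted(tokens, key=lambda pair: pair[1], reverse=True)
    return [text for text, _x in ordered]


# Hypothetical tokens of one line, listed in the order the OCR returned them.
line_tokens = [("שלום", 0.55), ("1.", 0.90), ("עולם", 0.20)]
print(order_tokens_rtl(line_tokens))
```

If the order produced this way differs from the order in `document.text`, the positions in the layout are usable for reordering. If both agree, the numbers are misplaced in the layout data itself, which is what I seem to be seeing.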
I previously tried Google Vision AI and also tested several different documents, and the same error occurred every time.
Thank you!