I would like to extract handwritten text from a scanned image using, say, Amazon AWS Textract, and then create a searchable PDF from the output, i.e. convert the image into a PDF with a text layer.

Amazon has published a blog post and Java code showing how this can be done.

I would like to be able to do this in Python. Python code examples showing AWS Textract usage are all here - link.

However, these examples do not show how to take the response from AWS Textract and create a searchable PDF from it. Has anybody written code for that last step, i.e. creating a searchable PDF from the Textract response?

Thank you.

John Rotenstein
jim70
  • Creating a PDF from text that you extracted from an image is not something that AWS Textract or other AWS services can do for you. Use the typical Python libraries to do this, for example [PyPDF2.PdfFileWriter](https://pythonhosted.org/PyPDF2/PdfFileWriter.html). – jarmod Feb 17 '21 at 03:36
  • @jarmod ok - got it. Will work on figuring out how to use PdfFileWriter when I next go down this path. Thank you! – jim70 Mar 14 '21 at 01:01

1 Answer

Here is an aws-samples repo that uses Python to create searchable PDFs, similar to the Java link you posted.
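In case it helps to see the idea in a few lines, below is a rough sketch of that last step. It is not the code from the repo. It assumes the third-party `reportlab` and `Pillow` packages for writing the PDF and `boto3` for calling Textract, and the function names (`word_boxes`, `write_searchable_pdf`, `textract_image`) are made up for illustration. The approach is to draw the scanned image as the page, then place each `WORD` block from the Textract response on top of it as invisible text at the position given by its bounding box.

```python
from typing import Iterator, Tuple


def word_boxes(response: dict, page_w: float, page_h: float) -> Iterator[Tuple[str, float, float, float]]:
    """Yield (text, x, y, height) in page units for each WORD block.

    Textract bounding boxes are ratios of the page size, measured from
    the top-left corner; PDF coordinates start at the bottom-left.
    """
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue
        box = block["Geometry"]["BoundingBox"]
        x = box["Left"] * page_w
        height = box["Height"] * page_h
        # Flip the vertical axis so y is the bottom edge of the word box.
        y = page_h - (box["Top"] + box["Height"]) * page_h
        yield block["Text"], x, y, height


def write_searchable_pdf(image_path: str, response: dict, pdf_path: str) -> None:
    """Write a one-page PDF: the image plus an invisible text layer."""
    # Third-party imports kept local (assumption: pip install reportlab pillow).
    from reportlab.lib.utils import ImageReader
    from reportlab.pdfgen import canvas

    image = ImageReader(image_path)
    page_w, page_h = image.getSize()  # one PDF point per pixel, for simplicity
    pdf = canvas.Canvas(pdf_path, pagesize=(page_w, page_h))
    pdf.drawImage(image, 0, 0, width=page_w, height=page_h)
    for text, x, y, height in word_boxes(response, page_w, page_h):
        layer = pdf.beginText(x, y)
        layer.setTextRenderMode(3)  # render mode 3: invisible but searchable
        layer.setFont("Helvetica", max(height, 1))  # font size roughly the box height
        layer.textOut(text)
        pdf.drawText(layer)
    pdf.save()


def textract_image(image_path: str) -> dict:
    """Run synchronous text detection on a single image file."""
    import boto3  # assumption: AWS credentials and region are already configured

    with open(image_path, "rb") as handle:
        return boto3.client("textract").detect_document_text(
            Document={"Bytes": handle.read()}
        )
```

With those in place, something like `write_searchable_pdf("scan.png", textract_image("scan.png"), "scan.pdf")` should produce a PDF in which the recognized words can be found and selected (the file names are placeholders). The sketch handles a single page and does not stretch each word to the exact width of its box, so selection highlights will only roughly match the handwriting.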

tbrk
  • thank you for sharing. will try to come to this when I take it on again. for now, on back burner :) – jim70 Apr 20 '23 at 13:53