I have the following code, which uses boto3 to call Amazon Textract on AWS:
    import boto3
    from trp import Document

    # Document
    s3BucketName = "bucket"
    documentName = "doc.png"

    # Amazon Textract client
    textract = boto3.client('textract')

    # Call Amazon Textract
    response = textract.analyze_document(
        Document={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': documentName
            }
        },
        FeatureTypes=["FORMS"])

    #print(response)

    doc = Document(response)

    for page in doc.pages:
        # Print fields
        print("Fields:")
        for field in page.form.fields:
            print("Key: {}, Value: {}".format(field.key, field.value))
I am trying to save the key/value pairs printed by that loop as a dict, JSON, or CSV, but I am not an experienced Python programmer yet.
I tried this:
    key_map = {}
    filepath = 'output.txt'
    with open(filepath) as fp:
        line = fp.readline()
        cnt = 1
        while line:
            for page in doc.pages:
                # Print fields
                print("Fields:")
                for field in page.form.fields:
                    #print("Key: {}, Value: {}".format(field.key, field.value))
                    key_map[str(field.key, field.value)] = cnt
            line = fp.readline()
            cnt += 1
But I don't think this solution works. Any tips on how to save the output of that for loop as JSON?
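
For reference, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes that `str(field.key)` and `str(field.value)` give the same text that the `print` call above shows, and that `field.value` can be `None` when a key has no detected value. The helper names `fields_to_dict`, `save_as_json`, and `save_as_csv` are hypothetical, and the `SimpleNamespace` objects are stand-ins shaped like `doc.pages` so the sketch runs without AWS. Note that in the attempt above, `str(field.key, field.value)` passes the value where `str` expects an encoding, so that line fails, and `output.txt` is opened for reading rather than writing.

```python
import csv
import json
from types import SimpleNamespace


def fields_to_dict(pages):
    """Collect the form fields of all pages into one {key: value} dict."""
    result = {}
    for page in pages:
        for field in page.form.fields:
            key = str(field.key)
            # Assumption: a key may come without a value; store "" in that case.
            value = str(field.value) if field.value is not None else ""
            result[key] = value  # a repeated key overwrites the earlier one
    return result


def save_as_json(data, path):
    """Write the dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, indent=2, ensure_ascii=False)


def save_as_csv(data, path):
    """Write the dict to a two-column CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp)
        writer.writerow(["key", "value"])
        writer.writerows(data.items())


# Stand-in data shaped like doc.pages (hypothetical, for illustration only).
fake_pages = [
    SimpleNamespace(form=SimpleNamespace(fields=[
        SimpleNamespace(key="Name", value="Jane Doe"),
        SimpleNamespace(key="Phone", value=None),
    ]))
]

key_map = fields_to_dict(fake_pages)
print(json.dumps(key_map, indent=2))
```

With the real Textract response, the idea would be to call `key_map = fields_to_dict(doc.pages)` and then `save_as_json(key_map, "output.json")` or `save_as_csv(key_map, "output.csv")`. If the same key can appear more than once in a form, a list of `(key, value)` pairs would preserve all of them, where a dict keeps only the last.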