I have the following code which utilises boto3 for AWS.

import boto3
from trp import Document

# Document
s3BucketName = "bucket"
documentName = "doc.png"

# Amazon Textract client
textract = boto3.client('textract')

# Call Amazon Textract
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': documentName
        }
    },
    FeatureTypes=["FORMS"])

#print(response)

doc = Document(response)

for page in doc.pages:
    # Print fields
    print("Fields:")
    for field in page.form.fields:
        print("Key: {}, Value: {}".format(field.key, field.value))

I am trying to save the output of that loop as a dict, JSON, or CSV, but I am not an experienced Python programmer yet.

I tried this:

key_map = {}
filepath = 'output.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        for page in doc.pages:
            # Print fields
            print("Fields:")
            for field in page.form.fields:
                #print("Key: {}, Value: {}".format(field.key, field.value))
                key_map[str(field.key, field.value)] = cnt
                line = fp.readline()
                cnt +=1

But I don't think this solution works. Any tips on how to save the output of that for loop as JSON?

Greation
  • What you have tried seems to read from the file rather than write to it. So all you want is to write the output of `doc = Document(response)` to a file? – Nagaraj Tantri Oct 06 '19 at 00:56
  • Yes. So, having that print output (`print("Key: {}, Value: {}".format(field.key, field.value))`) saved as JSON or CSV. – Greation Oct 06 '19 at 01:06

1 Answer

If you want CSV output, you can use the csv module:

import csv

doc = Document(response)

# newline='' is recommended by the csv docs so no blank rows appear on Windows
with open('aws_doc.csv', mode='w', newline='') as aws_field_file:
    field_write = csv.writer(aws_field_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    for page in doc.pages:
        for field in page.form.fields:
            # This will write it as your <key>, <value>
            field_write.writerow([field.key, field.value])
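
Since the question also asks about JSON, the same loop can build a dict and hand it to the json module. This is only a minimal sketch: `fields_to_dict`, `sample_pairs`, and the file name `aws_doc.json` are made up for illustration, and the sample data stands in for the real Textract fields so the snippet runs without AWS access. It assumes that `str(field.key)` and `str(field.value)` give the text you see in the print output, and that a field may have no value.

```python
import json


def fields_to_dict(pairs):
    """Collect (key, value) pairs into a dict of plain strings.

    json.dump can only serialise basic types, so both sides are
    converted with str(); a missing value becomes an empty string.
    """
    result = {}
    for key, value in pairs:
        result[str(key)] = "" if value is None else str(value)
    return result


# Stand-in data (hypothetical). With the code from the question you would
# build the pairs from the parsed document instead, for example:
#   pairs = [(f.key, f.value) for page in doc.pages for f in page.form.fields]
sample_pairs = [("Name", "Jane Doe"), ("Date", "2019-10-06"), ("Signature", None)]

fields = fields_to_dict(sample_pairs)

# Write the dict to disk as JSON.
with open("aws_doc.json", "w") as json_file:
    json.dump(fields, json_file, indent=2)
```

Note that a dict keeps only one value per key, so if the form repeats a key, the last occurrence wins. If that matters, a list of `{"key": ..., "value": ...}` dicts can be dumped the same way.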

If you want headers in the CSV file, you can also use DictWriter, which lets you pass each row as a dictionary: https://docs.python.org/3.4/library/csv.html#csv.DictWriter
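
A rough sketch of the DictWriter variant might look like the following. The column names `key` and `value`, the file name, and `sample_pairs` are assumptions chosen for illustration; the sample data stands in for the Textract fields so the snippet runs on its own.

```python
import csv

# Stand-in data (hypothetical). With the parsed Textract document you would use:
#   pairs = [(f.key, f.value) for page in doc.pages for f in page.form.fields]
sample_pairs = [("Name", "Jane Doe"), ("Date", "2019-10-06"), ("Signature", None)]

# One dict per row; the dict keys must match the fieldnames given to DictWriter.
rows = [
    {"key": str(key), "value": "" if value is None else str(value)}
    for key, value in sample_pairs
]

with open("aws_doc_with_header.csv", mode="w", newline="") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["key", "value"])
    writer.writeheader()    # writes the header line: key,value
    writer.writerows(rows)  # writes one line per field
```

Reading the file back with `csv.DictReader` gives the same dictionaries again, which is handy if another script needs to consume the output.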

Nagaraj Tantri