I am using Amazon Textract to extract the text from a PDF document (refer link), and it's working fine. I also need to get the checked items of the checkboxes in the same PDF document. How do I get the checkbox selection from a PDF document? Please shed some light.

John Rotenstein

Ash
- What did you try? Where did you fail? Can you expand? – misha130 Jun 03 '20 at 10:44
- I am using the Amazon Textract service to extract text and checkboxes from a PDF. I upload the PDF to an S3 bucket using AWS credentials (AWS access key, secret key, region and S3 bucket name). After the upload, I call GetDocumentTextDetectionAsync with the object key to get the response. The response contains all the text, but no selection element for the checked checkbox. – Ash Jun 03 '20 at 13:26
1 Answer
Selection elements such as radio buttons and check boxes on a document page can be detected in form data and in tables. See the following documentation: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-selectables.html
If the style of a selection element differs from the ones shown in that documentation, the results returned by Textract may omit these elements or be inaccurate.
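Because selection elements come from form and table analysis, the API operation matters. GetDocumentTextDetection (used in the question) returns only PAGE, LINE and WORD blocks; SELECTION_ELEMENT blocks are returned by the document-analysis operations (StartDocumentAnalysis / GetDocumentAnalysis) when `FeatureTypes` includes FORMS or TABLES. A minimal sketch in Python, assuming `client` is a boto3 Textract client; the function name `start_checkbox_analysis` is hypothetical, and the AWS SDK for .NET offers the corresponding `StartDocumentAnalysisAsync` method.

```python
from typing import Any, Dict


def start_checkbox_analysis(client: Any, bucket: str, key: str) -> str:
    """Start an asynchronous Textract analysis job for a PDF stored in S3.

    Hypothetical helper. `client` is assumed to be a boto3 Textract client,
    e.g. boto3.client("textract"). Unlike StartDocumentTextDetection,
    StartDocumentAnalysis can return SELECTION_ELEMENT blocks.
    """
    response: Dict[str, Any] = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        # FORMS: key-value pairs, e.g. a checkbox next to a label.
        # TABLES: table cells, which can also contain checkboxes.
        FeatureTypes=["FORMS", "TABLES"],
    )
    return response["JobId"]
```

For example, `start_checkbox_analysis(boto3.client("textract"), "my-bucket", "form.pdf")`, where the bucket and key are placeholders. The results of that job are then read with GetDocumentAnalysis rather than GetDocumentTextDetection.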
Further, in the JSON data returned by the Textract API, the blocks whose BlockType is KEY_VALUE_SET need to be extracted from the returned Blocks to get all the checkbox values in the document, as outlined here.
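One way to follow those KEY_VALUE_SET blocks to the checkbox state is sketched below. It assumes the block layout described in the Textract documentation: a KEY block links to its VALUE block through a `VALUE` relationship, the VALUE block has a `CHILD` relationship to a SELECTION_ELEMENT block, and that block's `SelectionStatus` is `SELECTED` or `NOT_SELECTED`. The function name `checkbox_values` is hypothetical, and `blocks` is assumed to be the full list of Blocks from an analysis run with FORMS enabled.

```python
from typing import Any, Dict, List


def checkbox_values(blocks: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each form key's text to the SelectionStatus of its checkbox value."""
    by_id: Dict[str, Dict[str, Any]] = {b["Id"]: b for b in blocks}

    def related(block: Dict[str, Any], rel_type: str) -> List[Dict[str, Any]]:
        """Return the blocks linked from `block` by relationships of `rel_type`."""
        return [
            by_id[block_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type
            for block_id in rel["Ids"]
            if block_id in by_id
        ]

    result: Dict[str, str] = {}
    for block in blocks:
        # Only KEY blocks start a pair; VALUE blocks are reached through them.
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        # The key's label is the text of its child WORD blocks.
        key_text = " ".join(
            child["Text"] for child in related(block, "CHILD") if child["BlockType"] == "WORD"
        )
        for value in related(block, "VALUE"):
            for child in related(value, "CHILD"):
                if child["BlockType"] == "SELECTION_ELEMENT":
                    result[key_text] = child["SelectionStatus"]
    return result
```

This sketch ignores checkboxes that Textract does not pair with a key (they appear only as standalone SELECTION_ELEMENT blocks, or inside table cells), and keys with identical text overwrite each other in the returned dict.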

Paradigm
- I have tried my PDF with the "API for Amazon textract console", which returns JSON with the checkbox's selection element. With the same PDF, I failed to get the selection element through the Amazon Textract SDK. – Ash Jun 04 '20 at 09:43
- If the console demo detects the selection element, the SDK call should return it as well, since the backend is the same for both. Note that the `Block` objects are paginated: if a "[NextToken](https://docs.aws.amazon.com/textract/latest/dg/API_GetDocumentAnalysis.html#Textract-GetDocumentAnalysis-response-NextToken)" is returned in the response, it needs to be passed in the next request (see the [request syntax](https://docs.aws.amazon.com/textract/latest/dg/API_GetDocumentAnalysis.html#API_GetDocumentAnalysis_RequestSyntax)) to fetch the remaining results. – Paradigm Jun 04 '20 at 15:12
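The pagination described in that comment can be handled with a loop like the one below: a minimal sketch, assuming `client` is a boto3 Textract client and `job_id` came from a StartDocumentAnalysis call. The function name `get_all_blocks` and the `poll_seconds` parameter are hypothetical. Polling with `time.sleep` is a simplification; Textract can also report job completion through an Amazon SNS notification channel.

```python
import time
from typing import Any, Dict, List


def get_all_blocks(client: Any, job_id: str, poll_seconds: float = 5.0) -> List[Dict[str, Any]]:
    """Collect every Block from an asynchronous Textract analysis job.

    Hypothetical helper: waits while the job is IN_PROGRESS, then follows
    NextToken until the service stops returning one.
    """
    response: Dict[str, Any] = client.get_document_analysis(JobId=job_id)
    while response["JobStatus"] == "IN_PROGRESS":
        time.sleep(poll_seconds)
        response = client.get_document_analysis(JobId=job_id)

    # PARTIAL_SUCCESS still returns Blocks, so only FAILED is treated as an error.
    if response["JobStatus"] == "FAILED":
        raise RuntimeError(f"Textract job {job_id} failed: {response.get('StatusMessage', '')}")

    blocks: List[Dict[str, Any]] = list(response.get("Blocks", []))
    # A NextToken means more Blocks remain; pass it back to fetch the next page.
    token = response.get("NextToken")
    while token:
        response = client.get_document_analysis(JobId=job_id, NextToken=token)
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
    return blocks
```

If only the first page is read, any SELECTION_ELEMENT blocks on later pages are silently missing, which can look like the SDK "not returning" checkboxes that the console shows.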