What are the best practices for extracting data from multi-sided documents?

I need to extract data from both sides of ID cards. My current approach is to use two separate custom-trained processors, one for the front side and another for the back side of the ID card.

However, I'm unsure if this is the most efficient or cost-effective way of handling this scenario. Would it be better to somehow combine the processing for both sides into a single processor, or is it a good practice to use separate processors for each side of a multi-sided document?

Any insights, recommendations, or experiences you can share?

desertnaut
Daniel Klimek

1 Answer

You should be able to handle both sides of an ID card with a single custom processor; each side would be a separate page in the input document.

There are pretrained processors for US Driver Licenses and Passports listed in the documentation; you can request access using this form. If your ID is from a different country, you should create a Custom Document Extractor processor.
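
If the front and back currently exist as two separate image files, one way to get "each side as a separate page" is to write them into a two-page PDF before sending it to the processor. The following is a minimal sketch, not a Document AI API: it assumes the third-party Pillow library is installed, and the function name `merge_sides_to_pdf` and the file names in the usage note are hypothetical.

```python
from io import BytesIO

from PIL import Image  # third-party dependency (assumption): pip install Pillow


def merge_sides_to_pdf(front, back, out):
    """Write a two-page PDF: page 1 is the front image, page 2 is the back.

    `front` and `back` may be file paths or binary file objects;
    `out` may be a file path or a writable binary file object.
    """
    with Image.open(front) as front_img, Image.open(back) as back_img:
        # Normalize both sides to RGB so that images with an alpha
        # channel or a palette are written consistently to the PDF.
        pages = [front_img.convert("RGB"), back_img.convert("RGB")]
        # save_all + append_images makes Pillow emit one PDF page per image.
        pages[0].save(out, format="PDF", save_all=True, append_images=pages[1:])
```

For example, `merge_sides_to_pdf("id_front.jpg", "id_back.jpg", "id_card.pdf")` would produce a single file that can be uploaded as one document, so a field that appears only on the back side still occurs exactly once per document.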

Holt Skinner
  • Thanks for the answer. However, I'm not sure how I should handle two pages. The thing is that if I select 'address - street' as a required field that should occur exactly once, the dataset won't be valid, as this field is only on the back side. Should I merge these images into one or is there another way to link these two images as a part of one document in the Document AI Workbench? – Daniel Klimek Jun 03 '23 at 09:32
  • OK, I understand. You can create two processors (one for each side) if you have both sides as separate images and want to keep them separate. However, for organization and cost reasons it would likely make more sense to combine the front and back images into a single file, such as a PDF with each image as one page, and then use one processor that handles the entire ID document. – Holt Skinner Jun 12 '23 at 15:34