
Is there an activity or some other way in UiPath to check whether a PDF file is a first-generation document? Any idea or help would be much appreciated. Thank you.

  • What is a first generation? Initial creation without edits? – kwoxer Sep 03 '20 at 06:30
  • It's a type of document that is original, not scanned into the computer. –  Sep 03 '20 at 06:32
  • I've never heard the term "1st generation" used for this. Typically those are referred to as digital or born-digital documents. And judging from this question and the others you've posted, I think you need to read up on the PDF file format so you understand why the things you ask are exceedingly difficult. – David van Driessche Sep 03 '20 at 20:06

1 Answer

This is more of a hack than a proper solution, but it should work: use the Digitize activity in the IntelligentOCR package with an OCR engine that you know returns word confidences (I think Microsoft OCR does, but double-check). The Digitize activity decides for itself whether it needs OCR. If no OCR is used (meaning it's a native document, or "first generation" as you call it), then all OCRConfidence values in the DOM will be -1.

There are two caveats to doing this:

  • The Digitize activity may decide to use OCR on a native PDF as well in certain edge cases, namely when it judges the document text to be unreadable (for instance, because of unusual custom fonts).
  • While this is currently not supported, the Digitize activity may at some point in the future do partial OCR, for instance when a native PDF contains an image with text. As with any "undocumented feature", use with caution, as it may break at any time when upgrading to a new version.
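
The check described in this answer can be sketched outside of UiPath as well. The snippet below is only an illustration, not UiPath's API: it assumes the DOM produced by Digitize has been exported to JSON by some means, and that each word carries a numeric confidence under a key spelled like `OcrConfidence` (the key name, the nesting, and the sample data are assumptions). To avoid depending on the exact layout, it walks the whole structure and collects every such value.

```python
from typing import Any, Iterator

NO_OCR = -1  # value the answer reports for words that did not go through OCR


def iter_confidences(node: Any) -> Iterator[float]:
    """Yield every OCR confidence found anywhere in a nested JSON-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Assumed key name; matched case-insensitively to tolerate spelling variants.
            if key.lower() == "ocrconfidence" and isinstance(value, (int, float)):
                yield value
            else:
                yield from iter_confidences(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_confidences(item)


def is_native_pdf(dom: Any) -> bool:
    """Return True only if at least one word exists and none of them went through OCR."""
    confidences = list(iter_confidences(dom))
    return bool(confidences) and all(c == NO_OCR for c in confidences)


# Hypothetical sample data shaped like a serialized DOM.
native_dom = {"Pages": [{"Sections": [{"WordGroups": [{"Words": [
    {"Text": "Invoice", "OcrConfidence": -1},
    {"Text": "Total", "OcrConfidence": -1},
]}]}]}]}

scanned_dom = {"Pages": [{"Sections": [{"WordGroups": [{"Words": [
    {"Text": "Invoice", "OcrConfidence": 0.93},
    {"Text": "Total", "OcrConfidence": 0.88},
]}]}]}]}

print(is_native_pdf(native_dom))
print(is_native_pdf(scanned_dom))
```

Requiring every word to be -1 is the strict reading of the answer. If partial OCR ever becomes a reality, as the second caveat warns, a threshold on the share of -1 words might be a more robust choice.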
Tudor Carean