I am using the iText library to merge multiple PDFs. The merge itself works, but if a PDF contains scanned pages, I don't want to add those pages to the merged PDF. Is it possible to detect scanned pages using iText?

I am using the following code to merge the PDFs:

Document PDFJoinInJava = new Document();
PdfCopy PDFCombiner = new PdfCopy(PDFJoinInJava, outputStream);
PdfCopy.PageStamp stamp;
PDFJoinInJava.open();
PdfReader ReadInputPDF;

List<InputStream> pdfs = streamOfPDFFiles;
List<PdfReader> readers = new ArrayList<PdfReader>();
int totalPages = 0;
Iterator<InputStream> iteratorPDFs = pdfs.iterator();
while (iteratorPDFs.hasNext()) {
    InputStream pdf = iteratorPDFs.next();
    PdfReader pdfReader = new PdfReader(pdf);
    readers.add(pdfReader);
    totalPages += pdfReader.getNumberOfPages();
    pdf.close();
}
int number_of_pages;
int currentPageNumber = 0;
int pageOfCurrentReaderPDF = 0;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();

PdfImportedPage page;
// Loop through the PDF files and add to the output.
int count = 1;

while (iteratorPDFReader.hasNext()) {
    PdfReader pdfReader = iteratorPDFReader.next();
    count++;
    number_of_pages = pdfReader.getNumberOfPages();

    // Create a new page in the target for each source page.
    for (int pageNum = 0; pageNum < number_of_pages;) {
        currentPageNumber++;
        pageOfCurrentReaderPDF++;
        page = PDFCombiner.getImportedPage(pdfReader, ++pageNum);
        // Create the stamp for this imported page before writing to it.
        stamp = PDFCombiner.createPageStamp(page);
        ColumnText.showTextAligned(stamp.getUnderContent(),
                Element.ALIGN_RIGHT,
                new Phrase(String.format("%d", currentPageNumber),
                        new Font(FontFamily.TIMES_ROMAN, 3)),
                50, 50, 0);
        stamp.alterContents();

        PDFCombiner.addPage(page);
    }
}
PDFJoinInJava.close();
Butani Vijay
  • What are the criteria for recognizing scanned pages in your PDF document pool? – mkl Feb 21 '14 at 11:55
  • I have multiple PDFs, some of which contain scanned pages that I don't want in the merged PDF. – Butani Vijay Feb 21 '14 at 11:59
  • *some of which contain scanned pages* - and how do they differ exactly? Technically? One way would be to look for pages with large graphics... but page-filling graphics may also exist in other documents. Or to look for pages with no or only minimal text... but if those scanned pages were OCR'ed, they do contain text. Thus, please describe how these scanned pages are technically different from non-scanned ones. – mkl Feb 21 '14 at 13:09
  • Suppose I have one PDF file that contains 2 scanned pages. How can I identify those 2 pages with the iText library? – Butani Vijay Feb 22 '14 at 04:24
  • It depends very much on how these scanned pages are created and what kinds of other pages they have to be differentiated from. – mkl Feb 22 '14 at 14:39
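
The two heuristics mentioned in the comments (a page-filling image, and little or no text) can be combined into a simple check. The following is only a sketch under stated assumptions, not a reliable detector: `ScanHeuristic`, `looksScanned`, and both thresholds are hypothetical names and values, and the inputs are assumed to have been collected beforehand, for example with iText 5's `PdfTextExtractor.getTextFromPage` for the text length and a `RenderListener` for the image sizes. As noted above, OCR'ed scans contain text, so this check would not flag them.

```java
import java.util.List;

/**
 * Hypothetical helper: decides whether a page "looks scanned" from
 * measurements that were collected elsewhere (e.g. with iText's parser).
 */
final class ScanHeuristic {
    // Assumed thresholds; tune them against your own document pool.
    static final double MIN_IMAGE_COVERAGE = 0.85; // largest image covers >= 85% of the page
    static final int MAX_TEXT_CHARS = 20;          // at most this many extracted characters

    /**
     * @param pageArea   area of the page (width * height, in user units)
     * @param imageAreas rendered areas of the images on the page, same units
     * @param textChars  number of characters extracted from the page
     * @return true if one image nearly fills the page and almost no text was found
     */
    static boolean looksScanned(double pageArea, List<Double> imageAreas, int textChars) {
        if (pageArea <= 0) {
            return false; // no usable page size, so make no claim
        }
        double largest = 0;
        for (double area : imageAreas) {
            largest = Math.max(largest, area);
        }
        boolean pageFillingImage = largest / pageArea >= MIN_IMAGE_COVERAGE;
        boolean littleText = textChars <= MAX_TEXT_CHARS;
        return pageFillingImage && littleText;
    }
}
```

In the merge loop, a page for which `looksScanned` returns true would simply not be passed to `addPage`. Whether the thresholds fit depends on how the scanned pages were created, which is exactly the open question in this thread.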

1 Answer

If you want to check whether a PDF file was generated by iText, you can try the following code:

File file = new File("/Demo.pdf");
// Scan the raw file line by line for an iText Producer entry.
// Note: this only matches if the Info dictionary is stored uncompressed and unencrypted.
try (Scanner input = new Scanner(new FileReader(file))) {
    while (input.hasNextLine()) {
        final String checkline = input.nextLine();
        if (checkline.contains("Producer(iText")) {
            // Match found.
            System.out.println(file.getName() + " is generated by iText");
            break;
        }
    }
}
Sheel