Split PDF into multiple PDFs using iTextSharp

Question

public int SplitAndSave(string inputPath, string outputPath)
    {
        FileInfo file = new FileInfo(inputPath);
        string name = file.Name.Substring(0, file.Name.LastIndexOf("."));

        using (PdfReader reader = new PdfReader(inputPath))
        {

            for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
            {
                string filename = pagenumber.ToString() + ".pdf";

                Document document = new Document();
                PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + filename, FileMode.Create));

                document.Open();

                copy.AddPage(copy.GetImportedPage(reader, pagenumber));

                document.Close();
            }
            return reader.NumberOfPages;
        }

    }

I want to split the PDF into multiple PDFs at 50-page intervals (suppose the PDF has 400 pages; I want 8 PDFs). The above code splits every page into its own PDF. Please help. I'm using ASP.NET with iTextSharp.

Hint: If you only want a new document every 50 pages, why do you create a new document during *every single loop iteration*? — Heinzi, May 04 '15 at 06:57

score 15 · Answer 1 · edited Apr 06 '16 at 14:53

You're looping through the PDF and creating a new document every time you advance a page. You need to keep track of the page numbers so that you start a new output file only every 50 pages. Personally, I would put the extraction in a separate method and call it from your loop. Something like this:

// Requires: using iTextSharp.text; using iTextSharp.text.pdf;
private void ExtractPages(string sourcePDFpath, string outputPDFpath, int startpage, int endpage)
{
    PdfReader reader = new PdfReader(sourcePDFpath);
    try
    {
        // Size the new document like the first page of the range.
        Document sourceDocument = new Document(reader.GetPageSizeWithRotation(startpage));
        using (var output = new System.IO.FileStream(outputPDFpath, System.IO.FileMode.Create))
        {
            PdfCopy pdfCopyProvider = new PdfCopy(sourceDocument, output);
            sourceDocument.Open();

            // Copy pages startpage..endpage (1-based, inclusive) into the new file.
            for (int i = startpage; i <= endpage; i++)
            {
                PdfImportedPage importedPage = pdfCopyProvider.GetImportedPage(reader, i);
                pdfCopyProvider.AddPage(importedPage);
            }
            sourceDocument.Close();
        }
    }
    finally
    {
        // Release the source file even if copying fails.
        reader.Close();
    }
}

So, in your original code, loop through the PDF and call the above method once for every 50 pages. You'll just need to add variables in your loop to keep track of the start and end pages.
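
As a rough sketch of that bookkeeping, the start and end page of each chunk could be computed as below. The names PageRanges, Split and Demo are hypothetical and not part of iTextSharp; the sketch only does the arithmetic and assumes you would pass each range to ExtractPages, with the total page count coming from reader.NumberOfPages.

```csharp
using System;
using System.Collections.Generic;

public static class PageRanges
{
    // Yields inclusive, 1-based (start, end) page ranges of at most pagesPerFile pages.
    public static IEnumerable<Tuple<int, int>> Split(int totalPages, int pagesPerFile)
    {
        if (pagesPerFile < 1)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        for (int start = 1; start <= totalPages; start += pagesPerFile)
        {
            // The last chunk is shorter when totalPages is not a multiple of pagesPerFile.
            int end = Math.Min(start + pagesPerFile - 1, totalPages);
            yield return Tuple.Create(start, end);
        }
    }

    // Hypothetical demo: prints the plan for a 400-page source split every 50 pages.
    public static void Demo()
    {
        int part = 1;
        foreach (var range in Split(400, 50))
        {
            // In the real code you would call something like:
            // ExtractPages(inputPath, Path.Combine(outputPath, part + ".pdf"), range.Item1, range.Item2);
            Console.WriteLine("{0}.pdf: pages {1}-{2}", part, range.Item1, range.Item2);
            part++;
        }
    }
}
```

Keeping the range arithmetic separate from the PDF calls makes it easy to check edge cases, such as a source whose page count is not a multiple of 50, without touching any files.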

score 4 · Answer 2 · answered Sep 12 '13 at 10:45

This article should be useful; it closely matches your requirement:

http://www.codeproject.com/Articles/559380/SplittingplusandplusMergingplusPdfplusFilesplusinp

RohitWagh

I used the CodeProject code above and I'm getting the error "Access to the path denied". – Billy Sep 12 '13 at 11:22
This means you do not have permission to write to the folder where you are saving the PDF. – RohitWagh Sep 12 '13 at 11:31

score 2 · Answer 3 · answered Jan 07 '19 at 13:56

Here is a shorter solution. I haven't tested which method performs better.

private void ExtractPages(string sourcePDFpath, string outputPDFpath, int startpage, int endpage)
{
  var pdfReader = new PdfReader(sourcePDFpath);
  try
  {
    // Restrict the reader to the requested pages (1-based, inclusive range).
    pdfReader.SelectPages($"{startpage}-{endpage}");
    using (var fs = new FileStream(outputPDFpath, FileMode.Create, FileAccess.Write))
    {
      PdfStamper stamper = null;
      try
      {
        stamper = new PdfStamper(pdfReader, fs);
      }
      finally
      {
        // Closing the stamper writes the selected pages to the output stream.
        stamper?.Close();
      }
    }
  }
  finally
  {
    pdfReader.Close();
  }
}

MovGP0

The most relevant advantage of your solution is that it keeps document-level data (metadata, document-level attachments, ...); being shorter is merely a nice side effect. – mkl Jan 07 '19 at 22:32
@mkl I have found PDFCopy to be better at keeping XmpMetadata and everything else intact. – blaze_125 Feb 20 '21 at 03:35
`PdfCopy` better than `PdfStamper`? That sounds implausible. It is only possible in cases where `PdfCopy` by chance repairs some issue while `PdfStamper` keeps the issue as is. Unless I overlooked something... ;) – mkl Feb 20 '21 at 08:04

score 0 · Answer 4 · answered Jun 03 '19 at 11:02

I faced the same problem but wanted to use iText7 for .NET. In this concrete case, this code worked for me:

1st: Implement your own PdfSplitter

public class MyPdfSplitter : PdfSplitter
{
    private readonly string _destFolder;
    private int _pageNumber; // counts output files, not source pages

    public MyPdfSplitter(PdfDocument pdfDocument, string destFolder) : base(pdfDocument)
    {
        _destFolder = destFolder;
    }

    // Called once per output document; names the files p1.pdf, p2.pdf, ...
    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        _pageNumber++;
        return new PdfWriter(Path.Combine(_destFolder, $"p{_pageNumber}.pdf"));
    }
}

2nd: Use it to split your PDF

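using (var pdfDoc = new PdfDocument(new PdfReader(filePath)))
{
    // SplitByPageCount(1) produces one page per file; pass 50 to get 50-page files.
    var splitDocuments = new MyPdfSplitter(pdfDoc, targetFolder).SplitByPageCount(1);
    foreach (var splitDocument in splitDocuments)
    {
        // Closing each part flushes it to its PdfWriter.
        splitDocument.Close();
    }
}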

Code migrated from Java example: https://itextpdf.com/en/resources/examples/itext-7/splitting-pdf-file

Hope this helps others!

If MyPdfSplitter creates a PdfWriter(new MemoryStream), do you know how to get the stream contents? – M Akin Aug 01 '23 at 21:14
To get the memory streams, I created a property on the custom splitter that holds the list of MemoryStreams, so that I could access them later. – M Akin Aug 01 '23 at 22:14
