
I have a project that splits a PDF file uploaded by the user. After the split, it reads the content of each page and merges the pages that share the same content. I load the file with PDDocument and merge with PDFMergerUtility. After merging, I save the merged PDF to the database as a byte array. Once it is saved, the user can also download the PDF that was split and merged by content, and re-upload it when needed.

But I have found a problem: after the merge, the PDF is larger than the original PDF before the split.

I have tried to find a solution, but nothing I found works for my problem, such as

Android PdfDocument file size

Is there a way to compress PDF to small size using Java?

and other solutions.

Is there any way to solve my problem? I would be glad for any help.

Here is my code:

// file: MultipartFile, sent from the front end through the API

val inpStream: InputStream = file.getInputStream()
pdfDocument = PDDocument.load(inpStream)


// splitting the pages of a PDF document
pagesPdf = splitter.split(pdfDocument)
val n = pdfDocument.numberOfPages

val batchSize:Int = 200
val finalBatchSize: Int = n % batchSize
val numOfBatch: Int = (n - finalBatchSize) / batchSize
val batchFinal: Int = if (finalBatchSize == 0) numOfBatch else (numOfBatch + 1)
var batchNo: Int = 1
var startPage: Int
var endPage: Int = 0
while (batchNo <= batchFinal) {
    startPage = endPage + 1
    if (batchNo > numOfBatch) {
        endPage = endPage + finalBatchSize
    } else {
        endPage = endPage + batchSize
    }
    val splitter:Splitter = Splitter()
    splitter.setStartPage(startPage)
    splitter.setEndPage(endPage)

    // splitting the pages of a PDF document
    pagesPdf = splitter.split(pdfDocument)

    batchNo++
    i = startPage
    var groupPage: Int = i
    var pageNo = 0
    
    
    var pdfMerger: PDFMergerUtility = PDFMergerUtility()
    var mergedFileByteArrOut: ByteArrayOutputStream = ByteArrayOutputStream()
    pdfMerger.setDestinationStream(mergedFileByteArrOut)
    var fileObj: ByteArray? = null
    for (pd in pagesPdf) {
        pageNo++
        if (!pd.isEncrypted) {
            val stripper = PDFTextStripper()
            // code to get the content

            if (condition1 == true) {
                // save this split document to a byte array and queue it for merging
                val fileByteArrOut = ByteArrayOutputStream()
                pd.save(fileByteArrOut)
                pd.close()
                val fileByteArrIn = ByteArrayInputStream(fileByteArrOut.toByteArray())
                pdfMerger.addSource(fileByteArrIn)
                fileObj = fileByteArrOut.toByteArray()
            }
            if (condition2 == true) {
                // I want to compress fileObj first before saving it to the DB
                // code to save to DB

                // reset the merger for the next group of pages
                fileObj = null
                pdfMerger = PDFMergerUtility()
                mergedFileByteArrOut = ByteArrayOutputStream()
                pdfMerger.setDestinationStream(mergedFileByteArrOut)
            }
        }
    }
}
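
The batch arithmetic in the loop above (batchSize, finalBatchSize, numOfBatch, startPage, endPage) can also be written as a list of page ranges. This is only a minimal sketch: `batchRanges` is a hypothetical helper name that does not appear in the code above, and it only reproduces the same 1-based start and end pages that would be passed to `setStartPage` and `setEndPage`.

```kotlin
// Hypothetical helper (not part of the original code): splits pages 1..pageCount
// into consecutive 1-based ranges of at most batchSize pages each.
fun batchRanges(pageCount: Int, batchSize: Int): List<IntRange> {
    require(batchSize > 0) { "batchSize must be positive" }
    // Each batch starts at 1, 1 + batchSize, 1 + 2 * batchSize, ...
    // and ends at the last page of the batch or of the document, whichever comes first.
    return (1..pageCount step batchSize).map { start ->
        start..minOf(start + batchSize - 1, pageCount)
    }
}

fun main() {
    // Example: a 450-page document processed in batches of 200 pages.
    for (range in batchRanges(450, 200)) {
        println("startPage=${range.first} endPage=${range.last}")
    }
}
```

Each range's `first` and `last` would then play the role of startPage and endPage in one iteration of the while loop.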
Hansen

1 Answer


You can use cpdf (https://community.coherentpdf.com) to losslessly squeeze the PDF files afterwards. This merges identical objects and shared parts, and removes any unneeded parts.

From the command line

cpdf -squeeze in.pdf -o out.pdf

Or, from Java:

jcpdf.squeezeInMemory(pdf);
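
If the PDF only exists as a byte array (as in the question), the command-line form above could also be driven from Kotlin as an external process. The following is a rough sketch, not a tested integration: it assumes the `cpdf` binary is installed and on the PATH, and `squeezeCommand` and `squeezeWithCpdf` are hypothetical helper names introduced here for illustration.

```kotlin
import java.io.File
import java.nio.file.Files

// Hypothetical helper: builds the same command line as shown above,
// i.e. cpdf -squeeze in.pdf -o out.pdf
fun squeezeCommand(input: File, output: File): List<String> =
    listOf("cpdf", "-squeeze", input.path, "-o", output.path)

// Hypothetical helper: writes the PDF bytes to a temporary file, runs cpdf on it,
// and returns the bytes of the squeezed file.
// Assumption: the cpdf binary is installed and available on the PATH.
fun squeezeWithCpdf(pdfBytes: ByteArray): ByteArray {
    val input = Files.createTempFile("merged", ".pdf").toFile()
    val output = Files.createTempFile("squeezed", ".pdf").toFile()
    try {
        input.writeBytes(pdfBytes)
        val process = ProcessBuilder(squeezeCommand(input, output))
            .inheritIO() // forward cpdf's own messages to this process's console
            .start()
        val exitCode = process.waitFor()
        check(exitCode == 0) { "cpdf exited with code $exitCode" }
        return output.readBytes()
    } finally {
        // Remove the temporary files whether or not cpdf succeeded.
        input.delete()
        output.delete()
    }
}
```

In the question's code, the merged byte array could be passed through something like `squeezeWithCpdf` just before it is saved to the database, and the sizes of the two arrays compared to see how much was gained.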
johnwhitington