I have a project that used to split a pdf file that uploaded by user, after split then get the same content inside pdf then merge the page base on pdf content using PDODocument and for merge pdf i use PDFMergerUtility, after marge i save the merge pdf to database using bytearray. and, after save to DB, user also can download the pdf that already split and merge base on content and reupload when needed.
but i have found a problem, after merge the size of pdf is bigger than pdf before split.
i have try to found the solution, but not found that working to my problem, such us
Is there a way to compress PDF to small size using Java?
and another else solution
is there any solution to solve my problem? I would be glad for any help.
and here is my code
//file: MultipartFile -> file is send from front-end using API
var inpStream: InputStream = file.getInputStream()
inpStream = file.getInputStream()
pdfDocument = PDDocument.load(inpStream)
// splitting the pages of a PDF document
pagesPdf = splitter.split(pdfDocument)
val n = pdfDocument.numberOfPages
val batchSize:Int = 200
val finalBatchSize: Int = n % batchSize
val numOfBatch: Int = (n - finalBatchSize) / batchSize
val batchFinal: Int = if (finalBatchSize == 0) numOfBatch else (numOfBatch + 1)
var batchNo: Int = 1
var startPage: Int
var endPage: Int = 0
while (batchNo <= batchFinal) {
startPage = endPage + 1
if (batchNo > numOfBatch) {
endPage = endPage + finalBatchSize
} else {
endPage = endPage + batchSize
}
val splitter:Splitter = Splitter()
splitter.setStartPage(startPage)
splitter.setEndPage(endPage)
// splitting the pages of a PDF document
pagesPdf = splitter.split(pdfDocument)
batchNo++
i = startPage
var groupPage: Int = i
var pageNo = 0
var pdfMerger: PDFMergerUtility = PDFMergerUtility()
var mergedFileByteArrOut: ByteArrayOutputStream = ByteArrayOutputStream()
pdfMerger.setDestinationStream(mergedFileByteArrOut)
var fileObj:ByteArray? = null,
for (pd in pagesPdf) {
pageNo++;
if (!pd.isEncrypted) {
val stripper = PDFTextStripper()
//CODE TO GET CONTEN
if(condition1 == true){
var fileByteArrOut: ByteArrayOutputStream = ByteArrayOutputStream()
pd.save(fileByteArrOut)
pd.close()
var fileByteArrIn: ByteArrayInputStream = ByteArrayInputStream(fileByteArrOut.toByteArray())
pdfMerger.addSource(fileByteArrIn)
fileObj = fileByteArrOut.toByteArray(),
}
if(condition2 == true){
//I want to compress fileObj first before save to DB
//code to save to DB
fileObj = null
pdfMerger = PDFMergerUtility()
mergedFileByteArrOut= ByteArrayOutputStream()
pdfMerger.setDestinationStream(mergedFileByteArrOut)
}
}
}