Split PDF using offset and length - IBM OnDemand (combined PDF)

Question

I need to get the offset and byte length of each page in a PDF. For example, the first page's offset will be 0 and its length will be the byte length of that page.

I have a requirement to pass an index file for a PDF to IBM OnDemand, which is a PDF repository. I need to merge individual PDF files, calculate the offset and length of each PDF, create an index file with these two parameters, and pass it to the tool.

The tool will use the index file to split the combined PDF (multiple PDFs merged into a single file) based on the offset and length passed as properties in the index file.

I used iText to get the beginning and end of each page using bookmarks. Now I need to calculate the byte offset and byte length of each page.

Is there any way to get the start and end of a page in terms of bytes?

Any help would be appreciated

Is it me, or is this question just nonsense? Can you please rephrase it? It seems like you are making some assumptions about the PDF file format that are wrong. - Bruno Lowagie, Mar 21 '16 at 15:36

score 1 · Answer 1 · answered Mar 21 '16 at 15:50

You cannot do this in any way. Please read the PDF file format specification (available here, among other places: http://www.adobe.com/devnet/pdf/pdf_reference.html).

A PDF file contains "objects". A page has a page description recorded in a stream object, and it can (and mostly will) also use various other objects that in all likelihood are scattered around the file. A page therefore does not occupy one contiguous byte range that you could describe with a single offset and length.

You misunderstand how PDF files are built, and you need to understand that before you start stumbling around trying to implement this, or you're going to waste a lot of time.

Magesh · Answer 2 · 2016-03-21T19:52:37.010

This question needs to be asked on the IBM OnDemand forum. I thought I could use iText to crack it. As David mentioned, we cannot deal with this kind of unstructured PDF using iText. The code snippet below solves the problem.

Both PDFs are merged using plain Java. The merged file will contain two EOF markers, two headers, and two trailers.

When you open the merged file in Acrobat, it reads the last document's information and displays that document. When we pass the offset and length to OnDemand, it splits the PDF and displays each document as expected.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MergePdf {
    public static void main(String[] args) throws IOException {
        String sourceFile1Path = "C:\\sample1.PDF";
        String sourceFile2Path = "C:\\sample1.PDF";

        String mergedFilePath = "C:\\merged.PDF";

        File[] files = new File[2];
        files[0] = new File(sourceFile1Path);
        files[1] = new File(sourceFile2Path);

        // The output is opened in append mode, so remove any merged file
        // left over from an earlier run before starting.
        File mergedFile = new File(mergedFilePath);
        for (File file : files) {
            // PDF is binary data, so copy raw bytes. Readers and Writers would
            // decode/re-encode characters and rewrite line endings, which
            // corrupts the content and changes the byte offsets.
            try (InputStream in = new FileInputStream(file);
                 OutputStream out = new FileOutputStream(mergedFile, true)) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }

            // Size of the merged file so far, i.e. the end offset of the file just appended.
            System.out.println("File Length: " + file.getName() + " : " + mergedFile.length());
        }
    }
}
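
To get the two values for the index file, the offset and length of each document can be recorded while the files are concatenated. Below is a minimal sketch of that idea, assuming the offset is the byte position where each source file starts in the merged file and the length is that file's size in bytes. The class `PdfConcatIndex`, the `Segment` type, and the printed format are hypothetical names for illustration; the actual index file layout OnDemand expects is not covered here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class PdfConcatIndex {

    /** Byte offset and byte length of one source file inside the merged file. */
    public static final class Segment {
        public final String name;
        public final long offset;
        public final long length;

        Segment(String name, long offset, long length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Concatenates the source files byte for byte and returns one Segment per source. */
    public static List<Segment> concatenate(List<Path> sources, Path merged) throws IOException {
        List<Segment> segments = new ArrayList<>();
        long offset = 0;
        // TRUNCATE_EXISTING so a merged file from an earlier run does not shift the offsets.
        try (OutputStream out = Files.newOutputStream(merged,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path source : sources) {
                // Files.copy returns the number of bytes written for this source.
                long length = Files.copy(source, out);
                segments.add(new Segment(source.getFileName().toString(), offset, length));
                offset += length;
            }
        }
        return segments;
    }

    /** Usage: java PdfConcatIndex merged.pdf first.pdf second.pdf ... */
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: PdfConcatIndex <merged> <source>...");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        for (Segment s : concatenate(sources, Paths.get(args[0]))) {
            System.out.println(s.name + " offset=" + s.offset + " length=" + s.length);
        }
    }
}
```

The first document always starts at offset 0, and each later offset is the sum of the lengths before it. These are offsets of whole documents within the combined file, not of individual pages inside a PDF.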
