Protecting PDFs

Question

I am currently using the Apache FOP library to generate PDFs. I want these PDFs protected from copy-pasting, so that people would have to use OCR libraries (or manual typing) to get the information out of the PDF.

FOP apparently offers some security, which is added as metadata on the PDF, to protect against things like printing or copying, but this doesn't seem to work properly (I can't disable copy-pasting while printing is enabled, etc.).
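For context, the standard security handler in the PDF specification stores these permissions as separate bits in a single integer (the `/P` entry), so printing and copying are independent flags at the format level. The sketch below only illustrates that bit layout. The class and method names are hypothetical, it is not FOP's implementation, and it does not claim to reproduce the exact value FOP writes. Whether a viewer honors the flags is up to the viewer.

```java
/** Illustrative sketch of the /P permission word; not taken from FOP. */
final class PdfPermissionSketch {
    // The PDF specification numbers bits from 1, so bit n has value 1 << (n - 1).
    static final int PRINT = 1 << 2;    // bit 3: print the document
    static final int MODIFY = 1 << 3;   // bit 4: modify the contents
    static final int COPY = 1 << 4;     // bit 5: copy or extract text and graphics
    static final int ANNOTATE = 1 << 5; // bit 6: add or modify annotations

    /** Starts with every permission bit set, then clears the denied ones. */
    static int permissions(boolean print, boolean copy, boolean modify, boolean annotate) {
        int p = ~0b11; // bits 1 and 2 are reserved and stay 0
        if (!print) p &= ~PRINT;
        if (!copy) p &= ~COPY;
        if (!modify) p &= ~MODIFY;
        if (!annotate) p &= ~ANNOTATE;
        return p;
    }

    /** Returns true if the given flag is set in the permission word. */
    static boolean allows(int p, int flag) {
        return (p & flag) != 0;
    }

    public static void main(String[] args) {
        // Same intent as the question: printing allowed, everything else denied.
        int p = permissions(true, false, false, false);
        System.out.println("P = " + p);                            // P = -60
        System.out.println("print allowed: " + allows(p, PRINT));  // true
        System.out.println("copy allowed: " + allows(p, COPY));    // false
    }
}
```

The argument order mirrors the four permission booleans passed to `PDFEncryptionParams` in the code further down (print, copy, edit content, edit annotations), which makes it easier to check that the intended combination is being requested.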

One possibility that seemed straightforward to me is to somehow transform all the text in the PDFs into images, but I can't find any information on the matter.

Obviously I don't care whether the PDF is searchable. I just want to prevent people from copy-pasting while still letting them print it.

My current FOP code:

private static FopFactory fopFactory;

private static FopFactory initializeFactory() throws IOException,
        SAXException {
    if (fopFactory == null) {
        File f = new File(SettingUtil.getSetting(LetterGeneratorSettings.FOP_CONFIG_LOCATION));
        fopFactory = FopFactory.newInstance(f);
    }
    return fopFactory;
}

public static File generatePDFFromXML(File fopTemplate, File xmlSource,
        File resultFileLocation) throws IOException {
    try {
        initializeFactory();
        URL url = fopTemplate.toURI().toURL();
        // XSLT stylesheet that turns the XML source into XSL-FO
        StreamSource transformSource = new StreamSource(url.openStream());

        // a user agent is needed for transformation
        FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
        foUserAgent.getRendererOptions().put("encryption-params",
                getEncryptionParams());
        // to store output
        ByteArrayOutputStream pdfoutStream = new ByteArrayOutputStream();
        StreamSource source = new StreamSource(new ByteArrayInputStream(IOUtils.toByteArray(new FileInputStream(xmlSource))));
        Transformer xslfoTransformer;
        try {
            TransformerFactory transfact = TransformerFactory.newInstance();

            xslfoTransformer = transfact.newTransformer(transformSource);
            // Construct fop with desired output format
            Fop fop;
            try {
                fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, pdfoutStream);
                // Resulting SAX events (the generated FO)
                // must be piped through to FOP
                Result res = new SAXResult(fop.getDefaultHandler());

                // Start XSLT transformation and FOP processing
                try {
                    // everything will happen here..
                    xslfoTransformer.transform(source, res);

                    // Write the rendered PDF to the result file. A single stream is
                    // opened on the file and closed automatically.
                    try (OutputStream out = new java.io.BufferedOutputStream(
                            new FileOutputStream(resultFileLocation))) {
                        out.write(pdfoutStream.toByteArray());
                    }

                } catch (TransformerException e) {
                    e.printStackTrace();
                }
            } catch (FOPException e) {
                e.printStackTrace();
            }
        } catch (TransformerConfigurationException e) {
            e.printStackTrace();
        } catch (TransformerFactoryConfigurationError e) {
            e.printStackTrace();
        }
        return resultFileLocation;
    } catch (Exception ex) {
        throw new IOException(ex);
    }
}

private static PDFEncryptionParams getEncryptionParams() {
    return new PDFEncryptionParams(null,
            SettingUtil.getSetting(LetterGeneratorSettings.PDF_PASSWORD),
            true, false, false, false, false);
}

These are the contents of my fopconfig.xml:

<fop version="1.0">

  <!-- Strict user configuration -->
  <strict-configuration>false</strict-configuration>

  <!-- Strict FO validation -->
  <strict-validation>false</strict-validation>

  <!-- Base URL for resolving relative URLs -->
  <base>./</base>

  <!-- Font Base URL for resolving relative font URLs -->
  <font-base>./</font-base>

  <!-- Source resolution in dpi (dots/pixels per inch) for determining the size of pixels in SVG and bitmap images, default: 72dpi -->
  <source-resolution>72</source-resolution>
  <!-- Target resolution in dpi (dots/pixels per inch) for specifying the target resolution for generated bitmaps, default: 72dpi -->
  <target-resolution>72</target-resolution>

  <!-- default page-height and page-width, in case
       value is specified as auto -->
  <default-page-settings height="11in" width="8.26in"/>

  <!-- etc. etc..... -->
</fop>

Which version are you using? And do you have a fop.conf file that overrides any settings, given that you pass a settings file when instantiating the FopFactory? — Aninda Bhattacharyya, Dec 21 '15 at 22:26
Does the transformation to image content have to happen during fop transformation or is a postprocessing step e.g. with apache pdfbox ok, too? — mkl, Dec 22 '15 at 17:01
You're adding fopconfig.xml. @Aninda was probably talking about [fop.xconf](https://xmlgraphics.apache.org/fop/2.0/pdfencryption.html) which is where you set up encryption. — approxiblue, Dec 22 '15 at 18:07
Having it happen post-transformation is a possibility for me, mkl, but I'd prefer it to happen in a single step. I'm not adding any fop.xconf file, I believe, approxiblue. — Kristof, Dec 22 '15 at 21:26
I think your requirements contradict each other - if you allow printing, they can print into any of many available PDF drivers, and thereby generate a secondary PDF which *they* control access to - i.e., they can copy and paste from it. Probably the only way is to generate images from the PDF and just insert the images in the file (enforcing OCR use for any further access) — Aganju, Dec 23 '15 at 13:18
Why not add a fop.xconf file? It does what you want (prevent copy-pasting), in a much more straightforward way than transforming text to images. — approxiblue, Dec 23 '15 at 16:29

score 0 · Answer 1 · answered Dec 28 '15 at 00:37

I am not sure how this works with Apache FOP, but it is quite easy with the iText library.

Here is a tutorial I wrote a while ago about this: http://tutors4all.net/index.php/2015/05/06/encrypt-pdf-file/

George Moralis