9

Is there a way to track when a PDF is opened? Perhaps by embedding some script into the PDF itself?

I saw the question below, and I suppose the answer is "no" for JavaScript, but I am wondering if this is possible at all.

Google analytics tracking code insert in pdf file

speedplane

3 Answers

17

The PDF standard includes support for JavaScript but as @Wes Hardaker pointed out, not every PDF reader supports it. However, sometimes some is better than none.

Here's Adobe's official Acrobat JavaScript Scripting Guide. What's probably most interesting to you is the doc object which has a method called getURL(). To use it you'd just call:

app.doc.getURL('http://www.google.com/');

Bind that call to the document's open event and you've got a tracker. I'm not too familiar with creating events from within Adobe Acrobat, but from code it's pretty easy. The code below is a full working VS2010 C# WinForms app that uses the open source library iTextSharp (5.1.1.0). It creates a PDF and adds the JavaScript to the open event.

Some notes. First, Adobe Acrobat and Reader will both warn the user whenever a document accesses an external resource, and most other PDF readers will probably do the same. This is very annoying, so for this reason alone it shouldn't be done. Personally I don't care if someone tracks my document opens; I just don't want to get a prompt every time.

Second, just to reiterate, this code works for Adobe Acrobat and Adobe Reader, probably as far back as at least V6, but may or may not work in other PDF readers.

Third, there's no safe way to uniquely identify the user. Doing so would require you to create and store some equivalent of a "cookie", which would mean writing to the user's file system, and that would be considered unsafe. This means that you could only detect the number of opens, not unique opens.

Fourth, this might not be legal everywhere. Some jurisdictions require that you notify users if you are tracking them and provide a way for them to see what information you are collecting.
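To make the "number of opens, not unique opens" point concrete, here is a minimal sketch of what the receiving side might look like. Everything in it is an assumption for illustration: the `/open` path, the `doc` query parameter, and the names `recordOpen` and `handleRequest` are hypothetical and do not come from Acrobat or iTextSharp.

```javascript
// Hypothetical endpoint: the PDF's open action requests /open?doc=<id>.
// Only a per-document total is kept, since nothing identifies the reader.
const openCounts = new Map();

function recordOpen(counts, requestUrl) {
  // The base is only needed so that a relative request path can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  if (url.pathname !== '/open') return null;
  const docId = url.searchParams.get('doc') || 'unknown';
  const total = (counts.get(docId) || 0) + 1;
  counts.set(docId, total);
  return { docId, total };
}

// Request handler with the (req, res) shape that Node's http.createServer expects.
function handleRequest(req, res) {
  const hit = recordOpen(openCounts, req.url);
  res.statusCode = hit ? 204 : 404;
  res.end();
}
```

Passing `handleRequest` to Node's `http.createServer` would expose the counter over HTTP. Two opens by the same person are indistinguishable from two opens by different people, which is exactly the limitation described above.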

But with all of the above, I can't not give an answer just because I don't like it.

using System;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //File that we will create
            string OutputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Events.pdf");

            //Standard PDF creation setup
            using (FileStream fs = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
                    {
                        //Open our document for writing
                        doc.Open();

                        //Create an action that points to the built-in app.doc object and calls the getURL method on it
                        PdfAction act = PdfAction.JavaScript("app.doc.getURL('http://www.google.com/');", writer);

                        //Set that action as the document's open action
                        writer.SetOpenAction(act);

                        //We need to add some content for this PDF to be valid
                        doc.Add(new Paragraph("Hello"));

                        //Close the document
                        doc.Close();
                    }
                }
            }

            this.Close();
        }
    }
}
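
The script string above is hard-coded. If you generate many PDFs, you would probably want each one to report which document was opened. Below is a rough sketch of building that script string per document, written in JavaScript to match the embedded script. The helper name `buildOpenActionScript`, the tracker URL, and the `doc` query parameter are hypothetical; only the `app.doc.getURL(...)` call is taken from the code above.

```javascript
// Builds the JavaScript source to embed as the PDF's open action.
// trackerBase and docId are hypothetical values chosen by the publisher.
function buildOpenActionScript(trackerBase, docId) {
  // Percent-encode the id so spaces and punctuation survive in the query string.
  const url = trackerBase + '?doc=' + encodeURIComponent(docId);
  // JSON.stringify yields a quoted, escaped string literal for the script.
  return 'app.doc.getURL(' + JSON.stringify(url) + ');';
}
```

For example, `buildOpenActionScript('http://example.com/open', 'events 2011')` produces `app.doc.getURL("http://example.com/open?doc=events%202011");`, which could then be passed to `PdfAction.JavaScript` in place of the literal string.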
Chris Haas
2

The problem with technologies like that is that they can never be absolute.

First, it's a security violation to trigger an external event and the software writers likely wouldn't support it (or, at least I hope not).

Second, it's dependent on things like the network. What happens when someone downloads it and then reads it while offline on a plane, for example? You won't get the notification.

Third, there are multiple ways to read PDF files. Some people read them with readers you've likely not heard of (my favorite is a Linux application that I like much better than Adobe's AcroRead).

So the real answer is "no" (and I'd argue you shouldn't do it anyway, but that's not answering your question). Even if the software supported it, it still wouldn't be reliable in the first place.

Wes Hardaker
  • 1
    Obviously there are privacy and reliability concerns with any type of tracking. I'm not debating that. But why do you say "the real answer is 'no'"? Don't the newer PDFs have dynamic content? I think they have some sort of scripting capabilities which may support something like this. – speedplane Nov 11 '11 at 23:08
  • 1
    Not all readers support all possible content of PDF files. As Wes pointed out, just because you can do something with Acrobat doesn't mean it'll work in Foxit, Ghostscript, MuPDF, etc, etc. – KenS Nov 12 '11 at 09:55
  • And the "no" is because, though I admit I'm not a true PDF or PS expert, the current PDF language does *not* provide support for querying external entities (i.e., you can't say "go grab this pixel image from this remote website so I can track you"). PDFs are, by design, supposed to be self-contained. – Wes Hardaker Nov 14 '11 at 17:33
0

Given that PostScript is a fully capable programming language, there shouldn't be any reason that it should not be possible to track when it is viewed/run.

I should think the difficult part in that would be finding the libraries (or making the functions yourself) to do the networking portion of the logging.

One quick note, however: with functionality like this it is probably best to keep the document accessible on failure, because people tend to get upset when their media suddenly becomes unavailable, which is exactly what would happen if you forced termination on failure. (Can you guarantee that your logging domain will never change? That it will always be available? What happens when the internet is not available in the user's situation?)
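
As a rough illustration of that fail-open idea, here is a minimal sketch in JavaScript (the scripting language some PDF readers support) rather than PostScript. The names `notifyFailOpen` and `notify` are hypothetical; `notify` stands for whatever call performs the network request.

```javascript
// Runs a tracking callback but never lets its failure block the document.
// notify is a hypothetical function that performs the network request.
function notifyFailOpen(notify) {
  try {
    notify();
    return true;
  } catch (err) {
    // Offline, blocked by the reader, or endpoint gone: carry on regardless.
    return false;
  }
}
```

The return value only records whether the attempt went through. The document is displayed either way, so a missing network or a moved logging domain costs you a data point rather than costing the reader the content.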

Shark8
  • The question is about PDF and your answer is about PostScript. Does PostScript actually run within a PDF? Do you have any source with more information about their relation? – Sjoerd Nov 20 '17 at 15:58
  • 1
    (a) The title explicitly mentions PostScript, and (b) the relation between PostScript and PDF is essentially that PDF is [the result of] PS run through its processing/program; this thread on stackexchange is really informative: https://tex.stackexchange.com/questions/217511/why-do-people-still-use-postscript – Shark8 Nov 21 '17 at 01:44