
I am interested in highlighting portions of a PDF programmatically, hopefully through a command line tool of sorts. My particular PDF file is not OCRed so the text is not searchable, but the particular places that I would like to highlight occur on every page in the same position. I was wondering if there is a tool to do this where I can input the rectangle positions in pixels into the command line tool and it would highlight the relevant portions for me.

Previous Findings

I have looked over the internet and found a few sites noting how to do this by searching for the text. Unfortunately that is not possible for me as my PDF does not have OCR.

I have searched Stack Exchange for similar questions and found "How to Highlight Text in PDF with commandline (windows)?" and https://stackoverflow.com/questions/32713633/how-to-highlight-text-in-pdf-using-acrobat-reader-from-command-line, but both were unanswered.

Potential Ideas

The first link had a possible lead: it points to "Add comments to PDF files automagically with regular expressions", which uses Ghostscript to add annotations. Is it possible to use Ghostscript to highlight the pages in a similar fashion, by coordinates? The second link mentioned using command line options for the Adobe Acrobat/Reader exe file, but searching the relevant manual for the command line switches does not show any highlighting options. Adobe may no longer support highlighting through the command line, which would be unfortunate.

My last idea would be using AutoHotkey to create a macro that does an actual highlight for me using a GUI program, but that would be the last resort.

What do you all think? Any ideas on what to do, or things to check out? I am willing to program out a solution and can work out the solution on Windows or Linux if necessary. Thanks in advance.

sticke4
  • A possible approach could be creating an FDF which can then be imported into the base document. (As it is at brainfart level at the moment, I add it as a comment instead of an answer.) To determine what the FDF has to look like, create some comments in the document, and then export those comments. – Max Wyss Dec 09 '15 at 07:59
  • Not sure this would be good enough for you, but you can use Javascript inside Adobe Acrobat (i.e. you can open a PDF file in Acrobat and feed Acrobat a Javascript file to run). The Javascript API inside of Acrobat is certainly capable of creating link annotations at a certain location. So what you would need to research is how to write the correct Javascript and how to launch Acrobat and pass it the Javascript to run. But it's a viable solution. – David van Driessche Dec 09 '15 at 09:03
  • @MaxWyss That is an interesting idea. So what you are suggesting is that I can study how an FDF file works with one page of the PDF, and then script the FDF to do the same for all the other pages? Would I be able to edit the FDF file using a text editor or would I need other sorts of software to interact with it? – sticke4 Dec 09 '15 at 17:39
  • @DavidvanDriessche I had not realized Javascript can be used in Adobe Acrobat. A brief look at the documentation suggests this can be quite a powerful approach. Unfortunately I am not familiar with programming in Javascript, so this approach may be time consuming. I will definitely look into it as a longer term solution. – sticke4 Dec 09 '15 at 17:41
  • @sticke4: FDF is a structured text format. That means all you need toolwise is a good text editor (and a lot of wetware between your ears…). – Max Wyss Dec 09 '15 at 18:03
  • @sticke4: another approach, for which I have implemented a utility for a client, and which is in active use, is to start from a spreadsheet where you roughly define the annotations, including their coordinates, and use that utility to feed that information into the document. It might be an idea to get in contact in private… – Max Wyss Dec 09 '15 at 18:06
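
As a rough illustration of the FDF idea in the comments above, here is a minimal Python sketch that writes one highlight annotation per page into an FDF text file. The function name `build_fdf`, the object layout, and the corner order used for `/QuadPoints` are assumptions for illustration only; the layout is modeled on what an exported comments file generally looks like, so export a real comment from your own viewer and compare before relying on it.

```python
def build_fdf(rect, page_count, color=(1, 1, 0)):
    """Return FDF text with one highlight annotation per page.

    rect is (x0, y0, x1, y1) in PDF user space (points, origin bottom-left).
    Hypothetical helper: the exact keys a viewer needs may differ.
    """
    x0, y0, x1, y1 = rect
    # Corner order assumed here: top-left, top-right, bottom-left, bottom-right.
    quad = (x0, y1, x1, y1, x0, y0, x1, y0)

    def fmt(nums):
        return " ".join(f"{n:g}" for n in nums)

    # Object 1 is the FDF catalog; objects 2..N+1 are the annotations.
    refs = " ".join(f"{i + 2} 0 R" for i in range(page_count))
    lines = ["%FDF-1.2", "1 0 obj", f"<< /FDF << /Annots [{refs}] >> >>", "endobj"]
    for i in range(page_count):
        lines += [
            f"{i + 2} 0 obj",
            "<< /Type /Annot /Subtype /Highlight",
            f"   /Page {i}",  # page index in FDF is zero-based
            f"   /Rect [{fmt(rect)}]",
            f"   /QuadPoints [{fmt(quad)}]",
            f"   /C [{fmt(color)}]",
            ">>",
            "endobj",
        ]
    lines += ["trailer", "<< /Root 1 0 R >>", "%%EOF"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same rectangle on every page of a hypothetical 3-page document.
    with open("highlights.fdf", "w", encoding="ascii") as fh:
        fh.write(build_fdf((72, 612, 360, 648), page_count=3))
```

The resulting file is plain text, so it can be inspected and adjusted in any text editor before importing it into the base document.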

2 Answers


I would have thought a Highlight annotation was what you wanted. Highlight annotations are a type of text markup annotation and as such take a set of QuadPoints which describe the bounding box(es) to apply the annotation type to.

Since you say you know the co-ordinates this would seem appropriate for your use. Of course, you will have to create the Annotation on every page, and you will have to learn how to program this with a pdfmark, but I believe it should work.
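
To make the per-page pdfmark idea concrete, here is a minimal Python sketch that generates a plain PostScript file containing one Highlight annotation pdfmark per page. The helper names are hypothetical, and the corner order used for `/QuadPoints` (top-left, top-right, bottom-left, bottom-right) is an assumption based on how viewers commonly interpret it; whether a given interpreter passes `/QuadPoints` through unchanged should be verified on a test file.

```python
def highlight_pdfmark(page, rect, color=(1.0, 1.0, 0.0)):
    """Return one /ANN pdfmark that highlights rect on the given 1-based page.

    rect is (x0, y0, x1, y1) in user space (points, origin bottom-left).
    """
    x0, y0, x1, y1 = rect
    # Assumed corner order: top-left, top-right, bottom-left, bottom-right.
    quad = (x0, y1, x1, y1, x0, y0, x1, y0)

    def fmt(nums):
        return " ".join(f"{n:g}" for n in nums)

    return (
        "[ /Subtype /Highlight\n"
        f"  /SrcPg {page}\n"
        f"  /Rect [{fmt(rect)}]\n"
        f"  /QuadPoints [{fmt(quad)}]\n"
        f"  /Color [{fmt(color)}]\n"
        "  /ANN pdfmark\n"
    )


def build_pdfmark_program(rect, page_count):
    """Repeat the same highlight rectangle on pages 1..page_count."""
    header = "%!PS\n"
    body = "".join(highlight_pdfmark(p, rect) for p in range(1, page_count + 1))
    return header + body


if __name__ == "__main__":
    with open("marks.ps", "w", encoding="ascii") as fh:
        fh.write(build_pdfmark_program((72, 612, 360, 648), page_count=3))
    # Untested assumption: the file is then processed together with the PDF, e.g.
    #   gs -o highlighted.pdf -sDEVICE=pdfwrite input.pdf marks.ps
    # The file order and the handling of /SrcPg should be checked against
    # the Ghostscript documentation.
```

Generating the PostScript from a script keeps the rectangle in one place, so changing the highlighted region means editing a single tuple.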

Note that the co-ordinates are in user space (generally 72 points to the inch), NOT pixels. Because PDF is not an image format, there is no concept of pixels, except for included images.
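
If the rectangle was measured in pixels on a rendered image of the page, the conversion to user space can be sketched as below. This assumes the default user space of 72 points per inch with the origin at the bottom-left, an unrotated page, a known rendering resolution, and pixel coordinates measured from the top-left corner; the function names and sample values are hypothetical.

```python
def pixels_to_points(px, dpi):
    """Convert a pixel length at the given rendering DPI to PDF points."""
    return px * 72.0 / dpi


def pixel_rect_to_pdf_rect(left, top, right, bottom, dpi, page_height_pt):
    """Map a pixel rectangle (origin top-left, y down) to a PDF rectangle
    (x0, y0, x1, y1) in points (origin bottom-left, y up)."""
    x0 = pixels_to_points(left, dpi)
    x1 = pixels_to_points(right, dpi)
    # Flip the vertical axis: the image's top edge is the page's highest y.
    y1 = page_height_pt - pixels_to_points(top, dpi)
    y0 = page_height_pt - pixels_to_points(bottom, dpi)
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    # A US Letter page (792 pt tall) rendered at 144 DPI.
    print(pixel_rect_to_pdf_rect(144, 288, 720, 360, dpi=144, page_height_pt=792))
```

The page height in points has to come from the PDF itself, since the flip depends on it.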

KenS
  • Looking up the documentation for the highlight annotation in pdfmark looks very promising. How would I go about finding the coordinates in user space, is there a tool for such a thing? – sticke4 Dec 09 '15 at 17:54
  • Also, after some searching on pdfmark, am I correct in assuming that I can write the code in an EPS file and then run it through a PostScript interpreter? The official sources from Adobe mention using Distiller, but a few sources mention that I can also use Ghostscript. I presume the advantage of using Ghostscript would be that it is free, as opposed to Distiller, which is part of the Acrobat bundle. Is that right? – sticke4 Dec 09 '15 at 18:00
  • Acrobat has a measurement tool; beyond that, you'll have to measure the page with a ruler I guess. You *must* use a PostScript interpreter for pdfmark, because it's a PostScript operator; it won't work with anything else. You don't put the pdfmarks in an EPS file, just a plain PostScript program. You can't use PDF as an input with Distiller, it only accepts PostScript, so if you want to do this, and use pdfmark, your only option is to use Ghostscript. – KenS Dec 10 '15 at 08:05

There are quite a few officially unsupported command line parameters for Acrobat and Acrobat Reader (acrord32.exe on Windows).

See: https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf

This includes a parameter to highlight with four integers at left,right,top,bottom that are in some unspecified units but with 0,0 at the top left of the page.

EXCEPT... I have been unable to get this to work.

I can pass in parameters to search and zoom but highlight never shows anything.

For instance:

start acrord32 /n /s /a "search=MS25441&zoom=300&page=1&highlight=0,55,0,65" floorplan1_ABM_cameras.pdf

This opens the file, searches for the string, and zooms to 300%, but nothing shows for a highlight no matter what coordinates I specify.

lcbrevard