
I have about 250 single-page pdf files that have names like:

file_1_100.pdf,
file_1_200.pdf, 
file_1_300.pdf, 
file_2_100.pdf, 
file_2_200.pdf, 
file_2_300.pdf, 
file_3_100.pdf, 
file_3_200.pdf, 
file_3_300.pdf
...etc

I am using the following command to combine them to a single pdf file:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file*pdf

It works perfectly, combining them in the correct order. However, when I am looking at finished.pdf, I want to have a reference that tells me the original filename for each page.

Does anyone have any suggestions? Can I add page names referencing the files or something?

Kurt Pfeifle
Stephen
  • The Python script here seems promising: http://blog.tremily.us/posts/PDF_bookmarks_with_Ghostscript/ – Geremia Mar 09 '15 at 08:47

2 Answers


It is fairly easy to put the file names into a list of Bookmarks which many PDF viewers can display.

This is done with PostScript using the 'pdfmark' distiller operator. For example, use the following:

gs -sDEVICE=pdfwrite -o finished.pdf control.ps

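where control.ps contains PS commands to print the pages and output the bookmark (/OUT) pdfmarks. Here tiger.eps and colorcir.ps are sample files from Ghostscript's examples directory; substitute your own files: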

(examples/tiger.eps) run [ /Page 1 /Title (tiger.eps) /OUT pdfmark
(examples/colorcir.ps) run [ /Page 2 /Title (colorcir.ps) /OUT pdfmark
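
For a few hundred files, a control.ps in this style could also be generated from the shell rather than typed by hand. The following is only a sketch: make_control_ps is a hypothetical helper name, and it assumes that every input PDF has exactly one page and that no file name contains parentheses or backslashes (these would need escaping inside a PostScript string).

```shell
#!/bin/sh
# Hypothetical helper: prints one "run + bookmark" line per file argument,
# in the same form as the hand-written control.ps lines.
# Assumption: each input file contributes exactly one page.
# Assumption: file names contain no "(", ")" or "\" characters.
make_control_ps() {
    page=1
    for f in "$@"; do
        printf '(%s) run [ /Page %d /Title (%s) /OUT pdfmark\n' "$f" "$page" "$f"
        page=$((page + 1))
    done
}

# Usage (the shell expands the glob in its usual sorted order):
#   make_control_ps file*.pdf > control.ps
#   gs -sDEVICE=pdfwrite -o finished.pdf control.ps
```

Because the shell expands file*.pdf, the pages come out in the same order as in the original gs command from the question. Recent Ghostscript releases enable -dSAFER by default, so reading the PDFs from inside control.ps may have to be permitted explicitly, for example with --permit-file-read.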

Note that you can also perform the enumeration using PS to automate the entire process:

/PN 1 def
(file*.pdf) {
  /FN exch def
  FN run
  [ /Page PN /Title FN /OUT pdfmark % do the file and bookmark it by filename
  /PN PN 1 add def % bump the page number
} 1000 string filenameforall

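Note that the order of filenameforall enumeration is not specified, so you may want to sort the list to control the order, using the Ghostscript extension .sort (array {lt} .sort array). The 1000 string argument is the scratch buffer that filenameforall fills with each matching file name.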

After thinking about this further, I realized that if an input file has more than one page, there is a better way to set the bookmark to the correct page number: use the 'PageCount' device property.

[
  (file*.pdf) { dup length string copy } 1000 string filenameforall
] % create array of filenames
{ lt } .sort % sort in increasing alphabetic order
/PN 1 def
{ /FN exch def
  /PN currentpagedevice /PageCount get 1 add def % get current page count done (next is one greater)
  FN run [ /Page PN /Title FN /OUT pdfmark % do the file and bookmark it by filename
} forall

The above creates an array of strings (copying them to unique string objects since filenameforall just overwrites the string it is given), then sorts it, and finally processes the array of strings using the forall operator. By using the PageCount device property to get the count of pages already produced, the page number (PN) for the bookmark will be correct. I have tested this snippet as 'control.ps'.

Kurt Pfeifle
Ray Johnston
    I'm terribly sorry, but this is extremely poorly worded. Is there any chance we could get some clarification as to what `tiger.eps` or `colorcir.ps` are or what the `1000` is for? – puk Jun 19 '13 at 07:42

To stamp the filename on each page, you can use a combination of Ghostscript and pdftk. Taken from https://superuser.com/questions/171790/print-pdf-file-with-file-path-in-footer:

gs \
-o outdir/footer.pdf \
-sDEVICE=pdfwrite \
-c "5 5 moveto /Helvetica findfont 9 scalefont setfont (foobar-filename.pdf) show"

pdftk \
foobar-filename.pdf \
stamp outdir/footer.pdf \
output outdir/merged_foobar-filename.pdf
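
To apply this to every file before merging, the two commands could be wrapped in a loop. This is a rough sketch, not a tested recipe: footer_ps and stamp_all are hypothetical helper names, the output directory name is arbitrary, and file names are assumed to contain no parentheses or backslashes. It also relies on pdftk's stamp operation fitting the footer page to each input page.

```shell
#!/bin/sh
# Hypothetical helper: prints the PostScript that draws NAME near the
# bottom-left corner of a page (same drawing commands as the gs call above).
# Assumption: NAME contains no "(", ")" or "\" characters.
footer_ps() {
    printf '5 5 moveto /Helvetica findfont 9 scalefont setfont (%s) show' "$1"
}

# Hypothetical helper: stamps each PDF argument with its own file name and
# writes the result, under the same name, into the directory given first.
stamp_all() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    for f in "$@"; do
        name=$(basename "$f")
        # Build a one-page footer PDF carrying this file's name.
        gs -q -o "$outdir/footer.pdf" -sDEVICE=pdfwrite -c "$(footer_ps "$name")"
        # Overlay the footer on the original page.
        pdftk "$f" stamp "$outdir/footer.pdf" output "$outdir/$name"
    done
    rm -f "$outdir/footer.pdf"
}

# Usage:
#   stamp_all stamped file*.pdf
#   gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf stamped/file*.pdf
```

The stamped copies keep their original names, so the final merge uses the same glob, and therefore the same page order, as the command in the question.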
Community
matt wilkie