Questions tagged [pdf]

Portable Document Format (PDF) is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). Questions can be about creating, reading, or editing PDFs in any programming language.

The official ISO Specification (ISO 32000-1, a.k.a. 'PDF-1.7') is important as a reference, but it is not exactly written for PDF beginners.

Beginners may start with these two easy-to-read resources:


Questions

Related questions on Stack Overflow generally fall into the following domains:

  • How to convert, produce, or encode a PDF with a particular language or library?
  • Everything else.

The first domain has been covered in depth, so any question you have about it has most likely already been answered.

Information Extraction

Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR), because letters can be encoded as font glyphs, line art, vector graphics, or raster images.

The content of a PDF page generally consists of drawing instructions. There's no such thing as "a table" in most PDF files. There are lines, glyphs, and raster images (and clipping, and color spaces, and so forth). It is all but impossible to reliably determine what is or isn't a table in an arbitrary PDF file.
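To make the point concrete, here is a hand-written, uncompressed content stream that would render like a one-row, two-cell table, together with a deliberately minimal tokenizer. This is a sketch under stated assumptions: the stream, the `TOKEN` pattern and the `operators` helper are hypothetical, and the tokenizer ignores hex strings, arrays, dictionaries, nested parentheses and inline images, so it is not a conforming PDF parser.

```python
import re

# Hypothetical, uncompressed content stream. Rendered, it looks like a table,
# but it is only two stroked rectangles ("re", "S") and two independently
# positioned strings ("Td", "Tj"). Real streams are usually compressed.
CONTENT = b"""
0.5 w
72 700 100 20 re S
172 700 100 20 re S
BT /F1 10 Tf 76 706 Td (Name) Tj ET
BT /F1 10 Tf 176 706 Td (Qty) Tj ET
"""

# Minimal tokenizer: literal strings (escapes allowed, nesting not), names,
# numbers and operator keywords. Everything else is skipped.
TOKEN = re.compile(
    rb"(?P<string>\((?:[^()\\]|\\.)*\))"
    rb"|(?P<name>/[^\s/()<>\[\]{}%]+)"
    rb"|(?P<number>[-+]?(?:\d+\.?\d*|\.\d+))"
    rb"|(?P<op>[A-Za-z'\"*]+)"
)


def operators(stream: bytes):
    """Yield (operator, operands) pairs in the order they appear."""
    operands = []
    for match in TOKEN.finditer(stream):
        token = match.group().decode("latin-1")
        if match.lastgroup == "op":
            # Operands precede their operator (postfix notation).
            yield token, operands
            operands = []
        else:
            operands.append(token)


for op, args in operators(CONTENT):
    print(op, args)
```

Every line printed is a drawing operator with its operands; nothing in the stream says "table", and recovering one means inferring it from rectangle and text coordinates.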

Note that a glyph is not a character. A glyph has an appearance, whereas a character has meaning. Each font in a PDF may or may not map its glyphs to characters (for example, through an optional ToUnicode CMap).
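A small illustration of why the glyph-to-character mapping matters. Everything here is hypothetical: a subset font whose producer assigned glyph codes 1, 2, 3, ... in order of first use, a ToUnicode-style dictionary standing in for the font's real CMap, and a helper named `glyphs_to_text`. It also assumes single-byte codes; real fonts may use multi-byte codes (for example, two-byte codes with Identity-H encoded CID fonts).

```python
from __future__ import annotations

# Hypothetical subset font: glyph codes were assigned in order of first use,
# so the bytes shown by a Tj operator bear no relation to ASCII.
SHOWN_CODES = bytes([1, 2, 3, 3, 4])  # draws the glyphs for "Hello"

# Hypothetical ToUnicode-style map (glyph code -> Unicode text).
# Code 5 is a ligature glyph: one glyph, two characters.
TO_UNICODE = {1: "H", 2: "e", 3: "l", 4: "o", 5: "fi"}


def glyphs_to_text(codes: bytes, to_unicode: dict[int, str] | None) -> str:
    """Map single-byte glyph codes to text, if the font provides a mapping."""
    if to_unicode is None:
        # No mapping: a naive extractor can only guess an encoding,
        # which yields control characters here instead of "Hello".
        return codes.decode("latin-1")
    # Codes missing from the map become U+FFFD REPLACEMENT CHARACTER.
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


print(repr(glyphs_to_text(SHOWN_CODES, TO_UNICODE)))
print(repr(glyphs_to_text(SHOWN_CODES, None)))
```

The same five glyphs read as "Hello" only because the mapping exists; without it, the page still looks right on screen while the extracted text is meaningless.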

If at all possible, use the source data to extract information, rather than relying on the PDF. This file format is designed for visual consistency, and very little useful normalized data can be extracted from its contents.

Content

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

  • text, drawn by operators in content streams (i.e. positioned glyphs rather than plain character data)
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of image
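All three kinds of content can live in a single content stream. The sketch below hand-assembles a one-page PDF with a line of text, a stroked line and a tiny 2x2 grayscale inline image, using only the standard library. It is illustrative, not how PDFs are normally produced (use a library for that); the function name `build_pdf`, the object numbering and the choice of the standard Helvetica font are assumptions made for the example.

```python
def build_pdf() -> bytes:
    """Assemble a minimal one-page PDF (hypothetical helper)."""
    content = (
        b"BT /F1 24 Tf 72 720 Td (Hello, PDF) Tj ET\n"  # text: glyphs from font /F1
        b"2 w 72 700 m 300 700 l S\n"                    # vector: a stroked line
        b"q 40 0 0 40 72 620 cm\n"                       # scale unit square to 40x40 pt
        b"BI /W 2 /H 2 /CS /G /BPC 8 ID "                # raster: 2x2 gray inline image
        + bytes([0, 255, 255, 0])
        + b"\nEI Q\n"
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_pos,
    )
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode should give a page that common viewers can open. Note that the text, the line and the image are all just operators in object 5; nothing distinguishes them except the operators used.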

Related Links

For additional information about this file format see:

50972 questions
126 votes, 6 answers

Show a PDF files in users browser via PHP/Perl

I want to show my users PDF files. The reason why I use CGI to show the PDF is I want to track the clicks for the PDF, and cloak the real location of the saved PDF. I've been searching on the Internet and only found how to show save dialog to the…
dimassony
124 votes, 4 answers

Merging png images into one pdf file

How can I merge several .png files into one PDF file in Unix?
twidizle
124 votes, 20 answers

IPython/Jupyter Problems saving notebook as PDF

So, I've been trying to save a jupyter notebook as PDF but I just can't figure out how to do this. The first thing I try is from the file menu just download as PDF, but doing that results in: nbconvert failed: PDF creating failed the next thing I…
Isak Baizley
120 votes, 10 answers

Print PDF directly from JavaScript

I am building a list of PDFs in HTML. In the list I'd like to include a download link and a print button/link. Is there some way to directly open the Print dialog for the PDF without the user seeing the PDF or opening a PDF viewer? Some variation of…
Craig Celeste
119 votes, 8 answers

Zoom to fit: PDF Embedded in HTML

I am embedding a local pdf file into a simple webpage and I am looking to set the initial zoom to fit to the object size. Here is what I tried but it is not affecting the zoom. does…
user3024833
118 votes, 4 answers

Convert PDF to PNG using ImageMagick

using ImageMagick, what command should i use to convert a PDF to PNG? I need highest quality, smallest file size. this is what I have so far (very slow by the way): convert -density 300 -depth 8 -quality 85 a.pdf a.png Looking at what Gmail does…
StackOverflowNewbie
115 votes, 24 answers

Extract images from PDF without resampling, in python?

How might one extract all images from a pdf document, at native resolution and format? (Meaning extract tiff as tiff, jpeg as jpeg, etc. and without resampling). Layout is unimportant, I don't care were the source image is located on the page.
matt wilkie
115 votes, 18 answers

Download and open PDF file using Ajax

I have an action class that generates a PDF. The contentType is set appropriately. public class MyAction extends ActionSupport { public String execute() { ... ... File report = signedPdfExporter.generateReport(xyzData, props); …
Nayn
115 votes, 9 answers

Which one is the best PDF-API for PHP?

Which one of these is the best PDF-API for PHP? ApacheFOP dompdf FPDF html2ps mPDF PDFlib TCPDF wkhtmltopdf Zend_Pdf
coderex
114 votes, 8 answers

How to Display blob (.pdf) in an AngularJS app

I have been trying to display pdf file which I am getting as a blob from a $http.post response. The pdf must be displayed within the app using for example. I came across a couple of stack posts but somehow my example doesn't seem to…
Simo Mafuxwana
113 votes, 6 answers

Rstudio rmarkdown: both portrait and landscape layout in a single PDF

I wonder how to use rmarkdown to generate a pdf which has both portrait and landscape layout in the same document. If there is a pure rmarkdown option that would be even better than using latex. Here's a small, reproducible example. First, rendering…
user3712688
111 votes, 5 answers

Generate PDF from Swagger API documentation

I have used the Swagger UI to display my REST webservices and hosted it on a server. However this service of Swagger can only be accessed on a particular server. If I want to work offline, does anybody know how I can create a static PDF using the…
Aman Mohammed
109 votes, 18 answers

Combine two (or more) PDF's

Background: I need to provide a weekly report package for my sales staff. This package contains several (5-10) crystal reports. Problem: I would like to allow a user to run all reports and also just run a single report. I was thinking I could do…
Nathan Koop
109 votes, 9 answers

split a multi-page pdf file into multiple pdf files with python?

I would like to take a multi-page pdf file and create separate pdf files per page. I have downloaded reportlab and have browsed the documentation, but it seems aimed at pdf generation. I haven't yet seen anything about processing PDF files…
monkut
107 votes, 6 answers

Enable zooming/pinch on UIWebView

I have an UIWebView with a pdf-file. It works fine. But how can i enable zooming on the pdf-file?
David