Questions tagged [pdf]

Portable Document Format (PDF) is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). Questions can be about creating, reading, or editing PDFs in any programming language.

The official ISO specification (ISO 32000-1, a.k.a. 'PDF-1.7') is the authoritative reference, but it is not written with PDF beginners in mind.

Beginners may start with these two easy-to-read resources:

Questions

Related questions on Stack Overflow generally fall into the following domains:

  • How to convert, produce, or encode a PDF with a particular language or library?
  • Everything else.

The first domain has been covered in depth, and any question you have is likely already answered.

Information Extraction

Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR). Letters can be encoded as font glyphs, line art, vector graphics, or raster images.

PDF files generally contain drawing instructions. There's no such thing as "a table" in most PDF files. There are lines, glyphs, and raster images (and clipping, and color spaces, and so forth). It is all but impossible to determine what is or isn't a table in an arbitrary PDF file.
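To make the point about tables concrete, here is a minimal Python sketch of the kind of guesswork involved. The `fragments` list is hypothetical sample data standing in for the positioned text a PDF library might report, and `group_into_rows` and its `tolerance` parameter are illustrative names, not part of any real API.

```python
# Hypothetical positioned fragments, as a PDF library might report them:
# (x, y, text). The PDF itself stores positions, never "row" or "cell".
fragments = [
    (72.0, 700.2, "Name"),
    (200.0, 700.0, "Qty"),
    (72.0, 684.1, "Widget"),
    (200.0, 684.0, "4"),
]


def group_into_rows(fragments, tolerance=2.0):
    """Cluster fragments whose y coordinates lie within `tolerance`."""
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    # Walk the page top to bottom (larger y is higher on the page).
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Order each row left to right and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


rows = group_into_rows(fragments)
```

With the sample data this yields two rows of two cells each, but only because the tolerance happens to suit the layout. A tighter tolerance splits every fragment into its own row, which is why such heuristics fail on arbitrary files.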

Note that a glyph is not a character. A glyph has an appearance, whereas a character has meaning. Each font in a PDF may or may not map its glyphs to characters.
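The difference can be sketched in a few lines of Python. The `to_unicode` table below is invented for illustration; in a real PDF this role is played by a font's optional ToUnicode CMap. The glyph IDs and the `decode_glyphs` helper are likewise assumptions, not taken from any library.

```python
# Hypothetical glyph-to-Unicode table for one embedded font.
# In a real PDF this role is played by the font's optional ToUnicode CMap.
to_unicode = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}


def decode_glyphs(glyph_ids, mapping):
    """Map glyph IDs to characters; unmapped glyphs become U+FFFD."""
    return "".join(mapping.get(glyph, "\ufffd") for glyph in glyph_ids)


# The same five glyphs are drawn in both cases; only the mapping differs.
shown = [0x01, 0x02, 0x03, 0x03, 0x04]
with_map = decode_glyphs(shown, to_unicode)
without_map = decode_glyphs(shown, {})
```

Both cases render identically on the page. Only the font that carries a mapping lets a program recover the word, and the other leaves nothing but replacement characters, which is where OCR comes in.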

If at all possible, use the source data to extract information, rather than relying on the PDF. This file format is designed for visual consistency, and very little useful normalized data can be extracted from its contents.

Content

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

  • text stored in content streams as glyph-drawing instructions (i.e. not as plain text)
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of images
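As a rough illustration of text living in a content stream, the Python sketch below tokenizes a hand-written fragment and collects the operands of the `Tj` (show text) operator. The sample stream, the simplified `TOKEN` pattern, and the `shown_strings` helper are assumptions made for this example. A real parser must also handle escapes, nested parentheses, hex strings, arrays, and compressed streams.

```python
import re

# A hand-written content stream fragment: begin text, select font F1 at
# 12 pt, move to (72, 720), show a string, end text.
stream = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# Simplified token pattern: literal strings (no nesting or escapes),
# names, numbers, then operators.
TOKEN = re.compile(
    r"\((?P<string>[^()]*)\)"
    r"|(?P<name>/\w+)"
    r"|(?P<number>-?\d+(?:\.\d+)?)"
    r"|(?P<op>[A-Za-z*]+)"
)


def shown_strings(stream):
    """Return the operand of every Tj (show text) operator."""
    operands, result = [], []
    for match in TOKEN.finditer(stream):
        if match.lastgroup == "op":
            if match.group("op") == "Tj" and operands:
                result.append(operands[-1])
            operands = []  # operands precede the operator they belong to
        else:
            operands.append(match.group(match.lastgroup))
    return result


strings = shown_strings(stream)
```

Here the operand reads as `Hello` only because the sample uses plain ASCII. In general the bytes passed to `Tj` are character codes interpreted through the selected font, so they need not be readable text at all.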

Related Links

For additional information about this file format see:

50972 questions
107 votes, 18 answers

Display PDF within web browser

How can I display a pdf within a web browser on an .html page?
CodeGuy
106 votes, 10 answers

Can we open pdf file using UIWebView on iOS?

Can we open the pdf file from UIWebView?
MohammedYakub M.
105 votes, 13 answers

Generating a PDF file from React Components

I have been building a polling application. People are able to create their polls and get data regarding the question(s) they ask. I would like to add the functionality to let the users download the results in the form of a PDF. For example I have…
Ozan
104 votes, 10 answers

How to embed a PDF viewer in a page?

If I'm not mistaken, Google Docs offers the means to display a PDF that is stored on the same server as the web page via an This works fine in Chrome, IE8+, Firefox etc, but for some reason, when some people are viewing it in IE8, the files are downloading instead of…
user2931470
95 votes, 7 answers

Pandoc and foreign characters

I've been trying to use Pandoc to convert some Markdown into a PDF file. This is a sample that Pandoc will not convert for me: # Header! ## Sub Header themselves derived respectively from the Greek ἀναρχία i.e. 'anarchy' That's just something I…
Mike Thomsen
94 votes, 10 answers

Merge PDF files with PHP

My concept is - there are 10 pdf files in a website. User can select some pdf files and then select merge to create a single pdf file which contains the selected pages. How can i do this with php?
Imrul.H
93 votes, 2 answers

How to Use pdf.js

I am considering using pdf.js (an open source tool that allows embedding of a pdf in a webpage). There isn't any documentation on how to use it. I assume what I do is make an html page with the script referenced in the header, and then in the body,…
Chris
93 votes, 7 answers

correct PHP headers for pdf file download

I'm really struggling to get my application to open a pdf when the user clicks on a link. So far the anchor tag redirects to a page which sends headers that are: $filename='./pdf/jobs/pdffile.pdf; $url_download = BASE_URL . RELATIVE_PATH .…
useyourillusiontoo
92 votes, 12 answers

Converting a PDF to PNG

I'm trying to convert a PDF to a PNG image (at least the cover of one). I'm successfully extracting the first page of the PDF with pdftk. I'm using imagemagick to do the conversion: convert cover.pdf cover.png This works, but unfortunately the…
Adam
91 votes, 12 answers

How can I send a file document to the printer and have it print?

Here's the basic premise: My user clicks some gizmos and a PDF file is spit out to his desktop. Is there some way for me to send this file to the printer queue and have it print to the locally connected printer? string filePath =…
Only Bolivian Here