Questions tagged [pdf]

Portable Document Format (PDF) is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). Questions can be about creating, reading, or editing PDFs in any programming language.

The official ISO Specification (ISO 32000-1, a.k.a. 'PDF-1.7') is important as a reference, but it is not exactly written for PDF beginners.

Beginners may start with these two easy-to-read resources:

Questions

Related questions on Stack Overflow generally fall into the following domains:

  • How to convert, produce, or encode a PDF with a given language or library?
  • Everything else.

The first domain has been covered in depth, and any question you have is likely already answered.

Information Extraction

Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR), because letters can be encoded as font glyphs, line art, vector graphics, or raster images.

PDF files generally contain drawing instructions. There's no such thing as "a table" in most PDF files. There are lines, glyphs, and raster images (and clipping, and color spaces, and so forth). It is all but impossible to determine what is or isn't a table in an arbitrary PDF file.
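To make this concrete, here is a minimal sketch in Python. The content stream below is hand-written for illustration (an assumption, not taken from a real file), and `list_operators` is a hypothetical helper that only works because the sample puts one operator on each line. It shows what a two-cell "table" could look like to a parser: text-showing and line-drawing operators, with nothing that marks a table.

```python
# A hand-written content stream (an assumption for illustration only).
# It draws two words and the rules of what a human would call a table.
content_stream = """\
BT
/F1 12 Tf
72 720 Td
(Name) Tj
100 0 Td
(Price) Tj
ET
70 715 m
270 715 l
S
170 735 m
170 715 l
S
"""


def list_operators(stream: str) -> list[str]:
    """Return the operator of each line, assuming one operator per line."""
    return [line.split()[-1] for line in stream.splitlines() if line.strip()]


operators = list_operators(content_stream)
print(operators)
```

The only operators present are for text (`BT`, `Tf`, `Td`, `Tj`, `ET`) and for paths (`m`, `l`, `S`). Any table structure has to be inferred from the coordinates.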

Note that a glyph is not a character. A glyph has an appearance, whereas a character has meaning. Each font in a PDF may or may not map its glyphs to characters.
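The difference can be sketched in a few lines of Python. The glyph IDs and the mapping table below are invented for illustration; in a real file the equivalent information would come from the font's encoding or its ToUnicode data, if present at all.

```python
# Hypothetical glyph IDs, as a subset font might number them (an assumption).
shown_glyphs = [1, 2, 3, 3, 4]

# Hypothetical ToUnicode-style table mapping glyph IDs to characters.
to_unicode = {1: "H", 2: "e", 3: "l", 4: "o"}


def glyphs_to_text(glyph_ids, mapping):
    """Map glyph IDs to characters; unmapped glyphs become U+FFFD."""
    return "".join(mapping.get(g, "\ufffd") for g in glyph_ids)


with_map = glyphs_to_text(shown_glyphs, to_unicode)
without_map = glyphs_to_text(shown_glyphs, {})

print(with_map)      # Hello
print(without_map)   # five replacement characters
```

Both calls describe the same glyphs on the page. Only the first one yields text, which is why some PDFs look fine on screen but produce garbage when copied or extracted.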

If at all possible, use the source data to extract information, rather than relying on the PDF. This file format is designed for visual consistency, and very little useful normalized data can be extracted from its contents.

Content

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

  • text, stored in content streams as glyph-drawing instructions (i.e. not as plain text)
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of images
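As a rough illustration of the first two content types, the following Python sketch assembles a one-page PDF by hand, using only the standard library. The function name `build_minimal_pdf`, the page size, the coordinates, and the use of the standard Helvetica font are all assumptions made for this example; a real project would normally use a PDF library instead.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a one-page PDF containing one text run and one stroked line."""
    content = (
        b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET\n"  # text: glyph-drawing operators
        b"72 700 m 300 700 l S\n"                  # vector graphics: one line
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


if __name__ == "__main__":
    with open("minimal.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

Object 5 is the page's content stream. Its first line shows text and its second line strokes a path, and both are expressed as drawing operators rather than as structured data. Raster images are left out of this sketch because they need an additional image object.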

Related Links

For additional information about this file format see:

50972 questions
162 votes, 9 answers

Add text to Existing PDF using Python

I need to add some extra text to an existing PDF using Python, what is the best way to go about this and what extra modules will I need to install. Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only…
Frozenskys
158 votes, 8 answers

How to create PDFs in an Android app?

Is there any way to create PDF Files from an Android application?
user299908
156 votes, 14 answers

How to create PDF files in Python

I'm working on a project which takes some images from user and then creates a PDF file which contains all of these images. Is there any way or any tool to do this in Python? E.g. to create a PDF file (or eps, ps) from image1 + image 2 + image 3 ->…
Stephen T.
148 votes, 14 answers

How to make PDF file downloadable in HTML link?

I am giving link of a pdf file on my web page for download, like below Download Brochure The problem is when user clicks on this link then If the user have installed Adobe Acrobat, then it opens the file in the same…
djmzfKnm
140 votes, 8 answers

Converting HTML files to PDF

I need to automatically generate a PDF file from an existing (X)HTML-document. The input files (reports) use a rather simple, table-based layout, so support for really fancy JavaScript/CSS stuff is probably not needed. As I am used to working in…
panschk
138 votes, 10 answers

How can I visually inspect the structure of a PDF to reverse engineer it?

How can I inspect the structure of PDF files? Use case: I'm trying to programmatically generate PDF files (using iText). I'm having trouble achieving certain layouts, but I have PDF files with text laid out the way I want (generated from Word). I…
bmm6o
136 votes, 12 answers

How can I display a pdf document into a Webview?

I want to display pdf contents on webview. Here is my code: WebView webview = new WebView(this); setContentView(webview); webview.getSettings().setJavaScriptEnabled(true);…
shriya
135 votes, 11 answers

How to return PDF to browser in MVC?

I have this demo code for iTextSharp Document document = new Document(); try { PdfWriter.GetInstance(document, new FileStream("Chap0101.pdf", FileMode.Create)); document.Open(); document.Add(new Paragraph("Hello…
Tony Borf
134 votes, 10 answers

How to convert webpage into PDF by using Python

I was finding solution to print webpage into local file PDF, using Python. one of the good solution is to use Qt, found here, https://bharatikunal.wordpress.com/2010/01/. It didn't work at the beginning as I had problem with the installation of…
Mark K
132 votes, 4 answers

How to find out which fonts are referenced and which are embedded in a PDF document

We have a little problem with fonts in PDF documents. In order to put the finger on the problem I'd like to inspect, which fonts are actually embedded in the pdf document and which are only referenced. Is there an easy (and cheap as in free) way to…
Jens Schauder
132 votes, 6 answers

Duplicate headers received from server

Duplicate headers received from server The response from the server contained duplicate headers. This problem is generally the result of a misconfigured website or proxy. Only the website or proxy administrator can fix this issue. Error 349…
Purvesh Desai
131 votes, 10 answers

Render HTML to PDF in Django site

For my django powered site, I am looking for an easy solution to convert dynamic html pages to pdf. Pages include HTML and charts from Google visualization API (which is javascript based, yet including those graphs is a must).
crib
129 votes, 12 answers

How to display a PDF via Android web browser without "downloading" first

Is there a way to get the stock Android browser to auto-open a PDF, Word or other typical file without having to go through the process of downloading the file and then getting the user to open the file from the Downloads app or the Notification…
Chris Saldanha
128 votes, 22 answers

Create PDF from a list of images

Is there any practical way to create a PDF from a list of images files, using Python? In Perl I know that module. With it I can create a PDF in just 3 lines: use PDF::FromImage; ... my $pdf =…
macabeus
128 votes, 9 answers

Convert PDF to clean SVG?

I'm attempting to convert a PDF to SVG. However, the one I am using currently maps a path for every letter in every piece of text, meaning if I change the text in its source file, it looks ugly. I was wondering what the cleanest PDF to SVG…
DanRedux