Questions tagged [pdf]

Portable Document Format (PDF) is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). Questions can be about creating, reading, or editing PDFs in any programming language.

The official ISO Specification (ISO 32000-1, a.k.a. 'PDF-1.7') is important as a reference, but it is not exactly written for PDF beginners.

Beginners may start with these two easy-to-read resources:


Questions

Related questions on Stack Overflow generally fall into the following domains:

  • How to convert, produce, or encode a PDF?
  • Everything else.

The first domain has been covered in depth, and any question you have is likely already answered.

Information Extraction

Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR). Letters can be encoded as font glyphs, line art, vector graphics, or raster images.

PDF files generally contain drawing instructions. There's no such thing as "a table" in most PDF files. There are lines, glyphs, and raster images (and clipping, and color spaces, and so forth). It is all but impossible to determine what is or isn't a table in an arbitrary PDF file.
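To make "drawing instructions" concrete, here is a minimal Python sketch that pulls the strings painted by simple `Tj` operators out of an uncompressed content stream. The sample stream and the `shown_strings` helper are illustrative assumptions, not taken from a real file or library. Real streams are usually compressed and often use `TJ` arrays, hex strings, escapes, and font-specific encodings, none of which this sketch handles.

```python
import re

# A hand-written, uncompressed content stream (illustrative only).
# BT/ET delimit a text object, Tf selects a font, Td moves the text position,
# Tj paints a string, and "m ... l S" strokes a straight line.
SAMPLE_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Name) Tj
200 0 Td
(Amount) Tj
ET
72 710 m 400 710 l S
"""

# Matches a literal string with no nested parentheses or escapes, followed by Tj.
TJ_PATTERN = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def shown_strings(stream: bytes) -> list[bytes]:
    """Return the raw byte strings painted by simple Tj operators."""
    return TJ_PATTERN.findall(stream)


if __name__ == "__main__":
    print(shown_strings(SAMPLE_STREAM))
```

The stream yields two positioned byte strings and one stroked line. Nothing in it says that "Name" and "Amount" are column headers or that the line is a table rule; any such structure has to be guessed from the geometry.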

Note that a glyph is not a character. A glyph has an appearance, whereas a character has meaning. Each font in a PDF may or may not map its glyphs to characters.
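The glyph/character distinction can be sketched in a few lines of Python. The mapping below is a made-up stand-in for a font's ToUnicode CMap; the glyph codes, the `TO_UNICODE` table, and the `decode_glyph_codes` helper are all hypothetical and only illustrate what happens when the mapping is present versus absent.

```python
# Hypothetical glyph-code-to-character map, similar in spirit to a font's
# ToUnicode CMap. The codes and the map are invented for illustration.
TO_UNICODE = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}


def decode_glyph_codes(codes: bytes, cmap: dict[int, str]) -> str:
    """Map each one-byte glyph code to a character, or U+FFFD if unmapped."""
    return "".join(cmap.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    codes = b"\x01\x02\x03\x03\x04"
    print(decode_glyph_codes(codes, TO_UNICODE))  # font provides a mapping
    print(decode_glyph_codes(codes, {}))          # font provides no mapping
```

With the mapping, the codes decode to readable text. Without it, the page still renders correctly because the glyph outlines are intact, but the extracted text is only replacement characters, which is the situation where OCR becomes the fallback.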

If at all possible, use the source data to extract information, rather than relying on the PDF. This file format is designed for visual consistency, and very little useful normalized data can be extracted from its contents.

Content

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

  • text stored as content streams (i.e., as drawing instructions rather than plain text)
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of images
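As a rough illustration of how text and vector content sit inside a file, the following Python sketch assembles a one-page PDF by hand using only the standard library. The `build_minimal_pdf` name and the page content are assumptions made for this example. It relies on Helvetica being one of the standard fonts a viewer supplies without embedding, and it leaves out raster images, compression, and metadata entirely.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a one-page PDF with one text run and one stroked line."""
    # Page content: a text object (BT ... ET) followed by vector path operators.
    content = b"BT /F1 24 Tf 72 700 Td (Hello, PDF) Tj ET\n72 690 m 300 690 l S"

    # Indirect objects 1..5: catalog, page tree, page, content stream, font.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("minimal.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

Opening the resulting file in a text editor shows that the visible text lives inside the stream of object 4 as operands to `Tj`, next to the path operators that draw the line.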

Related Links

For additional information about this file format see:

50972 questions
10
votes
4 answers

Creating PDF from request response doesn't work with axios but works in native xhr

In order to force download PDF from server I tried to use axios and native xhr object. The reason is that I have to send post request, because I pass too much data to server, so the option with simple link (like site.ru/download-pdf won't work for…
Victor
10
votes
1 answer

Submit pdf form fields to a HTTP POST request

I've made a PDF form in Adobe Acrobat. Now I want to make a button that submits the form to an HTTP POST request. I have searched for about 4 hours, but I have not found an example to do this. Here I read that it is possible to send the pdf form…
Josjojo
10
votes
1 answer

PyPDF2 nested bookmarks with same name not working

When you try to nest several bookmarks with the same name, PyPDF2 does not take it into account. Below is self-contained Python code to test what I mean (you need to have 3 pdf files named a, b and c in the working folder to test it out) from PyPDF2…
Chapo
10
votes
5 answers

Display PDF as HTML Form

I want to display a PDF as an HTML page, where the user will be allowed to enter the fillable data. My problem is not how to import/fill data (I was able to do it using FDF/XML and ITextSharp). My only concern is how to show it to the user so that…
Sekhar
10
votes
2 answers

Is my pdf file encoded in UTF-8?

I would like to find out if a PDF file is encoded in UTF-8. How can I check which character encoding is used in a PDF file?
Ronald
10
votes
4 answers

Search and replace for text within a pdf, in Python

I am writing mailmerge software as part of a Python web app. I have a template called letter.pdf which was generated from a MS Word file and includes the text {name} where the resident's name will go. I also have a list of c. 100 residents'…
Phil Hunt
10
votes
4 answers

How to create PDF containing Persian(Farsi) text with reportlab, rtl and bidi in python

I've been trying to create a PDF file from content that can be English, Persian, digits, or a combination of them. There are some problems with Persian text, such as: "این یک متن فارسی است" ۱- the text must be written from right to left 2- there is a…
r.aj
10
votes
8 answers

How to embed a PDF file in a web site?

I simply want to embed a PDF file in a web site. The best solution I've found is Google Docs Viewer (http://docs.google.com/viewer), but it does not work for IE and obviously that is not going to work for me. Anyone have a clean, easy solution to…
JWM
10
votes
2 answers

Trigger print preview of base64 encoded PDF from javascript

I've looked around stackoverflow trying to find a way to do this for a while now, and can't find a suitable answer. I need to be able to load a PDF in either a new window or an iframe via a base64 encoded string and trigger a print preview of it…
jtate
10
votes
3 answers

How to add metadata to PDF document using PDFbox?

I have an input stream of a PDF document available to me. I would like to add subject metadata to the document and then save it. I'm not sure how to do this. I came across a sample recipe here:…
Anthony
10
votes
3 answers

Resource interpreted as Document but transferred with MIME type application/pdf

See the following code: Controller: public ActionResult GetPDF() { byte[] pdf = GetPdfFromDatabase(); return new FileContentResult(reportData, "application/pdf"); } View: