Questions tagged [pdf]

Portable Document Format (PDF) is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). Questions can be about creating, reading, or editing PDFs in any programming language.

The official ISO Specification (ISO 32000-1, a.k.a. 'PDF-1.7') is important as a reference, but it is not exactly written for PDF beginners.

Beginners may start with these two easy-to-read resources:

Questions

Related questions on Stack Overflow generally fall into the following domains:

  • How to convert, produce, or encode a PDF with a specific library or tool?
  • Everything else.

The first domain has been covered in depth, and any question you have is likely already answered.

Information Extraction

Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR). Letters can be encoded as font glyphs, line art, vector graphics, or raster images.

PDF files generally contain drawing instructions. There's no such thing as "a table" in most PDF files. There are lines, glyphs, and raster images (and clipping, and color spaces, and so forth). It is all but impossible to determine what is or isn't a table in an arbitrary PDF file.
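
As a rough illustration of what "drawing instructions" means, the sketch below tokenizes a small hand-written content stream and lists its operators. The sample stream, the `TOKEN` pattern, and the `operators` helper are assumptions made for this example. Real content streams are usually compressed and use more token types (hex strings, arrays, nested parentheses, comments) than this simplified tokenizer handles.

```python
import re

# Hand-written, uncompressed content stream (hypothetical sample,
# not taken from a real file): one text run, then one stroked rectangle.
SAMPLE_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Name) Tj
ET
72 700 200 20 re
S
"""

# Simplified tokens: a literal string in parentheses (no nesting),
# or any run of characters without whitespace or parentheses.
TOKEN = re.compile(rb"\((?:\\.|[^\\()])*\)|[^\s()]+")
NUMBER = re.compile(rb"[+-]?\d*\.?\d+")


def operators(stream: bytes) -> list[str]:
    """Return the operators of a content stream, skipping their operands."""
    ops = []
    for tok in TOKEN.findall(stream):
        if tok.startswith(b"(") or tok.startswith(b"/"):
            continue  # literal string or name operand
        if NUMBER.fullmatch(tok):
            continue  # numeric operand
        ops.append(tok.decode("ascii"))
    return ops


print(operators(SAMPLE_STREAM))
```

For this sample the result is a text block (`BT`, `Tf`, `Td`, `Tj`, `ET`) followed by a rectangle (`re`) and a stroke (`S`). Nothing in the stream says that the rectangle and the text belong to the same table cell; that relationship exists only visually.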

Note that a glyph is not a character. A glyph has an appearance, whereas a character has meaning. Each font in a PDF may or may not map its glyphs to characters.
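
The distinction can be sketched with a toy mapping. The glyph codes and the `to_unicode` table below are made up for illustration, and `glyphs_to_text` is a hypothetical helper, not part of any library. In a real PDF the mapping, when present, comes from the font's encoding or its ToUnicode CMap.

```python
from typing import Mapping, Optional


def glyphs_to_text(codes: bytes, to_unicode: Optional[Mapping[int, str]]) -> Optional[str]:
    """Map single-byte glyph codes to characters, if a mapping exists."""
    if to_unicode is None:
        # The font provides no glyph-to-character mapping:
        # the glyphs can be drawn but not read back as text.
        return None
    # Codes missing from the mapping become U+FFFD (replacement character).
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Hypothetical font: code 4 is an "fi" ligature, one glyph for two characters.
to_unicode = {1: "P", 2: "D", 3: "F", 4: "fi"}

print(glyphs_to_text(b"\x01\x02\x03", to_unicode))  # mapped glyphs yield text
print(glyphs_to_text(b"\x01\x02\x03", None))        # unmapped glyphs yield nothing
```

When the mapping is absent, the same bytes still render correctly on screen, which is why such files typically need OCR to recover their text.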

If at all possible, use the source data to extract information, rather than relying on the PDF. This file format is designed for visual consistency, and very little useful normalized data can be extracted from its contents.

Content

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

  • text stored as content streams (i.e. as drawing instructions rather than plain text)
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of image

50972 questions
10
votes
1 answer

Displaying PDF files in Angular with ng2-pdf-viewer

I'm trying to use the ng2-pdf-viewer component to display several PDFs from my ASP.NET Web API backend. I've added PdfViewerComponent to my module's declarations and the provided example works fine: import { Component } from…
Mathis Garberg
  • 425
  • 2
  • 7
  • 15
10
votes
2 answers

barryvdh/laravel-dompdf page break content changes PDF

I am using the barryvdh/laravel-dompdf package. Is there any way to detect whether content fits on the current page? If it does not fit completely, I want to move that content to the next page.
shkurta
  • 131
  • 1
  • 1
  • 7
10
votes
1 answer

Reading PDF from within an Android application

I am looking for a library/API that I wish to integrate to an Android application so that I can read any PDF files from within the application instead of using any third party applications! By any PDF, I mean, say an Article in a magazine. I have…
Jaykay
  • 666
  • 1
  • 4
  • 17
10
votes
2 answers

How to export google chart in pdf?

I have drawn a Google chart. Now I want to add a button to save the chart in PDF format. I looked at Save google charts as pdf and other questions available on Stack Overflow, but it doesn't work. Print png image by google chart already used but it just…
joun
  • 656
  • 1
  • 8
  • 25
10
votes
2 answers

Display pdf generated using mpdf inline in mobile browsers

Is there any way to display pdf generated using mpdf inline in mobile browsers? I went through mpdf documentation and tried destination option mpdf->output('filename.pdf','I'). It works pretty well across every browser in desktop except IE and…
Vishwas R
  • 3,340
  • 1
  • 16
  • 37
10
votes
1 answer

How to extract text from a Specific Area in a PDF using Python?

I'm trying to extract Text from a PDF using Python, and I have successfully done so using PyPDF2 like this: from PyPDF2 import PdfFileReader reader = PdfFileReader('path.pdf') page = reader.getPage(0) page.extractText() This extracts all the Text…
Devdatta Tengshe
  • 4,015
  • 10
  • 46
  • 59
10
votes
2 answers

Save html file as PDF

I'm using a PHP Output Buffer to create an HTML file of a dynamic 'Data Review' page, I then save this output as an HTML file to the server and would like to create a PDF file of this HTML file (stored on the server) but every solution I've looked…
Cal Brown
  • 145
  • 1
  • 1
  • 8
10
votes
7 answers

How can I tell the resolution of scanned PDF from within a shell script?

I have a large collection of documents scanned into PDF format, and I wish to write a shell script that will convert each document to DjVu format. Some documents were scanned at 200dpi, some at 300dpi, and some at 600dpi. Since DjVu is a…
Norman Ramsey
  • 198,648
  • 61
  • 360
  • 533
10
votes
3 answers

PDFBox: Problem with converting pdf page into image

My mission is pretty simple: converting every single page of a PDF file into images. I tried using the open source version of icepdf to generate the images, but it doesn't generate them with the correct font. So I started using PDFBox instead. The code…
user552910
  • 101
  • 1
  • 1
  • 5
10
votes
2 answers

How to convert a html document into a pdf using report lab with python

I am trying to convert an HTML document that I have created into a PDF using ReportLab. The HTML document is below. I am unsure how to do this, and I have looked online and can't seem to find a solution. html document
johnsmith
  • 123
  • 1
  • 1
  • 8
10
votes
1 answer

Attaching pdf file to an EmailMessage?

I am trying to set up an automated email sending script. I am using the email module and the EmailMessage object from the email.message module and am sending the email using the smtplib module. I would like to be able to attach a .pdf file to an…
Vladimir Yevseenko
  • 101
  • 1
  • 1
  • 4
10
votes
1 answer

Using pdf.js on a node server

I want to convert a pdf to an image server-side, using node.js. My input for this task is pdf's url, and the desired output is a base64 string, representing an image. I've decided to try pdf.js (https://github.com/mozilla/pdf.js) and node-canvas…
tristantzara
  • 5,597
  • 6
  • 26
  • 40
10
votes
2 answers

Installing R on Linux: configure: WARNING: you cannot build PDF versions of the R manuals

While configuring R on Debian GNU/Linux 8.8 (jessie) I am getting the warning above. Any ideas which package should be installed to solve the issue and have the manuals build as PDF?
a1an
  • 3,526
  • 5
  • 37
  • 57
10
votes
4 answers

Create PDF / Word (Doc) file within app

Is there a definitive method of creating either a PDF or an MS Word Doc file within the app and emailing it immediately (and possibly also storing it)? I have been trying for quite some time and have found the Java libraries apwlibrary and iText.…
Siddharth Lele
  • 27,623
  • 15
  • 98
  • 151
10
votes
3 answers

Encoding PDF binary data to base64 not working with NodeJS

I'm trying to get a PDF stream returned from an API and convert it to base64 to embed it on the client side. The body of the API response looks something like this: %PDF-1.5 %���� 4 0 obj << /Type/XObjcect /Subtype/Image /Width…
Loading...
  • 149
  • 1
  • 2
  • 9