Questions tagged [pdfbox]

The Apache PDFBox library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command line utilities.

Features:

  • PDF to text extraction
  • Merge PDF Documents
  • PDF Document Encryption/Decryption
  • Lucene Search Engine Integration
  • Fill in form data FDF and XFDF
  • Create a PDF from a text file
  • Create images from PDF pages
  • Print a PDF
  • PDF/A validation

Official Website: http://pdfbox.apache.org/

Latest release: 2.0.21 released on 2020-08-20

3571 questions
1
vote
0 answers

How to read text from PDF for slanted text alignment using PDFBox in Java

I am using the logic below to extract text from a PDF using PDFBox. It gives good output for normal PDFs. PDFTextStripper stripper = new…
sagar
  • 115
  • 1
  • 1
  • 10
1
vote
1 answer

PDFBox printing with PrintPDF command line tool

I am using the DHL Shipping (XML) API to request DHL shipments and automatically print the returned shipping label. This is how the system works: The DHL response XML contains a base64-encoded PDF that contains: Page 1. The Shipping Label…
Peter Huson
  • 55
  • 1
  • 8
1
vote
0 answers

Printing to PostScript with PDFBox produces a massive file, why?

I am using PDFBox to create PDFs and that is working great. I also have a need to create PostScript files which I would like to generate from the PDF I create. I am using the following code to have PDFBox work with SimpleDoc to create the…
1
vote
2 answers

Change font colour and background colour of PDField using PDFBox - 2.0.2

I am using pdfbox-2.0.2 and I want to change the font colour of a PDField. I can find examples for pdfbox-1.8.0 but not for pdfbox-2.0.2. I am getting the PDFields using the code below - PDDocument doc =…
Hardik Doshi
  • 67
  • 1
  • 3
  • 16
1
vote
0 answers

Page thumbnails with pdfbox

I'm trying to produce page thumbnails for larger (> 50 pages) PDF documents using pdfbox. I tried using renderer.renderImage(pageNumber) and then scaling the image. I also tried using a scale parameter and renderImageWithDPI. Unfortunately, all…
Juergen
  • 61
  • 1
  • 4
1
vote
1 answer

Reading a PDF using PDFBox - clarification on page count

I read a PDF file from a URL using PDFBox; the Java code below works fine for reading the PDF and storing it in the project location. String pdfPageCount = 17; String pdfUrl = "abc.org/invoicepdf.pdf?Range=1"; URL pdfDownload = new URL(pdfUrl); connectionGet =…
Prabu
  • 3,550
  • 9
  • 44
  • 85
1
vote
0 answers

pdfbox 2.0.2 > How to combine the TextPosition coordinates and Graphics GeneralPath coordinates into the same quadrant

As a new pdfbox user, I plan to extract data from a table, but tables with special formats, say with merged column headers, should be processed with the help of the table's borderlines. Therefore, the coordinates of the text and at least the table's…
Rui
  • 3,454
  • 6
  • 37
  • 70
1
vote
0 answers

I can't fill a pdf template with data

I have some data which I want to save in a PDF file. I don't want to create the PDF from source code. A client gives me many different templates and I want to fill these with the data he needs. For example, I have information about "name",…
steeve
  • 27
  • 6
1
vote
1 answer

Parse PDF table and display it as CSV (Java)

I am trying to parse a table in a PDF file and display it as CSV. I have attached sample data from the PDF below (only a few columns) and sample output for the same. Each column width is fixed, let's say Company Name (18 char), Amount (8 char), Type (5 char)…
user6404269
  • 33
  • 1
  • 6
1
vote
0 answers

PDFBox enlarge signature size

I'm using pdfbox 1.8.9 for creating a PDF. To add a signature image I use PDPixelMap, but I need to make the signature larger or bolder. So far I have tried to manipulate the signature image (size), but it didn't work. Here is my code example of how I add the signature (png…
1
vote
0 answers

How to display Arabic characters in a PDF file generated using PDFBox

I'm trying to display an Arabic string in a PDF file generated using PDFBox. I can display an Arabic string RTL using ICU4J and a specific font, but the problem is this: even if the string appears correctly RTL, the characters still…
magran x
  • 91
  • 9
1
vote
0 answers

Use of Extended Features is no longer available after saving text in pdf acroform pdfbox

After adding text to a PDF AcroForm programmatically using pdfbox, I am getting "Use of Extended Features is no longer available" when opening the file in Adobe Acrobat Reader.
klash75
  • 11
  • 1
1
vote
0 answers

WARNING: Using fallback font 'Arial-BoldMT' for 'AccordAltBold' despite embedded and system fonts

I've got a single-page PDF form that uses custom fonts (made with LibreOffice). The final output file will use this template multiple times. Each form control within the page has my custom font set as its font type. The pdf file also has the fonts…
Yash Capoor
  • 346
  • 4
  • 14
1
vote
1 answer

How to merge 10000 PDFs into one using pdfbox in the most effective way

The PDFBox API works fine for a small number of files. But I need to merge 10000 PDF files into one, and when I pass 10000 files (about 5 GB) it takes 5 GB of RAM and finally runs out of memory. Is there some implementation for such a requirement in…
Gajendra Kumar
  • 908
  • 2
  • 12
  • 28
1
vote
1 answer

PDFBox "Symbolic fonts must have a built-in encoding" error when using PDFTextStripper.getText()

I'm using Apache PDFBox 2.0.2. I am loading PDF documents from the web to get the text inside. URL u = new URL("url/to/file.pdf"); PDDocument pddDocument = PDDocument.load(u.openStream()); PDFTextStripper textStripper = new PDFTextStripper(); String doc =…
Hub
  • 25
  • 5