Questions tagged [pdf-conversion]

Relating to converting between Portable Document Format and other file formats. Questions asking us to recommend or find a conversion tool or library are off-topic.

This tag is for questions about programmatically converting to and from PDF, the open-standard Portable Document Format. If a specific conversion is involved, also use the appropriate tag: openoffice-writer, msword, tiff, jpeg, etc.

Conversion solutions range from complete rasterization (with the result embedded as a graphic) to intensive OCR. The middle ground generally converts at a high enough level to recognize and use text attributes where possible, falling back to graphic rendering only when necessary.

Questions asking us to recommend or find a tool, library, documentation or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam.

266 questions

1 answer

EvoPdf loading images from Dropbox

I have an application in which I place some Dropbox images, such as (https://www.dropbox.com/sh/u3xjkrah9fzm7ju/AAB_TLn83FQH456O79od0_moa/3286Z.png?dl=1), and then I convert the page to a PDF using EVOPDF, but these images aren't rendered.

pdf-conversion evopdf

asked May 23 '17 at 22:34

Gustavo Lopes Colhado

1 answer

Aspose HTML to PDF conversion- hyperlinks to content on same file not working

I am using Aspose.PDF for .NET version 17.3 for bulk conversion of a lot of HTML files to PDF. I have an existing HTML file with hyperlinks to content in the same file. Below is a sample of the HTML in the file. Link: Section…

c# .net html-to-pdf pdf-conversion aspose.pdf

asked Apr 24 '17 at 11:32

Unnie

1 answer

Converting a PDF document that includes tables to a CSV file using Python or any other language

I tried to convert a PDF document (which includes tables) into a CSV file. Unfortunately, I failed. I have used the following approaches: Used pdfminer: first converted the PDF to text, but the structure of the text file was not the same as that of the PDF file. Used pypdf2: first…

python-2.7 xmp pdf-conversion pdfminer

asked Mar 31 '17 at 07:49

Umair.P

2 answers

Ghostscript's pdfwrite to grayscale results in wrong graylevel

I am trying to convert a PDF file (test.pdf, attached below) using Ghostscript (9.20 on Windows) so that it uses only the gray-level color space (not RGB or CMY): gswin64c.exe -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray…

pdf ghostscript grayscale pdf-conversion

asked Feb 23 '17 at 08:43

L Prosten

1 answer

How to recognize text in a PDF order?

I'm trying to recognize text in a PDF order with Ghostscript and Tesseract 3.0.2. I cannot use iTextSharp because the PDF doesn't contain text, just an image. First, I convert the PDF page to an image and then I try to get the text. In a first…

c# pdf tesseract pdf-conversion text-recognition

asked Feb 16 '17 at 12:20

Francesco

0 answers

Adding Left Border line in itextsharp

I was facing an issue with rowspan, but with some C# code I was able to get it working. I now have the data I need. How can I draw a left border line for the first column so that it looks correct? Here is what I have now. I need to…

pdf-generation itext pdf-conversion

asked Jul 13 '16 at 13:25

Manjuboyz

0 answers

Can't convert PDF to text even after trying pdfminer, pdf2txt, textract in Python

I'm having trouble extracting text from PDF files that were originally converted from InDesign and Illustrator. I'm working on a project that needs data from these PDF files. I have tried the pdfminer and pdf2txt libs in Python, but none of them works…

python text adobe-indesign pdf-conversion pdfminer

asked Jun 21 '16 at 18:09

Nhi Tran

1 answer

ConversionInputException on a complex web application

I get this ConversionInputException when I invoke either the execute() or schedule() method on a specific converter. I think the code is correct because if I execute it as a simple Java application it works perfectly with the same file as…

java pdf docx pdf-conversion documents4j

asked Apr 28 '16 at 15:19

D. Pesc.

1 answer

When converting PDF to Excel with Omnipage or Abbyy Finereader, is there a way to stop it from splitting individual cells?

I'm trying to extract some tables from PDF files, and both tools (Abbyy and Omnipage) do a pretty good job of identifying the tables. But when it comes to identifying the rows and columns, they both make the same mistakes. Usually, the problem comes…

excel pdf ocr pdf-conversion abbyy

asked Mar 22 '16 at 22:56

mgalka

1 answer

Ghostscript textwriter preserve blank lines

I'm trying to convert PDFs to text files. I use this command to perform the conversion: gs -dBATCH -dNOPAUSE -sDEVICE=txtwrite -sOutputFile=output.txt input.pdf The Ghostscript version is 9.07. I get all the text shown in the PDF. I'd like to preserve the…

pdf ghostscript pdf-conversion textwriter

asked Mar 20 '16 at 19:35

Will

0 answers

iText converting incomplete Html file content to pdf using java

I am trying to convert an HTML file into a PDF using the iText lib (4.2.0). The problem is that it's not printing all the HTML content to the PDF; it's only partially printing some data. Here is the code to convert HTML to PDF. InputStream il = new…

java itext html-to-pdf pdf-conversion

asked Feb 24 '16 at 20:51

pradex

0 answers

Convert Base64 from PDF to Bitmap

I would like to convert a PDF to a Bitmap, so I can show it on my ASP.NET page. But when I run my code it fails at creating the Bitmap. Does anyone know what the problem is? string filepath = "C:\\Temp\\Sample.pdf"; byte[] pdfByte =…

c# asp.net bitmap memorystream pdf-conversion

asked Feb 04 '16 at 13:42

sarah

1 answer

How to convert files to PDF simultaneously?

I have a Node.js web application and I want to be able to convert many documents to PDF at the same time. At the moment I use LibreOffice with a queue (the purpose of the queue is to avoid infinite conversion for a file - if LibreOffice cannot…

node.js libreoffice pdf-conversion

asked Oct 26 '15 at 13:50

roroinpho21

1 answer

How to convert the PDF content code to the type like "(<0034>) Tj"?

PDF content is saved in several ways: "(abc) Tj", "(<0035><0035>) Tj" or "\u065". I want to know if there is a way to convert the PDF code to one type, whether it is direct text "(abc) Tj", hexadecimal "(<0035><0035>) Tj", or octal "\u065". I think…

pdf pdf-generation ghostscript pdf-conversion pdf-parsing

asked Aug 22 '15 at 00:45

SuperBerry

2 answers

What's the best way to convert docx/pptx documents to PDF from a Windows Universal App?

Usually, I would use the Microsoft Office Interop library, but it requires the use of COM objects, which (as far as I know) isn't possible if I'm developing a Windows Universal app. What are some alternative methods I could use to convert Word and…

c# pdf office-interop win-universal-app pdf-conversion

asked Jul 28 '15 at 15:29

Gary Chien
