Questions tagged [pdf-conversion]

Relating to converting between Portable Document Format and other file formats. Questions asking us to recommend or find a conversion tool or library are off-topic.

This tag is for questions relating to programmatically converting to and from the open standard file format PDF. If a specific conversion is involved, the appropriate tag for the other format should also be used.

Conversion solutions may range from complete rasterization (and graphic embedding) to intense . The middle ground generally converts at a high enough level to recognize and use text attributes where possible, falling back to graphic rendering only when necessary.
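The middle-ground strategy described above can be pictured as a per-element decision. The following Python sketch is illustrative only: `PageElement`, `convert_element`, and the output tuples are hypothetical names, not part of any PDF library, and a real converter would obtain these elements from a PDF parser.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PageElement:
    """Hypothetical stand-in for one piece of page content from a PDF parser."""
    kind: str                      # e.g. "text", "image", "vector"
    text: Optional[str] = None     # decoded text, if the parser recovered it
    font: Optional[str] = None     # font name, if known
    size: Optional[float] = None   # font size in points, if known


def convert_element(element: PageElement) -> Tuple[str, object]:
    """Keep text as text when its attributes are usable; otherwise rasterize."""
    has_text = element.kind == "text" and bool(element.text)
    has_attributes = element.font is not None and element.size is not None
    if has_text and has_attributes:
        return ("text", {"content": element.text,
                         "font": element.font,
                         "size": element.size})
    # Fallback: mark the element for graphic rendering (rasterization).
    return ("raster", element.kind)


page = [
    PageElement("text", "Invoice 42", "Helvetica", 12.0),
    PageElement("text"),    # text the parser could not decode
    PageElement("image"),
]
results = [convert_element(e) for e in page]
print([kind for kind, _ in results])
```

Only the first element keeps its text attributes here; the other two fall back to graphic rendering.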

Questions asking us to recommend or find a tool, library, documentation or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam.

266 questions
0 votes, 1 answer

EvoPdf loading images from Dropbox

I have an application in which I place some Dropbox images, such as (https://www.dropbox.com/sh/u3xjkrah9fzm7ju/AAB_TLn83FQH456O79od0_moa/3286Z.png?dl=1), and then I convert the page to a PDF using EVOPDF, but these images aren't rendered.
0 votes, 1 answer

Aspose HTML to PDF conversion: hyperlinks to content in the same file not working

I am using Aspose.PDF for .NET version 17.3 for bulk conversion of a lot of HTML files to PDF. I have an existing HTML file with hyperlinks to content in the same file. Below is a sample of the HTML in the file. Link: Section…
Unnie
0 votes, 1 answer

Converting a PDF document that includes tables to a CSV file using Python or any other language

I tried to convert a PDF document (which includes tables) into a CSV file. Unfortunately, I failed. I have used the following approaches: Used pdfminer: first converted the PDF to text, but the structure of the text file was not the same as that of the PDF file. Used pypdf2 first…
Umair.P
0 votes, 2 answers

Ghostscript's pdfwrite to grayscale results in wrong graylevel

I try to convert a PDF file (test.pdf, attached below) using Ghostscript (9.20 on Windows) to only use the Graylevel colorspace (not RGB or CMY): gswin64c.exe -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray…
L Prosten
0 votes, 1 answer

How to recognize text in a PDF order?

I'm trying to recognize text in a PDF order with Ghostscript and Tesseract 3.0.2. I cannot use iTextSharp because the PDF doesn't contain text, just an image. First, I convert the PDF page into an image and then I try to get the text. In a first…
Francesco
0 votes, 0 answers

Adding a left border line in iTextSharp

I was having trouble working with rowspan, but with some C# code I was able to get it working. I now have the data I need. How can I draw a left border line for the first column so that it looks correct? Here is what I have now. I need to…
Manjuboyz
0 votes, 0 answers

Can't convert PDF to text despite trying pdfminer, pdf2txt, and textract in Python

I'm having trouble extracting text from PDF files that were originally converted from InDesign and Illustrator. I'm working on a project that needs data from these PDF files. I have tried the pdfminer and pdf2txt libs in Python, but none of them works…
Nhi Tran
0 votes, 1 answer

ConversionInputException on a complex web application

I get this ConversionInputException when I invoke either the execute() or the schedule() method on a specific converter. I think the code is correct, because if I run it as a simple Java application it works perfectly with the same file as…
D. Pesc.
0 votes, 1 answer

When converting PDF to Excel with Omnipage or Abbyy Finereader, is there a way to stop it from splitting individual cells?

I'm trying to extract some tables from PDF files, and both tools (Abbyy and Omnipage) do a pretty good job of identifying the tables. But when it comes to identifying the rows and columns, they both make the same mistakes. Usually, the problem comes…
mgalka
0 votes, 1 answer

Ghostscript textwriter preserve blank lines

I'm trying to convert pdfs to text files. I use this command to perform the conversion: gs -dBATCH -dNOPAUSE -sDEVICE=txtwrite -sOutputFile=output.txt input.pdf Ghostscript version is 9.07. I get all the text shown in PDF. I'd like to preserve the…
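The command quoted in this question can also be driven from a script. A minimal Python sketch, assuming a `gs` executable is available on `PATH`; `build_txtwrite_command` and `pdf_to_text` are hypothetical helper names, and only the flags quoted in the question are used. It does not address the blank-line behaviour the asker describes.

```python
import subprocess
from typing import List


def build_txtwrite_command(input_pdf: str, output_txt: str,
                           gs: str = "gs") -> List[str]:
    """Build the Ghostscript argument list quoted in the question."""
    return [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=txtwrite",
        f"-sOutputFile={output_txt}",
        input_pdf,
    ]


def pdf_to_text(input_pdf: str, output_txt: str) -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_txtwrite_command(input_pdf, output_txt), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.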
Will
0 votes, 0 answers

iText converting only part of an HTML file's content to PDF using Java

I am trying to convert an HTML file into a PDF using the iText lib (4.2.0). The problem is that it does not print all of the HTML content to the PDF; it prints only part of the data. Here is the code to convert HTML to PDF. InputStream il = new…
pradex
0 votes, 0 answers

Convert Base64 from PDF to Bitmap

I would like to convert a PDF to a Bitmap, so I can show it on my ASP.NET page. But when I run my code it fails at creating the Bitmap. Does anyone know what the problem is? string filepath = "C:\\Temp\\Sample.pdf"; byte[] pdfByte =…
sarah
0 votes, 1 answer

How to convert files to PDF simultaneously?

I have a Node.js web application and I want to be able to convert many documents to PDF at the same time. At the moment I use LibreOffice with a queue (the purpose of the queue is to avoid infinite conversion for a file - if LibreOffice cannot…
roroinpho21
0 votes, 1 answer

How to convert the PDF content code to the type like "(<0034>) Tj"?

PDF content is saved in several ways: "(abc) Tj", "(<0035><0035>) Tj" or "\u065". I want to know if there is a way to convert the PDF code to a single type, whether it is direct text "(abc) Tj", hexadecimal "(<0035><0035>) Tj", or octal "\u065". I think…
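One way to approach this kind of normalization is to decode each string operand to bytes and re-emit it in hex form. The Python sketch below assumes standard PDF string syntax, where literal strings are written in parentheses (with backslash escapes such as `\n` or up to three octal digits) and hex strings in angle brackets, which differs slightly from the notation in the question. `decode_literal` and `to_hex_string` are hypothetical names, the input is assumed to contain only Latin-1 characters, and the sketch normalizes syntax only: it does not map character codes to Unicode, which depends on the font encoding.

```python
import re

# Single-character escapes allowed inside a PDF literal string.
_SIMPLE_ESCAPES = {
    "n": 0x0A, "r": 0x0D, "t": 0x09, "b": 0x08, "f": 0x0C,
    "(": 0x28, ")": 0x29, "\\": 0x5C,
}


def decode_literal(body: str) -> bytes:
    """Decode the text between the outer ( and ) of a literal string."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))          # assumes Latin-1 input
            i += 1
            continue
        i += 1                           # skip the backslash
        if i >= len(body):
            break
        esc = body[i]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 1
        elif esc in "01234567":
            digits = re.match(r"[0-7]{1,3}", body[i:]).group()
            out.append(int(digits, 8) & 0xFF)
            i += len(digits)
        elif esc in "\r\n":
            i += 1                       # backslash + newline: line continuation
            if esc == "\r" and i < len(body) and body[i] == "\n":
                i += 1
        else:
            out.append(ord(esc))         # unknown escape: drop the backslash
            i += 1
    return bytes(out)


def to_hex_string(token: str) -> str:
    """Return a literal or hex PDF string operand in <HEX> form."""
    token = token.strip()
    if token.startswith("(") and token.endswith(")"):
        data = decode_literal(token[1:-1])
    elif token.startswith("<") and token.endswith(">"):
        digits = re.sub(r"\s+", "", token[1:-1])
        if len(digits) % 2:
            digits += "0"                # odd digit count: pad with a trailing 0
        data = bytes.fromhex(digits)
    else:
        raise ValueError(f"not a PDF string operand: {token!r}")
    return "<" + data.hex().upper() + ">"


print(to_hex_string("(abc)"))    # <616263>
print(to_hex_string(r"(\065)"))  # <35>
```

Applying this to every string operand of the `Tj` and `TJ` operators would still require a tokenizer for the content stream, which is outside this sketch.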
SuperBerry
0 votes, 2 answers

What's the best way to convert docx/pptx documents to PDF from a Windows Universal App?

Usually, I would use the Microsoft Office Interop library, but it requires the use of COM objects, which (as far as I know) isn't possible if I'm developing a Windows Universal app. What are some alternative methods I could use to convert Word and…