Questions tagged [pdf-to-html]

79 questions
2
votes
2 answers

Image is always at top in converted html from pdf

I am using the following code, and all contents of the specific PDF page are converted correctly. But if there is an image in the middle of the PDF page, that image shows at the top in the HTML. PHP code: umask(0); $output =…
naf4me
2
votes
0 answers

PHP - Linking to html files is wrong in Pdf-to-html

I have installed Poppler Utils for Windows in addition to https://github.com/mgufrone/pdf-to-html. It works perfectly and converts PDF files to HTML by making a single HTML file that contains 2 iframes, one for page navigation and the other for the…
Abdelrahman Wahdan
2
votes
1 answer

Extracting Tables from PDF document

I want to extract tables from a PDF document programmatically using C# for a college project. I'm quite familiar with iTextSharp. Is there a way I can extract tables in iTextSharp? Is there any other free library I can use for this purpose? Can…
1
vote
0 answers

Upgrade from iText 5 to iText 7 pdftohtml, keep all fonts same

I need to keep all fonts the same when upgrading from pdfHTML 1.01 (default Helvetica in paragraphs, etc.) to pdfHTML 2. What is the easiest way to accomplish this using fontProvider? I need all formatting to be the same in the newest pdftohtml (iText…
1
vote
0 answers

Poppler Utils pdftohtml Turkish Character problem

I am using the pdftohtml package to convert PDF to HTML. However, some characters do not appear in the HTML output. I am passing the following parameters to pdftohtml. Normally, the font name in the PDF is Times New Roman. How can I solve…
xkraltr
1
vote
1 answer

Write output of pdftohtml to stdout

I'd like to run pdftohtml on a PDF file and write its output to /dev/stdout, or something that lets me capture the output directly from a subprocess. My code: cmd = ['pdftohtml', '-c', '-s', '-i', '-fontfullname', filename, '-stdout',…
1
vote
0 answers

Decreasing size of PDF when using puppeteer for pdf generation

We are using IDR to convert PDF documents to HTML. After making some modifications, we use Puppeteer to convert that document back to PDF. I am getting files with increased page size (even if I don't make any modification to my HTML). For…
Saurabh Agrawal
1
vote
1 answer

System.InvalidOperationException: ''DocumentRenderer' must be set before calling 'PrepareDocumentRenderer'.'

I'm trying to convert HTML code to PDF with PDFsharp & MigraDoc. I use the RenderDocument() function for Turkish characters. But after the RenderDocument() function I get this error. System.InvalidOperationException: ''DocumentRenderer' must be set…
Mutlu Ozkurt
1
vote
1 answer

How to add background color to specific dataframe column while converting to html

I need to colour columns with different colours. I am using the code below, but it is not working: the data arrives in the mail as a straight line. The to_html method works properly to generate the table, but I need a different colour for each column in the DataFrame. pdf.style.apply(highlight_cols,…
user13016448
1
vote
1 answer

How to give width, height, x and y coordinates in generating pdf from html using JSPDF new html API

I have been using jsPDF to generate a PDF document based on some HTML. Earlier, using the jsPDF fromHTML API, we could give margins like this: var margins2 = { top: 415, bottom: 10, left: 55, width: 300 }; …
1
vote
1 answer

Is there any way to get data from editable pdf using javascript and angular or any other javascript

I am trying to display a PDF in the UI that has some fields to be filled in by the user, and I am trying to access the filled-in data, but I cannot. Can anyone suggest a way to access the form data in the PDF, or any other easy method to implement this…
Ch Srinu
1
vote
0 answers

Looking for workaround to successfully convert PDType0Font and PDType1Fonts with pdf2dom

We are using the pdf2dom library to convert a large set of newspaper PDFs to HTML. The number of PDFs in question exceeds 5k PDF pages per day. Although we succeed in the majority of cases and scenarios, we fail to convert the PDFs fully in most cases and get…
Gautam
1
vote
1 answer

Provider com.levigo.jbig2.util.log.JDKLoggerBridge not a subtype

While writing PDF file to HTML file format using the code below... import java.io.BufferedWriter; import java.io.File; import java.io.FileWriter; import java.io.IOException; import java.io.PrintWriter; import java.io.Writer; import…
User
1
vote
2 answers

Saving html page in local storage using php

I am using PDFTOHTML (a PHP library) to convert PDF files to HTML, and it's working fine, but it shows the converted file in a browser instead of storing it in a local folder. I want to store the converted HTML in a local folder using PHP, with the same name as the PDF…
Zohaib
1
vote
1 answer

How to remove UnicodeEncodeError while using HTMLConverter

I'm trying to convert a PDF file into HTML format using HTMLConverter. Provided below is the code that I'm using. from django.conf import settings settings.configure(PDF_MINER_IS_STRICT = True) from pdfminer.pdfinterp import PDFResourceManager,…
Greg