Questions tagged [pdf-to-html]
79 questions
2
votes
2 answers
Image is always at top in converted html from pdf
I am using the following code, and all contents of the specific PDF page convert correctly. But if there is an image in the middle of the PDF page, that image appears at the top of the HTML.
PHP CODE:
umask(0);
$output =…

naf4me
- 403
- 7
- 17
2
votes
0 answers
PHP - Linking to html files is wrong in Pdf-to-html
I have installed Poppler Utils for Windows in addition to https://github.com/mgufrone/pdf-to-html
It works perfectly and converts PDF files to HTML by making a single HTML file that contains 2 iframes, one for page navigation and the other for the…

Abdelrahman Wahdan
- 2,056
- 4
- 36
- 43
2
votes
1 answer
Extracting Tables from PDF document
I want to extract tables from a PDF document programmatically using C# for a college project. I'm quite familiar with iTextSharp.
Is there a way I can extract tables in iTextSharp?
Is there any other free library I can use for this purpose?
Can…

Buddhima Naween Rathnayake
- 79
- 1
- 13
1
vote
0 answers
Upgrade from iText 5 to iText 7 pdftohtml, keep all fonts same
I need to keep all fonts the same when upgrading from pdfHTML 1.01 (default Helvetica in paragraph etc) to pdfHTML 2. How can this be accomplished the easiest way, using fontProvider?
I need all formatting to be the same in newest pdftohtml (iText…

James Gillies
- 23
- 4
1
vote
0 answers
Poppler Utils pdftohtml Turkish Character problem
I am using pdftohtml package to convert PDF to HTML. However, some characters do not appear when I output HTML. I am giving the following parameters to the pdftohtml package. Normally, the font name in the PDF is Times New Roman. How can I solve…

xkraltr
- 11
- 2
1
vote
1 answer
Write output of pdftohtml to stdout
I'd like to run pdftohtml on a PDF file and write its output to /dev/stdout, or to something else that lets me capture the output directly from subprocess.
My code:
cmd = ['pdftohtml', '-c', '-s', '-i', '-fontfullname', filename, '-stdout',…
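One way to capture the converter's output directly is to let `subprocess.run` collect stdout instead of pointing pdftohtml at a file. This is a minimal sketch, assuming Poppler's `pdftohtml` is on the PATH and that the flags quoted in the question behave the same way in your Poppler version. `build_cmd` and `pdf_to_html` are hypothetical helper names, not part of any library:

```python
import subprocess


def build_cmd(filename):
    """Build the pdftohtml command line; options are placed before the input file."""
    return ['pdftohtml', '-c', '-s', '-i', '-fontfullname', '-stdout', filename]


def pdf_to_html(filename):
    """Run pdftohtml and return whatever it writes to stdout, as bytes."""
    result = subprocess.run(
        build_cmd(filename),
        stdout=subprocess.PIPE,   # capture the HTML instead of printing it
        stderr=subprocess.PIPE,   # keep warnings separate from the HTML
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `pdf_to_html('input.pdf')` (a placeholder file name) returns raw bytes. Whether every part of the output reaches stdout when `-c` and `-s` are combined with `-stdout` may depend on the Poppler build, so it is worth checking the result on a small file first.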

Kfcaio
- 442
- 1
- 8
- 20
1
vote
0 answers
Decreasing size of PDF when using puppeteer for pdf generation
We are using IDR to convert PDF documents to HTML.
After making some modifications, we use Puppeteer to convert that document back to PDF, but I am getting files with an increased page size (even if I don't make any modification to my HTML).
For…

Saurabh Agrawal
- 150
- 1
- 13
1
vote
1 answer
System.InvalidOperationException: ''DocumentRenderer' must be set before calling 'PrepareDocumentRenderer'.'
I'm trying to convert HTML code to PDF with PDFsharp and MigraDoc. I use the RenderDocument() function for Turkish characters. But after calling RenderDocument() I get this error.
System.InvalidOperationException: ''DocumentRenderer' must be set…

Mutlu Ozkurt
- 55
- 1
- 8
1
vote
1 answer
How to add background color to specific dataframe column while converting to html
I need to colour each column with a different colour. I am using the code below, but it is not working: the data arrives in the mail as a straight line. The to_html method works properly to generate the table, but I need a different colour for every column in the DataFrame.
pdf.style.apply(highlight_cols,…
user13016448
1
vote
1 answer
How to give width, height, x and y coordinates in generating pdf from html using JSPDF new html API
I have been using jsPDF to generate a PDF document from some HTML. Earlier, using the jsPDF fromHTML API, we could give margins like this:
var margins2 = {
    top: 415,
    bottom: 10,
    left: 55,
    width: 300
};
…

Atul kumar singh
- 454
- 10
- 24
1
vote
1 answer
Is there any way to get data from an editable PDF using JavaScript and Angular or any other JavaScript
I am trying to display a PDF in the UI that has some fields to be filled in by the user, and I am trying to access the filled-in data, but I cannot. Can anyone suggest a way to access the form data in the PDF, or any other easy method to implement this…

Ch Srinu
- 13
- 4
1
vote
0 answers
Looking for workaround to successfully convert PDType0Font and PDType1Fonts with pdf2dom
We are using the pdf2dom library to convert a large set of newspaper PDFs to HTML. The volume in question exceeds 5k PDF pages per day.
Although we succeed in the majority of cases and scenarios, we fail to convert the PDFs fully in most cases and get…

Gautam
- 1,030
- 13
- 37
1
vote
1 answer
Provider com.levigo.jbig2.util.log.JDKLoggerBridge not a subtype
While writing PDF file to HTML file format using the code below...
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import…

User
- 4,023
- 4
- 37
- 63
1
vote
2 answers
Saving html page in local storage using php
I am using PDFTOHTML (a PHP library) to convert PDF files to HTML, and it's working fine, but it shows the converted file in a browser instead of storing it in a local folder. I want to store the converted HTML in a local folder using PHP with the same name as the PDF…

Zohaib
- 159
- 1
- 13
1
vote
1 answer
How to remove UnicodeEncodeError while using HTMLConverter
I'm trying to convert a PDF file into HTML format using HTML Converter. Provided below is the code that I'm using.
from django.conf import settings
settings.configure(PDF_MINER_IS_STRICT = True)
from pdfminer.pdfinterp import PDFResourceManager,…
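The excerpt is cut off before the traceback, so the actual cause is not known. A common source of `UnicodeEncodeError` in scripts like this is writing extracted text through a stream whose encoding cannot represent some of the characters. The sketch below uses only the standard library and a made-up sample string (it does not call pdfminer) to show the failure and one way around it: opening the output with an explicit UTF-8 encoding. `write_html` is a hypothetical helper name:

```python
import io
import os
import tempfile

# Made-up text containing non-ASCII characters, standing in for converter output.
sample = '<p>caf\u00e9 \u2013 na\u00efve</p>'


def write_html(path, html):
    """Write HTML text to disk with an explicit UTF-8 encoding."""
    with io.open(path, 'w', encoding='utf-8') as fh:
        fh.write(html)


# Encoding to a narrow codec reproduces the error class named in the question title.
try:
    sample.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# Writing through a UTF-8 file handle round-trips the same text without error.
out_path = os.path.join(tempfile.mkdtemp(), 'page.html')
write_html(out_path, sample)
with io.open(out_path, encoding='utf-8') as fh:
    round_trip = fh.read()
```

If the error in the real script comes from somewhere else (for example, from printing to a console with a limited code page), the same idea applies: make the encoding of the destination explicit rather than relying on the platform default.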

Greg
- 23
- 7