Questions tagged [pdf2htmlex]

pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies. It aims to provide an accurate rendering while keeping the output optimized for Web display.

pdf2htmlEX is best for text-based PDF files, for example scientific papers with complicated formulas and figures. Text, fonts and formats are natively preserved in HTML such that you can still search and copy. Math formulas, figures and images are also supported. The generated HTML file is static, with optional features powered by JavaScript.

pdf2htmlEX is also a publishing tool: almost 50 options make it flexible enough for many different use cases, such as PDF preview, book/magazine publishing, and personal resumes.
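
As a rough illustration of how a few of these options combine on the command line, here is a minimal Python sketch. The helper name `build_command` and the file names are hypothetical; the flags `--zoom`, `--dest-dir` and `--embed-image` are the ones quoted in the questions listed under this tag. The sketch only assembles the argument list and does not run the tool.

```python
import shlex
from typing import List, Optional


def build_command(
    pdf_path: str,
    dest_dir: Optional[str] = None,
    zoom: Optional[float] = None,
    embed_image: Optional[bool] = None,
) -> List[str]:
    """Assemble an argument list for pdf2htmlEX (hypothetical helper)."""
    cmd = ["pdf2htmlEX"]
    if zoom is not None:
        cmd += ["--zoom", str(zoom)]
    if dest_dir is not None:
        cmd += ["--dest-dir", dest_dir]
    if embed_image is not None:
        # On/off options take 0 or 1, as in the commands quoted in the questions.
        cmd += ["--embed-image", "1" if embed_image else "0"]
    cmd.append(pdf_path)  # the input PDF follows the options
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; running it needs pdf2htmlEX installed.
    print(shlex.join(build_command("paper.pdf", dest_dir="./out", zoom=1.4)))
```

The resulting list could be passed to `subprocess.run` on a machine where pdf2htmlEX is installed; passing a list rather than a single string avoids shell quoting problems with file names.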

31 questions

0 answers

How to get sticky notes attached to pdf documents while using pdf2htmlEx tool?

I used the option --process-annotation 1 to view annotations in PDF documents. This works fine for highlight, underline, strikethrough, and rectangular box annotations, but not for notes added as sticky notes: the converted HTML contains only the note icon - missing…

pdf2htmlex

asked Mar 03 '16 at 10:42

Tom Taylor

0 answers

Extract all content from PDF file (not just text, but also tables/diagrams)?

I'd like to reformat the main content of a PDF, so I need to extract its main content, not just text, but also tables, diagrams, etc. with their layout information. I'm only interested in the main part of the content, for example, for a technical paper, I'm…

javascript pdf clojure pdf.js pdf2htmlex

asked Aug 05 '15 at 15:26

Yu Shen

1 answer

split pdf to multiple html file with pdf2htmlEX

I'm trying to split a PDF file into separate HTML files. I mean for each PDF page I want an HTML file. This is how I do it: pdf2htmlEX --split-pages 1 LMS.pdf --page-filename lms%03.html In the result I got an empty LMS.html and other files:…

html pdf pdf2htmlex

asked Oct 14 '14 at 12:25

HamidIng

0 answers

How to identify the modified content in a pdf file?

I have a PDF file for which I can see the creation time and the modification time. Is there a way to know which parts (e.g. tables/figures/text) are modified in the metadata? In other words, how could I identify the difference between the initial pdf…

python java pdf metadata pdf2htmlex

asked Mar 03 '23 at 16:30

Syhaa

0 answers

pdf2HtmlEX process PDF coredump

I use the following command to transform a PDF to HTML, and then I get a coredump file: ./pdf2htmlEX --zoom 1 --dest-dir ./pdf_test --optimize-text 1 --zoom 1.4 --process-outline 0 --embed-image 0 --font-format ttf pdf_test/020616320411_2.pdf [coredump message is…

coredump pdf2htmlex

asked Feb 09 '23 at 03:00

oddstar2018

0 answers

Using co-ordinates in XML generated by poppler to build an email template

I generated a 72 dpi image and XML with zoom set to 1 from this PDF. Although the DPI was 72, to convert the co-ordinates in the XML to pixels I had to iteratively tweak the DPI using this sheet. 90.5 seems to work well.…

pdf html-email poppler pdf-to-html pdf2htmlex

asked Sep 28 '21 at 10:05

qwertynik

1 answer

Convert PDF to HTML without losing any format

I'm developing a Python Flask webapp and I'm trying to convert some user uploaded pdfs to nicely formatted HTML, like the HTML that is being produced when you display a pdf inside an iframe. I tried several things so far: the pdfminer.six library,…

python html pdf heroku pdf2htmlex

asked Mar 24 '20 at 14:41

robo-monk

1 answer

Pdf2htmlEX common error "Cannot load font"

Running the pdf2htmlEX.exe Windows binary from the command prompt works as expected. However, when running the pdf2htmlEX Windows binary in a wrapper (.NET in my case), I received an error like the one below. __tmp_font1.ttf is not in a known format (or…

pdf2htmlex

asked Oct 04 '19 at 21:26

Bernesto

1 answer

Pdf2Html Installation

I'm trying to install the Pdf2HtmlEx software on Ubuntu Server 18.04.1 LTS. The repository is not maintained but the software is very useful for me. I installed it on the Xubuntu desktop distro and on a Docker image, but I can't do it on Ubuntu Server. It…

installation pdf2htmlex

asked Nov 05 '18 at 20:35

Agus Trombotto

2 answers

Install pdf2htmlEX on heroku

I used this Aptfile: fonts-liberation libreoffice-base-core libreoffice-calc libreoffice-writer libreoffice libpython2.7 pdf2htmlex poppler-utils And the installation completed successfully. I even checked the version of pdf2htmlEX in heroku…

ruby-on-rails heroku pdf2htmlex

asked Oct 15 '18 at 18:39

Alex Kleshchevnikov

0 answers

running Pdf2htmlEX on linux using php

Kindly I request your help on the following issue: I am using pdf2htmlEX to convert my pdf files to HTML. The tool is working perfectly in WAMP; however, when I implement it on my Linux server, the tool is not working. My php code:

php linux exec pdf2htmlex

asked May 03 '18 at 08:02

Rasha Yehya

0 answers

pdfminer when I am trying to run pdf2txt.py not working in windows

I have installed pdfminer, and when I try to run pdf2txt.py test.pdf -t html -o test.html on Windows, no error is shown and the command does not execute either. Please help me: how can I convert true PDF files to HTML files? Thanks.

python windows pdf pdfminer pdf2htmlex

asked Apr 12 '18 at 15:10

Venkata Narayana Reddy Gurrala

1 answer

pdf2htmlEX's output shows Times New Roman font for only a few characters?

I have never seen anything like this. I use a tool called pdf2htmlEX, which converts a PDF to HTML, but I have a weird issue. Look at this screenshot: See the first character (W)? It's in Times New Roman. Now here's the even more weird part: Only…

fonts pdf2htmlex

asked Apr 05 '18 at 11:18

MortenMoulder

1 answer

Pdf2htmlEx: The html size converted by pdf is very large?

I convert a PDF to HTML via pdf2htmlEx. The source PDF is 21MB and the converted HTML is nearly 900MB. Conversion command: pdf2htmlEX --no-drm 0 --embed-image 1 --dest-dir ./output09 ./b.pdf ./b.html Is there any way to reduce the size of the output HTML?

pdf2htmlex

asked Sep 13 '17 at 06:03

charisMao

2 answers

Getting text location from pdf

I want to know the location of all the words in a PDF page. I have been trying to find something on the web but couldn't. Can anyone tell me which library (preferably on the Java platform) I should use?

pdf itext pdfbox pdf2htmlex

asked Dec 08 '15 at 11:01

Prabhjot Rai