open source project for pdf/djvu/or-other-image-format text reflow -- formatting books to kindle or other e-readers

Question

Since I bought my Kindle 4, I have been searching for software that would help me read scientific papers or comics on it. So far my search has yielded k2pdfopt for the papers and Briss for the comics (actually only manga).
The first link, i.e. the k2pdfopt page, mentions some very good software for cropping selected portions of a PDF into another PDF. Note that k2pdfopt works differently from these PDF-cropping tools (including Briss): it recognises words, equations, etc. as blocks, which are then reflowed in their image form to fit the e-reader. The Wikipedia page for Reflowable Document (http://en.wikipedia.org/wiki/Reflowable_document) mentions experimental software designed by Xerox PARC that works in a similar way to k2pdfopt.
My question is thus whether there is an existing open source project (or more than one) that approaches the problem in a similar way, i.e. recognises text at the word level as images and then uses algorithms to typeset those images.
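
To make the idea concrete, here is a minimal sketch of the typesetting step only. It assumes the word images have already been segmented and that only their pixel sizes are needed to lay them out. `WordBox`, `reflow`, `word_gap` and `line_gap` are hypothetical names chosen for illustration; this is a plain greedy line-filling scheme, not necessarily what k2pdfopt or the Xerox PARC software does.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Size of one cropped word image (hypothetical structure)."""
    width: int   # pixel width of the word image
    height: int  # pixel height of the word image


def reflow(words, page_width, word_gap=6, line_gap=4):
    """Greedily place word boxes left to right, wrapping at page_width.

    Returns the (x, y) top-left position of each word, in reading order.
    """
    positions = []
    x = 0            # horizontal cursor on the current line
    y = 0            # top of the current line
    line_height = 0  # tallest word seen on the current line
    for w in words:
        # Wrap when the word would overflow, unless the line is still empty
        # (an oversized word is placed alone rather than looping forever).
        if x > 0 and x + w.width > page_width:
            x = 0
            y += line_height + line_gap
            line_height = 0
        positions.append((x, y))
        x += w.width + word_gap
        line_height = max(line_height, w.height)
    return positions


words = [WordBox(40, 10), WordBox(50, 12), WordBox(30, 10)]
layout = reflow(words, page_width=100)
```

With a 100-pixel-wide target page, the first two words share a line and the third wraps below them. A real tool would also need justification, handling of equations and figures as single blocks, and page breaks, which this sketch leaves out.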

k2pdfopt comes as an exe file; I haven't tried it with Wine yet.
Although the software is highly customizable (e.g. word spacing and interline spacing can be specified), there is no user interface, and all pages have to be treated the same way. So there is no way to recognise the table of contents, for example, or to place footnotes appropriately, perhaps with some human intervention.
Hence the need for a new project (if such a project doesn't exist already).
I would like to use Python for the job, but the usual PDF-related modules, ReportLab and pyPdf, cannot import existing PDF pages. Can someone help with the search for such a Python module?

pypdf has received a lot of updates in the past year. Do you want to try again? — Martin Thoma, Feb 11 '23 at 08:58

