1

I need to convert some PDF files to HTML. I downloaded pdftohtml for PHP, but I don't know how to use it. I'm trying to run it with this code:

<?php  
    include 'pdf-to-html-master/src/Gufy/PdfToHtml.php';
    $pdf = new \Gufy\PdfToHtml;
    $pdf->open('1400.pdf');
    $pdf->generate();
?>

This results in a blank web page.

What do I need to modify? What is the correct code to run this script?

Sunil D.
silvia
  • I hope you installed poppler-utils before using this code. `sudo apt-get install poppler-utils` – varunsinghal Jul 09 '15 at 08:00
  • My cmd tells me it doesn't recognize the `sudo` command – silvia Jul 09 '15 at 08:09
  • This command is for Linux machines. Since you are on Windows, check here: http://blog.alivate.com.au/poppler-windows/ – varunsinghal Jul 09 '15 at 08:12
  • OK, I have my poppler directory for Windows, but I don't understand how to use it. Sorry, but when I run the PHP file the result is still a white page @varunsinghal – silvia Jul 09 '15 at 08:17
  • Is the PHP file you posted above in the same directory as the PDF it converts? – varunsinghal Jul 09 '15 at 08:27

4 Answers

2

The first option is to use poppler-utils through this library:

<?php
// if you are using Composer, just use this
include 'vendor/autoload.php';
// if not, use this line instead
// include 'src/Gufy/PdfToHtml.php';

// create the converter
$pdf = new \Gufy\PdfToHtml;
// open the PDF file
$pdf->open('file.pdf');
// optional: set a different output directory for the generated HTML files
$pdf->setOutputDirectory('/your/absolute/directory/path');
// run the conversion; without setOutputDirectory() the files go in the same directory as file.pdf
$pdf->generate();
// optional: delete all generated files once you no longer need them
// $pdf->clearOutputDirectory();
?>
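
A blank page on its own doesn't say much: as far as I can tell, generate() writes files rather than printing anything, and PHP may also be hiding an error. Here is a minimal debugging sketch, assuming PHP 7 or later on a development machine. The `runConversion` helper is a hypothetical name for illustration, not part of the library.

```php
<?php
// Show every error in the browser; for debugging only, not for production.
ini_set('display_errors', '1');
error_reporting(E_ALL);

/**
 * Hypothetical helper: runs a conversion callback and returns a status
 * message, so the page is never left blank.
 */
function runConversion(callable $convert): string
{
    try {
        $convert();
        return 'Conversion finished.';
    } catch (Throwable $e) {
        return 'Conversion failed: ' . $e->getMessage();
    }
}

// Usage: put the library calls from the snippet above inside the callback.
echo runConversion(function () {
    // $pdf = new \Gufy\PdfToHtml;
    // $pdf->open('file.pdf');
    // $pdf->generate();
});
```

If this prints a failure message or a PHP warning, that text is usually the quickest pointer to what is missing.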

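Download the library from here.

The second option is pdf.js. Note that it is a JavaScript library that runs in the browser, so this call goes in a script on the page, not in PHP code: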

PDFJS.getDocument('helloworld.pdf')
varunsinghal
  • Where do I put my pdf.js directory? And is `PDFJS.getDocument('helloworld.pdf')` the only instruction to write in the PHP file? – silvia Jul 09 '15 at 08:12
  • Here are the examples on how to use pdf.js: https://mozilla.github.io/pdf.js/examples/ – varunsinghal Jul 09 '15 at 08:29
  • OK, I did that but I get this error: `Notice: Use of undefined constant PDFJS - assumed 'PDFJS' in C:\xampp\htdocs\parserpdfprova\prova.php on line 3 Fatal error: Call to undefined function getDocument() in C:\xampp\htdocs\parserpdfprova\prova.php on line 3` – silvia Jul 09 '15 at 08:34
  • OK, but what if the PDF is on another website? – silvia Jul 09 '15 at 08:52
  • I did it this way but I get an error: `PDF.js v1.1.114 (build: 3fd44fd) Message: Unexpected server response (0) while retrieving PDF`. I used this URL: http://provapdftohtml.altervista.org/web/viewer.html?file=http://web2.rrbc.org.au/wp-content/uploads/2015/06/Newsletter-June-2015.pdf – silvia Jul 09 '15 at 09:15
  • Okay, this was expected. The error is covered in their FAQ: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#can-i-load-a-pdf-from-another-server-cross-domain-request – varunsinghal Jul 09 '15 at 09:24
  • That FAQ entry mentions that CORS can be used to do this: http://colonelpanic.net/2014/08/using-pdf-js-web-worker-cross-domain-cors/ – varunsinghal Jul 09 '15 at 09:24
  • OK, so I think I'll use a jQuery script. Anyway, thank you for helping me – silvia Jul 09 '15 at 09:36
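
Another way around the cross-domain restriction discussed in these comments is to fetch the remote PDF on your own server first and then serve or convert the local copy. This is a minimal sketch, assuming PHP 7 or later with `allow_url_fopen` enabled. The `fetchPdf` helper and the example URL are hypothetical.

```php
<?php
/**
 * Hypothetical helper: copies a PDF from $source (a URL or a local path)
 * to $target and checks that the copy starts with the "%PDF-" signature.
 */
function fetchPdf(string $source, string $target): bool
{
    // copy() accepts URLs when allow_url_fopen is enabled in php.ini
    if (!@copy($source, $target)) {
        return false;
    }
    $handle = fopen($target, 'rb');
    if ($handle === false) {
        return false;
    }
    $header = fread($handle, 5);
    fclose($handle);
    // every PDF file begins with these five bytes
    return $header === '%PDF-';
}

// Usage (placeholder URL):
// fetchPdf('http://example.com/newsletter.pdf', __DIR__ . '/newsletter.pdf');
```

Once the file sits on the same host as the page, the viewer no longer makes a cross-domain request.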
0

I'm the maintainer of the package. The package has been updated; are you already using the latest version? If you're on Windows, please read the documentation again. Also, please don't download directly from GitHub; use Composer instead.
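
Whatever the platform, the package relies on the poppler executables being installed. This is a small sketch for checking whether PHP can find them on the PATH, assuming PHP 7.1 or later. The `findBinary` helper is hypothetical and not part of the package.

```php
<?php
/**
 * Hypothetical helper: looks for an executable in the directories of a
 * PATH-style string and returns its full path, or null if it is not found.
 */
function findBinary(string $name, string $pathList, string $separator = PATH_SEPARATOR): ?string
{
    // On Windows the executables carry an .exe suffix.
    $candidates = [$name, $name . '.exe'];
    foreach (explode($separator, $pathList) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($candidates as $candidate) {
            $full = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $candidate;
            if (is_file($full)) {
                return $full;
            }
        }
    }
    return null;
}

// Report where (or whether) the two poppler tools were found.
foreach (['pdftohtml', 'pdfinfo'] as $tool) {
    $found = findBinary($tool, (string) getenv('PATH'));
    echo $tool . ': ' . ($found ?? 'not found on PATH') . PHP_EOL;
}
```

If a tool is reported as not found, either add the poppler `bin` directory to the PATH or point the package at the executables as its documentation describes.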

0
include 'vendor/autoload.php';

use Gufy\PdfToHtml\Pdf;
use Illuminate\Http\Request;
use PHPHtmlParser\Dom;

// This is a method of a Laravel controller class.
public function parsepdf(Request $request)
{
    // 'csv_file' is the name of the upload field, even though it carries a PDF
    $pdf = new Pdf($request->file('csv_file')->getPathname());
    $html = $pdf->html();
    $dom = new Dom;
    $total_pages = $pdf->getPages();

    if ($total_pages == 1) {
        $html->goToPage(1);
        $dom->load($html);
        $paragraphs = $dom->find('p');
        $paragraphs = collect($paragraphs);
        foreach ($paragraphs as $p) {
            // replace non-breaking spaces (UTF-8 bytes C2 A0) with regular spaces
            $datestring = preg_replace('/\xc2\xa0/', ' ', trim($p->text));
            echo $datestring;
        }
    }
}

The code above converts a PDF to HTML in Laravel. Install the package with Composer:

composer require gufy/pdftohtml-php:~2

The package also needs poppler-utils. If you are on Ubuntu, install it with apt: `sudo apt-get install poppler-utils`

Sundar
-2

I use wkhtmltopdf and it works okay. You can download it from here: http://wkhtmltopdf.org/downloads.html

I installed it in Linux and I use it like this:

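$url = "https://www.google.com";

// wkhtmltopdf binary plus options: margins in mm, Letter page size, UTF-8 encoding
$command = "/usr/bin/wkhtmltopdf --load-error-handling ignore --disable-smart-shrinking -T 5mm -B 5mm -L 2mm -R 2mm --page-size Letter --encoding utf-8 --quiet";

$filename = '[file path].pdf';
// remove any previous output so a stale file is never mistaken for a new one
if (file_exists($filename)) {
   unlink($filename);
}

// escape both arguments so special characters cannot break the shell command
$output = shell_exec($command . ' ' . escapeshellarg($url) . ' ' . escapeshellarg($filename));

echo $output;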

Hope this helps.