I installed TET (php_tet.dll) on Windows 8.1 with XAMPP. PDF-to-text extraction works without problems, but I have had no luck with image extraction.

I'm using the examples "image resources.php" and "image_extractor.php". They are supposed to print some information about the images in a PDF file (x, y, width, height, alpha, etc.) and also save/extract all available images to files (TIFF, JPEG).

The examples can be found here: http://goo.gl/ZeDlc0

The image information part works, but no files are extracted.

Text extraction to a TXT file in the same folder works fine, so I should be able to write there.

Is something wrong with my SEARCHPATH, or is it something else?

What I tried:

The original example throws this error:

Error 1016 in open_document(): Couldn't open PDF file 'FontReporter.pdf' for reading (file not found)

So I changed the SEARCHPATH:

/* global option list */
$globaloptlist = "searchpath={{../data} {../../data} }";

to point to the folder that contains my PDF file:

/* global option list */
$globaloptlist = "searchpath={{D:\Workshop\www\TET\data} }";
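Instead of hard-coding the drive letter, the search path could also be derived from the script's own location. The sketch below assumes the PDFs sit in a "data" folder next to the script (a hypothetical layout). build_searchpath_option() is a hypothetical helper, not a TET function: it only produces an option string in the same searchpath={{...} {...}} form used above, converting backslashes to forward slashes the same way the accepted answer does for its output path.

```php
<?php
// Hypothetical helper (not part of TET): builds a global option list whose
// searchpath lists the given directories. Each directory is wrapped in its
// own braces so that paths containing spaces survive option-list parsing.
function build_searchpath_option(array $dirs): string
{
    $entries = [];
    foreach ($dirs as $dir) {
        // Normalize Windows backslashes to forward slashes.
        $entries[] = "{" . str_replace('\\', '/', $dir) . "}";
    }
    return "searchpath={" . implode(' ', $entries) . "}";
}

// Assumption: the PDF files live in a "data" folder next to this script.
$globaloptlist = build_searchpath_option([__DIR__ . '/data']);
```

The resulting string is then passed to TET exactly like the hand-written $globaloptlist in the original sample.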

Now I get some output via print/echo:

page 7: 208x277pt, alpha=0, beta=0 id=0, 595x750 pixel, 1x8 bit Indexed 
page 7: 208x277pt, alpha=0, beta=0 id=1, 595x750 pixel, 1x8 bit Indexed

The $tet->write_image_file() method returns 10, which indicates TIFF output, so the extraction itself seems to succeed.

But no image files appear in my PDF's folder or anywhere nearby.

Sams

2 Answers

For some reason the images were being exported to D:\workshop\xampp\apache.

In the filename option I need to give the ABSOLUTE path together with the file name:

// Absolute path of this script's folder, with forward slashes
$path = str_replace('\\', '/', __DIR__);

// filename option: absolute output path inside the "out" subfolder, no suffix
$imageoptlist = $baseimageoptlist . " filename {" . $path . "/out/" .
    $outfilebase . "_p" . $pageno . "_I" . $ti->imageid . "}";

if ($tet->write_image_file($doc, $ti->imageid, $imageoptlist) == 0) {
    print("Error " . $tet->get_errnum() . " in " .
        $tet->get_apiname() . "(): " . $tet->get_errmsg());
}
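The snippet above writes into an "out" subfolder next to the script, which has to be there for the files to land anywhere. A minimal sketch that builds the same filename option and creates the folder first; build_image_optlist() is a hypothetical helper, not part of TET, and it simply mirrors the naming scheme of the snippet (no file suffix, as in the original sample).

```php
<?php
// Hypothetical helper (not part of TET): builds the option list for
// write_image_file() with an absolute output path and makes sure the
// output folder exists. Naming scheme, as in the sample:
// <outdir>/<outfilebase>_p<pageno>_I<imageid>
function build_image_optlist(string $baseoptlist, string $outdir,
                             string $outfilebase, int $pageno, int $imageid): string
{
    // Create the output folder (and any missing parents) if necessary.
    if (!is_dir($outdir) && !mkdir($outdir, 0777, true)) {
        throw new RuntimeException("Cannot create output folder: $outdir");
    }
    // Forward slashes, no trailing slash.
    $outdir = rtrim(str_replace('\\', '/', $outdir), '/');
    // No file suffix is added here, matching the original sample.
    $name = $outdir . '/' . $outfilebase . '_p' . $pageno . '_I' . $imageid;
    return trim($baseoptlist . ' filename {' . $name . '}');
}
```

Inside the extraction loop it would replace the manual string building, e.g. $imageoptlist = build_image_optlist($baseimageoptlist, __DIR__ . '/out', $outfilebase, $pageno, $ti->imageid);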
Sams

This is exactly what the TET manual says (chapter 3.9, "PHP" section):

File name handling in PHP 
Unqualified file names (without any path component) and relative file names are 
handled differently in Unix and Windows versions of PHP:
- PHP on Unix systems will find files without any path component in the directory
  where the script is located.
- PHP on Windows will find files without any path component only in the directory
  where the PHP DLL is located.

So I guess you are expected to adjust the sample slightly for your setup.
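Given this behavior, one defensive approach is never to hand TET an unqualified or relative name and to resolve it against the script's folder first. This presumably also explains the observation in the comment below: PHP's own file functions resolve relative names against the script's directory, while TET's native code apparently sees a different current directory (the web server's). A minimal sketch; to_absolute_path() is a hypothetical helper, and it only recognizes Unix roots, UNC shares and drive letters, not every Windows path form.

```php
<?php
// Hypothetical helper (not part of TET): turns an unqualified or relative
// file name into an absolute one rooted at $basedir, so the result no longer
// depends on which directory TET treats as current.
function to_absolute_path(string $name, string $basedir): string
{
    $name = str_replace('\\', '/', $name);
    // Already absolute: Unix-style root, UNC share (//server), or drive letter.
    if (preg_match('#^(/|[A-Za-z]:/)#', $name)) {
        return $name;
    }
    return rtrim(str_replace('\\', '/', $basedir), '/') . '/' . $name;
}

// Usage: resolve names relative to this script's own folder.
$pdf = to_absolute_path('FontReporter.pdf', __DIR__);
```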

Rainer
  • It's kind of weird: when I save a text file (text extracted from the PDF) without any path, it is created in the same directory as the PHP script, but images saved without a path end up in my Apache folder, even though my DLL is in /PHP/ext/, which is not inside the Apache folder. Thanks anyway! – Sams Feb 17 '14 at 20:16