
I have a PDF file, and I would like to get its height and width in mm.

So I run pdfinfo through shell_exec() and get this result:

Creator: Adobe InDesign CS5 (7.0.3)
Producer: Acrobat Distiller 9.4.2 (Macintosh)
CreationDate: Mon Jan 30 15:48:43 2012
ModDate: Fri Feb 10 10:35:05 2012
Tagged: no
Pages: 34
Encrypted: no
Page size: 552.744 x 708.643 pts
File size: 80724791 bytes
Optimized: yes
PDF version: 1.3

I have a script which extracts this info:

<?php
$output = shell_exec("pdfinfo ".$pdflivrelink);
$data = explode("\n", $output); //puts it into an array
for($c=0; $c < count($data); $c++) {
    if(stristr($data[$c],"Pages") == true) {
        $pagesnumber = trim(substr($data[$c],6));
    }
    if(stristr($data[$c],"Page size") == true) {
        $pagesize_H = height_pdf(trim(substr($data[$c],9)));
    }
    if(stristr($data[$c],"Page size") == true) {
        $pagesize_L = width_pdf(trim(substr($data[$c],9)));
    }
}
function height_pdf($size){
    $hauteur = round(substr($size,7,7)/2.83);
    return $hauteur;
}
function width_pdf($size){
    $largeur = round(substr($size,17,7)/2.83);
    return $largeur;
}
?>

This works because both dimensions have three digits, a dot, and three more digits (552.744 x 708.643). But for some reason, some PDF files give this output instead:

Creator: pdftk 1.41 - www.pdftk.com
Producer: iText 2.1.5 (by lowagie.com)
CreationDate: Mon Feb 27 13:18:23 2012
ModDate: Mon Feb 27 16:26:12 2012
Tagged: no
Pages: 36
Encrypted: no
Page size: 425.2 x 538.582 pts
File size: 5097597 bytes
Optimized: yes
PDF version: 1.6

Here the size is 425.2 x 538.582, so the numbers no longer sit at the fixed character offsets my script expects, and it doesn't work!

Can you help me? Thanks a lot!


I tested this:

    $output = shell_exec("pdfinfo ".$pdflivrelink);
    $data = explode("\n", $output); //puts it into an array
    for($c=0; $c < count($data); $c++) {
        if(stristr($data[$c],"Pages") == true) {
            $pagesnumber = trim(substr($data[$c],6));
        }
        if(stristr($data[$c],"Page size") == true) {
            echo $data[$c];
            preg_match('/Page size: ([0-9]*\.?[0-9]?) x ([0-9]*\.?[0-9]?)/', $data[$c], $matchess);
            $width = round($matchess[1]/2.83);
            $height = round($matchess[2]/2.83);
        }
    }
    echo "width = $width<br>height = $height";

The result:

Page size: 425.2 x 538.582 ptswidth = 0
height = 0

Seb Gy

6 Answers

A little regex will get you the correct results.

<?php
$str = 'Creator: pdftk 1.41 - www.pdftk.com Producer: iText 2.1.5 (by lowagie.com) CreationDate: Mon Feb 27 13:18:23 2012 ModDate: Mon Feb 27 16:26:12 2012 Tagged: no Pages: 36 Encrypted: no Page size: 425.2 x 538.582 pts File size: 5097597 bytes Optimized: yes PDF version: 1.6';

// [0-9]* after the dot keeps every decimal digit (538.582, not just 538.5)
preg_match('/Page size: ([0-9]*\.?[0-9]*) x ([0-9]*\.?[0-9]*)/', $str, $matches);
$width = round($matches[1]/2.83);
$height = round($matches[2]/2.83);

echo "width = $width<br>height = $height";
?>
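For reference, a PDF point is 1/72 inch and an inch is 25.4 mm, so the exact factor is 25.4 / 72 (about 0.3528 mm per point); dividing by 2.83 approximates dividing by 72 / 25.4 ≈ 2.8346. A minimal sketch with a hypothetical `pt_to_mm()` helper; for the page sizes in the question, both factors round to the same whole millimetres:

```php
<?php
// 1 pt = 1/72 inch and 1 inch = 25.4 mm, so mm = pt * 25.4 / 72
function pt_to_mm(float $pt): float
{
    return $pt * 25.4 / 72;
}

// Page size from the second pdfinfo output in the question
$widthMm  = round(pt_to_mm(425.2));
$heightMm = round(pt_to_mm(538.582));
echo "width = $widthMm<br>height = $heightMm";
```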

Update (asked for more details): here is a complete working example. I've updated the regex to match the real output of pdfinfo, which pads each value with several spaces after its label.

<?php

// escapeshellarg() quotes the path so spaces or shell metacharacters are safe
$output = shell_exec("pdfinfo " . escapeshellarg($pdflivrelink));

// find page count
preg_match('/Pages:\s+([0-9]+)/', $output, $pagecountmatches);
$pagecount = $pagecountmatches[1];

// find page sizes
preg_match('/Page size:\s+([0-9]{0,5}\.?[0-9]{0,3}) x ([0-9]{0,5}\.?[0-9]{0,3})/', $output, $pagesizematches);
$width = round($pagesizematches[1]/2.83);
$height = round($pagesizematches[2]/2.83);

echo "pagecount = $pagecount <br>width = $width<br>height = $height";

?>
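As a hedged sketch, the same two regexes can be wrapped in a function that works on the output string, which makes them testable without running pdfinfo. `parse_pdfinfo()` is a hypothetical name, and the `^` anchors with the `m` modifier assume one field per line, as pdfinfo prints it. In real use you would pass it the result of `shell_exec('pdfinfo ' . escapeshellarg($pdflivrelink))` after checking that it is a string.

```php
<?php
// Hypothetical helper: extracts the page count and page size (in mm)
// from pdfinfo's text output. Returns null if either field is missing.
function parse_pdfinfo(string $output): ?array
{
    // \s+ tolerates the column padding pdfinfo puts after each label
    if (!preg_match('/^Pages:\s+(\d+)/m', $output, $pages)) {
        return null;
    }
    if (!preg_match('/^Page size:\s+([0-9.]+) x ([0-9.]+) pts/m', $output, $size)) {
        return null;
    }
    return [
        'pages'  => (int) $pages[1],
        'width'  => round($size[1] * 25.4 / 72), // points to mm
        'height' => round($size[2] * 25.4 / 72),
    ];
}

// Sample text, padded the way pdfinfo aligns its values
$sample = "Pages:          36\nEncrypted:      no\nPage size:      425.2 x 538.582 pts\n";
print_r(parse_pdfinfo($sample));
```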
AndrewR
  • Thanks for your help! I have width = 0 height = 0 – Seb Gy Mar 08 '12 at 18:32
  • Because you are still doing the `$data = explode()` line. If you run the regex directly on $output, that should be all you need to do. You can get rid of the entire loop if you combine this with the other answer's regex to get the page count. – AndrewR Mar 09 '12 at 15:59
  • Can you explain your idea in more detail? I don't fully understand. Thanks – Seb Gy Mar 09 '12 at 21:18
  • @AndrewR Thank you very much for your valuable answer. Could you help me get pixels from these dimensions? – Nadimul De Cj Feb 11 '16 at 04:44
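On the pixel question: a PDF page has no intrinsic pixel size, so the answer depends on the resolution you render at. A minimal sketch, assuming you pick the DPI yourself; `pt_to_px()` is a hypothetical helper, and the 96 dpi default is only an example value:

```php
<?php
// A point is 1/72 inch, so at $dpi pixels per inch: px = pt * dpi / 72
// The 96 dpi default is an assumption; pass whatever resolution you render at.
function pt_to_px(float $pt, float $dpi = 96): int
{
    return (int) round($pt * $dpi / 72);
}

echo pt_to_px(425.2) . " x " . pt_to_px(538.582) . " px at 96 dpi\n";
echo pt_to_px(425.2, 300) . " x " . pt_to_px(538.582, 300) . " px at 300 dpi\n";
```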

Using FPDI, note the use of `getTemplateSize()`:

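<?php
// Assumes FPDI 2 with FPDF, installed via Composer (setasign/fpdi, setasign/fpdf)
require_once 'vendor/autoload.php';

use setasign\Fpdi\Fpdi;

class PdfDimensions // illustrative class name; the original snippet is a static method
{
    const INCHESTOMM = 25.4;

    // Returns [width, height] of page 1 in mm
    public static function getPDFdimensions($strFilename): array
    {
        $pdf1 = new Fpdi('P', 'in'); // user unit: inches
        $pdf1->setSourceFile($strFilename);
        $tplIdx1 = $pdf1->importPage(1);
        $size = $pdf1->getTemplateSize($tplIdx1); // size in the user unit (inches)
        $w = $size["width"];
        $h = $size["height"];
        return [round($w * self::INCHESTOMM), round($h * self::INCHESTOMM)];
    }
}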

Do it with a preg_match():

// Debugging:
$output = shell_exec("pdfinfo ".$pdflivrelink);
var_dump($output);

// Dimension:
preg_match('~ Page size: ([0-9\.]+) x ([0-9\.]+) pts ~', $output, $matches);
var_dump($matches);


// No of pages:
preg_match('~Pages:\s+([0-9]+)~', $output, $matches);
var_dump($matches);
powtac
  • Thanks for your help! I have array(0) { } – Seb Gy Mar 08 '12 at 18:32
  • Not good. Is `$output` set by `$output = shell_exec("pdfinfo ".$pdflivrelink);`? – powtac Mar 08 '12 at 18:34
  • Yes. When I use $output = shell_exec("pdfinfo ".$pdflivrelink); I get no result, but when I set $output = "the text...", the result is: array(3) { [0]=> string(32) " Page size: 425.2 x 538.582 pts " [1]=> string(5) "425.2" [2]=> string(7) "538.582" } – Seb Gy Mar 08 '12 at 18:38
  • I tried this: preg_match('~ Page size: ([0-9\.]+) x ([0-9\.]+) pts ~', shell_exec("pdfinfo ".$pdflivrelink), $matches); var_dump($matches); Same result: array(0) { } – Seb Gy Mar 08 '12 at 18:40
  • Try the "No of pages:" pattern. – powtac Mar 08 '12 at 18:53
  • Does it work when you use this `$output = 'Creator: pdftk 1.41 - www.pdftk.com Producer: iText 2.1.5 (by lowagie.com) CreationDate: Mon Feb 27 13:18:23 2012 ModDate: Mon Feb 27 16:26:12 2012 Tagged: no Pages: 36 Encrypted: no Page size: 425.2 x 538.582 pts File size: 5097597 bytes Optimized: yes PDF version: 1.6';` – powtac Mar 08 '12 at 18:57
  • Then there is something wrong with `shell_exec("pdfinfo ".$pdflivrelink)` – powtac Mar 09 '12 at 12:05
  • I came to the same conclusion. How should we fix it? – Seb Gy Mar 09 '12 at 12:53
  • It gives me: string(352) "Title: unknown Creator: Adobe InDesign CS5.5 (7.5) Producer: Adobe PDF Library 9.9 CreationDate: Tue Jan 31 17:05:25 2012 ModDate: Fri Feb 10 10:42:57 2012 Tagged: yes Pages: 34 Encrypted: no Page size: 581.108 x 793.7 pts File size: 31374145 bytes Optimized: yes PDF version: 1.3 " (It's a different PDF file, don't worry.) – Seb Gy Mar 09 '12 at 14:39
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/8716/discussion-between-seb-gouaille-and-powtac) – Seb Gy Mar 09 '12 at 14:43
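The thread moves to chat before the cause is pinned down. One plausible explanation, which is an assumption rather than something confirmed in the comments: pdfinfo pads each label with several spaces so the values line up, and a browser collapses that padding when var_dump() output is shown as HTML. That would fit the reported string(352), which is longer than the text visible in the comment. The sketch below shows that a single literal space after the colon fails on padded input while `\s+` matches:

```php
<?php
// A padded line as pdfinfo would print it (values aligned with spaces)
$line = "Page size:      581.108 x 793.7 pts";

// A single literal space after the colon does not match the padding
$strict = preg_match('~Page size: ([0-9.]+) x ([0-9.]+) pts~', $line);

// \s+ accepts any run of whitespace, so this one matches
$loose = preg_match('~Page size:\s+([0-9.]+) x ([0-9.]+) pts~', $line, $m);

var_dump($strict, $loose, $m[1], $m[2]);
```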

Why not use plain PHP to get the PDF dimensions?

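<?php
// Returns the first matching page box as ["width" => ..., "height" => ...]
// in points (1 pt = 1/72 inch), or false if none is found. This only works
// when page dictionaries are stored uncompressed; boxes inside compressed
// object streams (PDF 1.5+) are not visible to a text search.
function get_pdf_dimensions($path, $box = "MediaBox") {
    //$box can be set to BleedBox, CropBox or MediaBox
    $stream = new SplFileObject($path);
    $result = false;
    // Matches e.g. "/MediaBox [0 0 612 792]" or "/MediaBox[0.0 0.0 425.2 538.582]"
    $pattern = '/\/' . $box . '\s*\[\s*(-?[0-9.]+)\s+(-?[0-9.]+)\s+(-?[0-9.]+)\s+(-?[0-9.]+)\s*\]/';

    while (!$stream->eof()) {
        if (preg_match($pattern, $stream->fgets(), $matches)) {
            // A box is [llx lly urx ury], so the size is the difference of the corners
            $result = [
                "width"  => $matches[3] - $matches[1],
                "height" => $matches[4] - $matches[2],
            ];
            break;
        }
    }

    $stream = null;

    return $result;
}

var_dump(get_pdf_dimensions("file.pdf"));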
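Because a page box is given as two corners, its origin is not always 0 0, so subtracting the lower-left corner matters. A minimal sketch that applies the same kind of pattern to a string instead of a file, so it can be checked without a PDF; `box_size()` is a hypothetical helper, and the sample dictionaries are made up for illustration:

```php
<?php
// Hypothetical helper: finds "/<box> [llx lly urx ury]" in uncompressed
// PDF text and returns the box size in points, or null if not found.
function box_size(string $pdfText, string $box = 'MediaBox'): ?array
{
    $num = '(-?[0-9]*\.?[0-9]+)';
    $pattern = '~/' . preg_quote($box, '~') . '\s*\[\s*' . $num . '\s+' . $num
             . '\s+' . $num . '\s+' . $num . '\s*\]~';
    if (!preg_match($pattern, $pdfText, $m)) {
        return null;
    }
    // Width and height are the differences between the two corners
    return [
        'width'  => (float) $m[3] - (float) $m[1],
        'height' => (float) $m[4] - (float) $m[2],
    ];
}

print_r(box_size('<< /Type /Page /MediaBox [0 0 612 792] >>'));
print_r(box_size('/CropBox [10 20 435.2 558.582]', 'CropBox'));
```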
fltman

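The Imagick extension can be used to get the dimensions of the file:

 $image = new Imagick($file);
 $geo=$image->getImageGeometry();
 // Geometry is in pixels at the read resolution; ImageMagick reads PDFs at
 // 72 dpi by default, so these values match points (multiply by 25.4 / 72 for mm)
 $width=$geo['width'];
 $height=$geo['height'];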

If the Imagick extension is not installed, Ubuntu users can install it, check that it is loaded, and restart Apache with the following commands:

 sudo apt-get install php-imagick
 php -m | grep imagick
 sudo service apache2 restart

Since you know the format of the size string, you can also split it on the "x", as below (this function returns the width and height in an array):

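function size_pdf($size){
    // $size is the "Page size" value, e.g. "425.2 x 538.582 pts"
    $result = array();
    $tmp = explode('x', $size);
    // pdfinfo prints the width first, then the height; floatval() drops the trailing " pts"
    $result['width'] = round(floatval(trim($tmp[0]))/2.83);
    $result['height'] = round(floatval(trim($tmp[1]))/2.83);

    return $result;
}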
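An alternative sketch, using `sscanf()` instead of `explode()`: the format `"%f x %f"` reads the two numbers around the "x" and ignores whatever follows the second number (such as " pts"). `size_pdf_sscanf()` is a hypothetical name, and this version uses the exact 25.4 / 72 factor rather than 2.83:

```php
<?php
// Reads "W x H ..." with sscanf(); a space in the format matches any whitespace
function size_pdf_sscanf(string $size): array
{
    [$widthPt, $heightPt] = sscanf($size, "%f x %f");
    return [
        'width'  => round($widthPt * 25.4 / 72),  // points to mm
        'height' => round($heightPt * 25.4 / 72),
    ];
}

print_r(size_pdf_sscanf("425.2 x 538.582 pts"));
```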
adidasadida