
Count the number of pages in a PDF in only PHP

Posted by: admin April 23, 2020

Questions:

I need a way to count the number of pages of a PDF in PHP. I’ve done a bit of Googling and the only things I’ve found either utilize shell/bash scripts, perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?

Answers:

You can use the ImageMagick extension for PHP (Imagick). ImageMagick understands PDFs (it relies on Ghostscript to read them), and you can use the identify command to extract the number of pages. The PHP function is Imagick::identifyImage().

Answer:

If using Linux, this is much faster than using identify to get the page count (especially with a high number of pages):

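exec('/usr/bin/pdfinfo '.escapeshellarg($tmpfname).' | awk \'/^Pages:/ {print $2}\'', $output);

You do need pdfinfo installed (it ships with Poppler and Xpdf), and exec() must be enabled. The escapeshellarg() call keeps file names containing spaces or shell metacharacters from breaking the command.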
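
If you would rather not depend on awk, the pdfinfo output can be parsed in PHP itself. The following is only a sketch: the function names parsePdfinfoPageCount() and pdfinfoPageCount() are made up for illustration, and it assumes that pdfinfo is on the PATH, that exec() is enabled, and that the output contains a line of the form "Pages: N".

```php
<?php
/**
 * Extract the page count from pdfinfo's text output.
 * Hypothetical helper: assumes a line of the form "Pages:   12".
 */
function parsePdfinfoPageCount(string $pdfinfoOutput): ?int
{
    if (preg_match('/^Pages:\s+(\d+)/m', $pdfinfoOutput, $m)) {
        return (int) $m[1];
    }
    return null; // no "Pages:" line found
}

/**
 * Run pdfinfo on a file and return its page count, or null on failure.
 * Assumes exec() is enabled and pdfinfo is on the PATH.
 */
function pdfinfoPageCount(string $file): ?int
{
    $output = [];
    $status = 1;
    // escapeshellarg() keeps spaces and shell metacharacters in the path harmless.
    exec('pdfinfo ' . escapeshellarg($file) . ' 2>/dev/null', $output, $status);
    if ($status !== 0) {
        return null; // pdfinfo missing, or it could not read the file
    }
    return parsePdfinfoPageCount(implode("\n", $output));
}
```

Keeping the parsing in its own function means it can be checked against a saved sample of pdfinfo output without running the binary.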

Answer:

I know this is pretty old… but if it’s relevant to me now, it can be relevant to others too.

I just worked out this method of getting page numbers, as the methods listed here are inefficient and extremely slow for large PDFs.

$im = new Imagick();
$im->pingImage('name_of_pdf_file.pdf');
echo $im->getNumberImages();

Seems to be working great for me!

Answer:

I actually went with a combined approach. Since I have exec disabled on my server I wanted to stick with a PHP based solution, so ended up with this:

Code:

function getNumPagesPdf($filepath){
    // Scan the raw file for /Count entries; the largest one is normally the total page count.
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if ($fp) {  // guard against a failed fopen() before reading
        while (($line = fgets($fp, 255)) !== false) {
            if (preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                $max = max($max, (int) $matches[1]);
            }
        }
        fclose($fp);
    }
    if ($max == 0) {
        // No /Count entry was found: fall back to the slower Imagick extension.
        $im = new imagick($filepath);
        $max = $im->getNumberImages();
    }

    return $max;
}

If it can’t work out the page count because there are no /Count entries, it falls back to the Imagick PHP extension. The reason I do a two-fold approach is because the latter is quite slow.

Answer:

You could try FPDI: when you set the source file with setSourceFile(), you get back the number of pages.
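
A minimal sketch of that idea follows. It assumes FPDI 2.x (class setasign\Fpdi\Fpdi) has already been loaded, for example through Composer's autoloader; the wrapper name fpdiPageCount() is hypothetical. Depending on the FPDI version and the PDF, the parser may reject files that use newer compression features, so the sketch returns null in that case rather than guessing.

```php
<?php
/**
 * Return the page count reported by FPDI, or null if FPDI is unavailable
 * or the file cannot be parsed.
 * Assumes FPDI 2.x (setasign\Fpdi\Fpdi) is loadable, e.g. via Composer.
 */
function fpdiPageCount(string $file): ?int
{
    if (!class_exists('setasign\\Fpdi\\Fpdi')) {
        return null; // library not installed or autoloader not loaded
    }
    try {
        $pdf = new \setasign\Fpdi\Fpdi();
        // setSourceFile() returns the number of pages in the source document.
        return (int) $pdf->setSourceFile($file);
    } catch (\Throwable $e) {
        return null; // unreadable file, or a PDF the parser does not support
    }
}
```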

Answer:

Try this :

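<?php
// NOTE: a path taken straight from $_REQUEST lets the caller open any file the server can read; validate it in real code.
$file = isset($_REQUEST['file']) ? $_REQUEST['file'] : '';
if ($file === '' || !($fp = @fopen($file, "r"))) {
        echo 'failed opening file ' . htmlspecialchars($file);
}
else {
        $max = 0;
        while (($line = fgets($fp, 255)) !== false) {
                if (preg_match('/\/Count ([0-9]+)/', $line, $matches)) {
                        // keep the largest /Count seen so far
                        $max = max($max, (int) $matches[1]);
                }
        }
        fclose($fp);
        echo 'There ' . ($max < 2 ? 'is ' : 'are ') . $max . ' page' . ($max < 2 ? '' : 's') . ' in ' . htmlspecialchars($file) . '.';
}
?>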

Each /Count entry gives the number of pages under one node of the PDF's page tree. The parent node's /Count is the sum of its children's, so this script just looks for the maximum (that is the total number of pages).
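
To make the "take the maximum" idea concrete, here is a small sketch that applies it to a string instead of a file. The function name maxCountEntry() and the sample fragment are made up for illustration. It is a heuristic only: PDFs that store their page tree in compressed object streams expose no plain-text /Count, and outline (bookmark) dictionaries also carry a /Count key that can distort the result.

```php
<?php
/**
 * Return the largest /Count value found in raw PDF bytes, or 0 if none.
 * Heuristic only: see the caveats in the text above.
 */
function maxCountEntry(string $pdfBytes): int
{
    // Collect every "/Count <digits>" occurrence in one pass.
    if (!preg_match_all('/\/Count\s+(\d+)/', $pdfBytes, $m)) {
        return 0;
    }
    return max(array_map('intval', $m[1]));
}

// A hand-written fragment shaped like a page tree: two child nodes under one root.
$sample = "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 5 >> endobj\n"
        . "2 0 obj << /Type /Pages /Count 3 >> endobj\n"
        . "3 0 obj << /Type /Pages /Count 2 >> endobj\n";

echo maxCountEntry($sample), "\n"; // prints 5
```

Matching over the whole string also avoids a weakness of reading 255 bytes at a time, where a /Count entry can be split across two reads and missed.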

Answer:

This one does not use Imagick:

function getNumPagesInPDF($file)
{
    // http://www.hotscripts.com/forums/php/23533-how-now-get-number-pages-one-document-pdf.html
    if (!file_exists($file)) return null;
    if (!$fp = @fopen($file, "r")) return null;
    $max = 0;
    while (!feof($fp)) {
        $line = fgets($fp, 255);
        if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
            preg_match('/[0-9]+/', $matches[0], $matches2);
            if ($max < $matches2[0]) $max = $matches2[0];
        }
    }
    fclose($fp);
    return (int)$max;
}

Answer:

function getNumPagesPdf($filepath) {
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if (!$fp) {
        return "Could not open file: $filepath";
    } else {
        while (!@feof($fp)) {
            $line = @fgets($fp, 255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
                if ($max < $matches2[0]) {
                    $max = trim($matches2[0]);
                    break;
                }
            }
        }
        @fclose($fp);
    }

    return $max;
}

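This does exactly what I want.

I just worked out this method of getting the PDF page count. After finding the first /Count value I break out of the while loop, so that it does not keep looping through the rest of the file. Note that the first /Count in the file is not necessarily the total: it may belong to a child node of the page tree.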

Answer:

On *nix environment you can use:

exec('pdftops ' . escapeshellarg($filename) . ' - | grep showpage | wc -l', $output);

pdftops is part of Xpdf/Poppler and is often installed by default.

Or as Xethron suggested:

pdfinfo filename.pdf | grep Pages: | awk '{print $2}'

Answer:

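// Read the whole PDF into memory, then count "/Page" tokens that are not
// followed by a word character (so "/Pages" is skipped).
$pdftext = file_get_contents($caminho1);

$num_pag = preg_match_all("/\/Page\W/", $pdftext, $dummy);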

Answer:

Using only PHP can mean installing complicated libraries, restarting Apache and so on, and many pure-PHP approaches (like opening streams and using regex) are inaccurate.

The included answer is the only fast and reliable way I can think of. It does use a single executable, but one that doesn’t have to be installed (on either *nix or Windows), and a simple PHP script extracts the output. The best thing is that I haven’t seen a wrong page count yet!

It can be found here, including why the other approaches “don’t work”:

Get the number of pages in a PDF document