php ZipArchive check if Zip file is broken/incomplete

Posted by: admin July 12, 2020

Questions:

My users upload zip files through FTP, then a PHP script adds them to an RSS file.

I’m trying to find a way to check each ZIP file, to validate it and detect whether it is broken or the upload is unfinished. Is there a way to do that?

How to & Answers:

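The result from open() can also be true, which should be checked first, using a strict comparison. Without that check, the first case, ZipArchive::ER_NOZIP, will always match on success: switch compares loosely, and ZipArchive::ER_NOZIP is a non-zero integer (19), so true == ZipArchive::ER_NOZIP holds.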

$zip = new ZipArchive();
$res = $zip->open('test.zip', ZipArchive::CHECKCONS);
if ($res !== TRUE) {
    switch($res) {
        case ZipArchive::ER_NOZIP:
            die('not a zip archive');
        case ZipArchive::ER_INCONS :
            die('consistency check failed');
        case ZipArchive::ER_CRC :
            die('checksum failed');
        default:
            die('error ' . $res);
    }
}
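
The same check can be wrapped in a small function so that it can be called for each uploaded file. This is only a sketch: zipOpenError() is a hypothetical helper name, not part of ZipArchive, and the messages are the ones used above.

```php
<?php
// Hypothetical helper (not part of ZipArchive): returns null when the archive
// opens cleanly, otherwise a short description of the error code.
function zipOpenError(string $path): ?string
{
    $zip = new ZipArchive();
    $res = $zip->open($path, ZipArchive::CHECKCONS);

    // Strict comparison: open() returns true on success, an int code on failure
    if ($res === true) {
        $zip->close();
        return null;
    }

    switch ($res) {
        case ZipArchive::ER_NOZIP:
            return 'not a zip archive';
        case ZipArchive::ER_INCONS:
            return 'consistency check failed';
        case ZipArchive::ER_CRC:
            return 'checksum failed';
        default:
            return 'error ' . $res;
    }
}
```

A caller could then skip any upload for which zipOpenError($path) is not null before adding it to the RSS file.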

Answer:

You can use the ZipArchive class for this. Since PHP 5.2 it has been part of the standard PHP distribution.

Use it like this:

$zip = new ZipArchive();

// ZipArchive::CHECKCONS will enforce additional consistency checks
$res = $zip->open('test.zip', ZipArchive::CHECKCONS);
// open() returns true on success, so only inspect error codes on failure
if ($res !== true) {
    switch ($res) {
        case ZipArchive::ER_NOZIP:
            die('not a zip archive');
        case ZipArchive::ER_INCONS:
            die('consistency check failed');
        case ZipArchive::ER_CRC:
            die('checksum failed');

        // ... check for the other types of errors listed in the manual
    }
}

If the zip archive is incomplete, for example because the upload was cut off before the central directory at the end of the file was written, $zip->open() will typically return ZipArchive::ER_NOZIP.

Answer:

How to detect corrupt files with CRC mismatch:

ZipArchive seems unable to detect corrupt file data inside an archive. ZipArchive::CHECKCONS only helps if the file is not a ZIP archive at all. In my tests it happily decompressed corrupt files, and the client downloading the data was not informed.

Creating a corrupt archive for testing is simple: zip some files and change a byte in the resulting ZIP file with a hex editor. You can then test the file with a ZIP application to learn which file inside the archive is corrupt.
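
If no hex editor is at hand, a corrupt test archive can also be produced from PHP. The following is only a sketch under some assumptions: makeCorruptZip() and the payload text are made up for illustration, and the entry is stored uncompressed (ZipArchive::CM_STORE, PHP 7.0+) so that its bytes can be located in the raw file.

```php
<?php
// Hypothetical test helper: builds a ZIP with one stored (uncompressed) entry
// and then alters one byte of that entry's data, leaving the stored CRC as is.
function makeCorruptZip(string $path, string $entry = 'mybrokenfile.txt'): void
{
    $payload = 'CORRUPT-ME: some test payload';

    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException('cannot create ' . $path);
    }
    $zip->addFromString($entry, $payload);
    // Store without compression so the payload appears verbatim in the file
    $zip->setCompressionName($entry, ZipArchive::CM_STORE);
    $zip->close();

    $bytes = file_get_contents($path);
    $pos = strpos($bytes, $payload);
    if ($pos === false) {
        throw new RuntimeException('payload not found in archive');
    }
    // Change the first payload byte ('C' becomes 'X')
    $bytes[$pos] = 'X';
    file_put_contents($path, $bytes);
}
```

After calling makeCorruptZip('foo.zip'), the CRC recorded for mybrokenfile.txt no longer matches the entry's data, which is the situation the check below is meant to catch.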

You can simply verify CRCs on the server for smaller files:

<?php
$maxsize = 1024*1024;
$z = new ZipArchive;
$r = $z->open("foo.zip", ZipArchive::CHECKCONS);
if($r !== TRUE)
  die('ZIP error when trying to open "foo.zip": '.$r);

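$file = "mybrokenfile.txt";
$stat = $z->statName($file);
if($stat === false)
  die('File not found in archive');
if($stat['size'] > $maxsize)
  die('File too large, decompression denied');
$s = $z->getStream($file);
if($s === false)
  die('Could not open file inside archive');
$data = stream_get_contents($s, $maxsize);
fclose($s);
// compare the CRC stored in the archive with the CRC of the decompressed data
if($stat['crc'] != crc32($data))
  die('File is corrupt!');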
//echo 'File is valid';

//you may send the file to the client now if you didn't output anything before
header('Content-Description: File Transfer');
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="mybrokenfile.txt"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . $stat['size']);
ob_clean();
echo $data;
$z->close();
?>

If, because of its size, the file should not be fully decompressed on the server but decompressed while streaming to the client, the file transfer has already started by the time a CRC mismatch can be found, and printing an error message at that point doesn’t work. Maybe the best way would be to abort the connection before completing the file transfer, so that the client can detect a corrupt download.
On the server side, a function is needed that can calculate the CRC32 of streamed data incrementally.
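
One possible way to compute the CRC32 incrementally is PHP's hash extension: hash_init('crc32b') followed by hash_update() for each chunk yields the same checksum as crc32() over the whole data. A minimal sketch, assuming 64-bit PHP; streamCrc32() and its $onChunk callback are hypothetical names, not part of ZipArchive:

```php
<?php
// Hypothetical helper: reads a stream chunk by chunk, updates a running
// CRC32, and optionally hands each chunk to a callback (e.g. to echo it).
function streamCrc32($stream, ?callable $onChunk = null, int $chunkSize = 8192): int
{
    // 'crc32b' is the hash variant that matches PHP's crc32()
    $ctx = hash_init('crc32b');

    // Stop on read error (false) or when no more data is returned ('')
    while (($chunk = fread($stream, $chunkSize)) !== false && $chunk !== '') {
        hash_update($ctx, $chunk);
        if ($onChunk !== null) {
            $onChunk($chunk);
        }
    }

    // hash_final() returns 8 hex digits; convert to an integer like crc32()
    return (int) hexdec(hash_final($ctx));
}
```

With a stream obtained from $z->getStream(), the callback could echo each chunk to the client, and the returned value could be compared with $stat['crc'] once the stream ends. If the values differ, the script could abort the connection instead of finishing the transfer normally.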