
php – Bulk Compress (Zip) files

Posted by: admin July 12, 2020

Questions:

Use case:
Our users have many objects in our AWS S3 account. We are adding a feature to download entire projects at once. We are more concerned with efficiency than with storage.

After looking at different options (ZipArchive, PclZip) I came across this guide recommending the use of Chilkat.

Its method makes a lot of sense. In summary:

  • Prezip each file on upload and store it in S3
  • “Project Download” downloads each compressed file, then uses QuickAppend (Chilkat terminology) to add it “instantly” (about 200ms per file) to the overall compressed file
  • Upload new Zip file to S3, provide link
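For reference, the “prezip on upload” step can be done with PHP’s built-in ZipArchive. This is only a minimal sketch: prezip_file() is a hypothetical helper name, the paths are placeholders, and the S3 upload itself is left out.

```php
<?php
// Hypothetical helper: wrap a single uploaded file in its own zip archive.
// The resulting archive is what would then be stored in S3 (not shown here).
function prezip_file(string $inputPath, string $zipPath): void
{
    $zip = new ZipArchive;

    // CREATE | OVERWRITE starts a fresh archive even if $zipPath already exists.
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }

    // Store the file under its bare name rather than its full local path.
    if (!$zip->addFile($inputPath, basename($inputPath))) {
        $zip->close();
        throw new RuntimeException("Cannot add $inputPath");
    }

    // The data is compressed and written to disk when the archive is closed.
    if (!$zip->close()) {
        throw new RuntimeException("Cannot write $zipPath");
    }
}
```

Calling prezip_file('/uploads/report.txt', '/uploads/report.txt.zip') would produce a one-entry archive next to the original file.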

The issue I am running into is that a license for Chilkat costs $249, and I am looking for free alternatives.

A free alternative uses a similar concept:

  • Prezip each file on upload and store it in S3
  • “Project Download” downloads each compressed file, then tars them together
  • Upload new Zip file to S3, provide link
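The tar variant can be sketched in PHP with the bundled PharData class, which writes uncompressed tar archives. This is a rough illustration only: tar_prezipped() is a hypothetical helper, and it assumes the pre-zipped files have already been downloaded from S3 to local paths.

```php
<?php
// Hypothetical helper: bundle already-compressed files into one tar archive.
// $tarPath must end in ".tar" so PharData can infer the archive format.
function tar_prezipped(array $zipPaths, string $tarPath): void
{
    $tar = new PharData($tarPath);

    foreach ($zipPaths as $path) {
        // Second argument is the name the file gets inside the archive.
        $tar->addFile($path, basename($path));
    }
}
```

Because the inputs are already compressed, the tar step only concatenates them with headers, so it does no compression work of its own.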

Is there a “standard” or “ideal” way for dealing with this?

Answers:

On my local system, PHP’s built-in zip library can merge a 10-file, 24MB zip into a 21-file, 51MB zip in about 800ms (roughly 80ms per file), which compares well with the 200ms per file you reported, though I’m not sure how large your files are or what hardware you’re using.

Unlike the Java library that the author of your guide initially used, PHP’s zip library is implemented in C, so you won’t see the same Java-to-C performance gains that the author saw. That said, I don’t know how Chilkat’s QuickAppend works or how it compares to PHP’s zip library, but appending to pre-zipped files, whether with PHP or Chilkat, does seem to be the fastest solution.

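<?php
// Merge the contents of a.zip into the existing archive b.zip.
$destination = new ZipArchive;
$source = new ZipArchive;

if ($source->open('a.zip') === TRUE
    && $destination->open('b.zip') === TRUE) {

    $time_start = microtime(true);

    // Extract the source archive into a scratch directory.
    $temp_dir = "/tmp/zip_" . time();
    mkdir($temp_dir, 0777, true);
    $source->extractTo($temp_dir);
    $source->close();

    $files = scandir($temp_dir);
    $file_count = 0;

    foreach ($files as $file) {
        if ($file == '.' || $file == '..')
            continue;

        // Pass the bare file name as the entry name; otherwise the entry
        // is stored under the temporary directory's path.
        $destination->addFile("$temp_dir/$file", $file);
        ++$file_count;
    }

    // The added files are read and written when the archive is closed,
    // so the scratch directory must not be removed before this point.
    $destination->close();
    exec("rm -rf " . escapeshellarg($temp_dir) . " &");

    $time_end = microtime(true);
    $time = $time_end - $time_start;

    print "Added $file_count files in " . ($time * 1000) . "ms \n";
}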

Output

-rw-rw-r-- 1 fuzzytree fuzzytree 24020997 Jun  4 15:57 a.zip
-rw-rw-r-- 1 fuzzytree fuzzytree 51418980 Jun  4 15:57 b.zip

~/testzip$ php zip.php
Added 10 files in 872.43795394897ms

~/testzip$ ls -ltr *zip
-rw-rw-r-- 1 fuzzytree fuzzytree 24020997 Jun  4 15:57 a.zip
-rw-rw-r-- 1 fuzzytree fuzzytree 75443030 Jun  4 15:57 b.zip
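If the scratch directory is a concern, the same merge can be done entry by entry in memory. The sketch below is an assumption-laden variant, not what was timed above: merge_zip() is a hypothetical helper, it skips directory entries, and it holds every copied entry in memory until the destination is closed, so it only suits archives that fit comfortably in RAM. Like the script above, it decompresses and recompresses each entry rather than copying raw compressed data.

```php
<?php
// Hypothetical helper: copy every file entry of $sourcePath into $destPath
// without extracting to disk. Returns the number of entries copied.
function merge_zip(string $sourcePath, string $destPath): int
{
    $source = new ZipArchive;
    $destination = new ZipArchive;

    if ($source->open($sourcePath) !== true) {
        throw new RuntimeException("Cannot open $sourcePath");
    }
    // CREATE opens the destination if it exists and creates it otherwise.
    if ($destination->open($destPath, ZipArchive::CREATE) !== true) {
        $source->close();
        throw new RuntimeException("Cannot open $destPath");
    }

    $count = 0;
    for ($i = 0; $i < $source->numFiles; $i++) {
        $name = $source->getNameIndex($i);
        if ($name === false || substr($name, -1) === '/') {
            continue; // skip unreadable names and directory entries
        }

        $data = $source->getFromIndex($i); // decompressed contents
        if ($data === false) {
            continue;
        }

        $destination->addFromString($name, $data);
        $count++;
    }

    // Entries are compressed and written when the destination is closed.
    $destination->close();
    $source->close();

    return $count;
}
```

For the archives above, this would be called as merge_zip('a.zip', 'b.zip').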

Answer:

I have a site where people frequently download tens or even hundreds of files (as much as 100MB, if I had to guess offhand) in one zip file. I use zipstream, which I think I found here. I’m not sure of its limitations, but it seems to work well, and there’s no need to zip the individual files beforehand.