caching – How to Archive a Dynamic (PHP) Website as Static HTML?

Posted by: admin July 12, 2020

Questions:

We’re in the process of shutting down The Conversations Network (including the IT Conversations podcast). The plan is to render a static-HTML version of our websites for permanent hosting at the Internet Archive.

What’s the easiest way to generate static HTML from the roughly 5,000 pages currently generated dynamically by PHP?

I know we could tweak the code to cache the PHP output, write it to files, then walk the sitemaps to generate every page. But I wonder if there are any other options we should consider. Are there any tools for scraping the HTML as-is? (Something other than Acrobat Pro?)
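The sitemap-walking idea can be sketched without touching the PHP code at all, by fetching each listed URL over HTTP and saving the response. The following is a minimal Python sketch, not a finished tool. The function names, the example.org URLs and the file-naming rule in `static_path` are assumptions made for illustration, and it assumes a standard sitemaps.org `<urlset>` file.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document, in order."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def static_path(url):
    """Map a page URL to a relative .html file name (hypothetical naming rule)."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index"                      # directory URL -> index.html
    elif path.lower().endswith((".html", ".htm", ".php")):
        path = path.rsplit(".", 1)[0]        # drop the old extension
    if parts.query:
        # Fold the query string into the file name so each variant gets its own file.
        path += "_" + re.sub(r"[^A-Za-z0-9]+", "_", parts.query).strip("_")
    return path + ".html"


def fetch(url):
    """Download one page as raw bytes."""
    with urlopen(url) as response:
        return response.read()


def archive(sitemap_xml, out_dir, fetch_page=fetch):
    """Save every page listed in the sitemap under out_dir and return the written paths."""
    written = []
    for url in sitemap_urls(sitemap_xml):
        target = Path(out_dir) / static_path(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch_page(url))
        written.append(target)
    return written
```

Calling `archive(sitemap_text, "static_site")` would write one file per sitemap entry. Links inside the saved pages still point at the dynamic URLs, so they would need rewriting as a separate step.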

Unfortunately, we also have a fair number of Ajax calls, which are going to make this more difficult. I imagine we’ll have to un-Ajax them first.

Answers:

There is a great piece of software called “Teleport Pro” (unfortunately payware) that can create browsable, duplicated copies of a website. Once uploaded to a server, such a copy should work exactly the same as the original site.

Things to keep in mind, though, when you’re creating static HTML from dynamic pages:

  • Your current Ajax calls need to be un-Ajaxed (as you said yourself).
  • .htaccess settings, such as mod_rewrite rules, can make your static files worthless, because links that depend on them might not work.
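To illustrate the mod_rewrite point: if the site uses extensionless “pretty” URLs, links in the saved pages have to be rewritten to point at the actual static files. Here is a rough Python sketch of that step. The naming rule in `static_name`, the regex-based matching and the example.org URLs are all assumptions for illustration, and a real site may need a proper HTML parser and a different mapping.

```python
import posixpath
import re
from urllib.parse import urljoin, urlsplit

# Matches the href attribute of <a> tags only (a heuristic, not a full HTML parser).
A_HREF = re.compile(r'(<a\b[^>]*?\bhref=)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def static_name(url):
    """Hypothetical naming rule: '/shows/42' -> 'shows/42.html', '/' -> 'index.html'.

    Query strings are ignored in this sketch.
    """
    path = urlsplit(url).path.strip("/")
    return (path or "index") + ".html"


def rewrite_links(html, page_url):
    """Point internal, extensionless links at their static files, relative to this page."""
    host = urlsplit(page_url).netloc
    here = posixpath.dirname(static_name(page_url)) or "."

    def fix(match):
        target = urljoin(page_url, match.group(3))
        parts = urlsplit(target)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            return match.group(0)            # external link, mailto:, etc.: leave alone
        if posixpath.splitext(parts.path)[1]:
            return match.group(0)            # real files (.mp3, .pdf, ...): leave alone
        relative = posixpath.relpath(static_name(target), start=here)
        fragment = "#" + parts.fragment if parts.fragment else ""
        return f"{match.group(1)}{match.group(2)}{relative}{fragment}{match.group(2)}"

    return A_HREF.sub(fix, html)
```

Relative paths are used so that the copy keeps working regardless of which directory or host it is eventually served from.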

But “Teleport Pro” is a really solid program that has been around for quite some time. I have used it in the past and will probably use it again.


Another approach might be the PHP module “php-apc”, which creates a cache. In this case you would need to crawl the whole site before the cache is complete. I’m not too familiar with it (as far as I know, APC mainly caches compiled opcode rather than rendered HTML), but it is easy to install, and you could see whether the generated files are of any use.

Answer:

It might not be what you are looking for, but HTTrack will crawl your website by following links and save an HTML version of it. The mirror will include all linked static content, such as images, CSS and JavaScript.

The only problem I can think of is if your Ajax script is pulling vital data from a server, but perhaps HTTrack has a setting for that.
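One way to find out how much of a mirror is affected is to scan the saved files for signs of Ajax calls. The Python sketch below is a simple heuristic, assuming the site uses `XMLHttpRequest`, jQuery-style helpers or `fetch`. The pattern list and function names are illustrative guesses and will miss other techniques.

```python
import re
from pathlib import Path

# Heuristic markers of Ajax usage; extend to match whatever the site really uses.
AJAX_HINTS = re.compile(r"XMLHttpRequest|\$\.(?:ajax|get|post|getJSON)\s*\(|\bfetch\s*\(")


def find_ajax_hints(text):
    """Return the distinct Ajax-looking snippets found in one file's text."""
    return sorted({match.group(0) for match in AJAX_HINTS.finditer(text)})


def audit_mirror(mirror_dir):
    """Map each HTML or JS file in the mirror to the Ajax hints found in it."""
    root = Path(mirror_dir)
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm", ".js"):
            text = path.read_text(encoding="utf-8", errors="replace")
            hints = find_ajax_hints(text)
            if hints:
                report[path.relative_to(root).as_posix()] = hints
    return report
```

Pages that show up in the report are the ones to check by hand, since the data they load at runtime will not be part of the static copy.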