
Download html page and its content

Posted by: admin, November 30, 2017

Questions:

Does Python have any way of downloading an entire HTML page and its contents (images, CSS) to a local folder, given a URL, and of updating the local HTML file so that it loads that content locally?

Answers:

You can use the urllib module to download individual URLs, but it only returns the raw data. It will not parse the HTML or automatically download referenced resources such as CSS files and images.

If you want to download the “whole” page, you will need to parse the HTML, find the other resources it references (images, stylesheets and so on) and download those as well. You could use something like Beautiful Soup to parse the HTML you retrieve.
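As a rough illustration of the parsing step, here is a sketch that uses the standard library's html.parser in place of Beautiful Soup. The swap only avoids a third-party dependency. It collects the URLs of images, scripts and stylesheets and resolves relative paths against the page URL. The names `AssetCollector` and `find_assets`, and the choice of which tags count as assets, are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []  # absolute URLs in document order, without duplicates

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("img", "script"):
            ref = attrs.get("src")
        elif tag == "link" and "stylesheet" in rel:
            ref = attrs.get("href")
        else:
            ref = None
        if ref:
            # Resolve relative references such as "css/site.css" or "/js/app.js".
            url = urljoin(self.base_url, ref)
            if url not in self.assets:
                self.assets.append(url)


def find_assets(html, base_url):
    """Return the asset URLs referenced by an HTML string."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.assets
```

For a page at `http://www.server.com/dir/page.html` containing `<img src="logo.png">`, this yields `http://www.server.com/dir/logo.png`. Resources referenced from inside CSS files (`url(...)`) are not found this way. A stylesheet would have to be parsed separately.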

This question has some sample code doing exactly that.
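The code from the linked question is not reproduced here. What follows is a minimal, standard-library-only sketch of the whole round trip, not that answer's method. It downloads the page, downloads each image, script and stylesheet, and then points the local HTML at the local copies. The function name `save_page`, the flat output folder, the numbered file names and the naive string replacement used for rewriting are all illustrative assumptions.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _RefCollector(HTMLParser):
    """Record the raw src/href values of images, scripts and stylesheets."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("img", "script") and attrs.get("src"):
            self.refs.append(attrs["src"])
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.refs.append(attrs["href"])


def save_page(url, out_dir):
    """Download url and its assets into out_dir; return the path of the local HTML file."""
    os.makedirs(out_dir, exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")

    collector = _RefCollector()
    collector.feed(html)
    collector.close()

    # dict.fromkeys removes duplicate references while keeping document order.
    for i, ref in enumerate(dict.fromkeys(collector.refs)):
        asset_url = urljoin(url, ref)
        name = os.path.basename(urlparse(asset_url).path) or "asset"
        local_name = f"{i}_{name}"  # the index prefix avoids clashes between same-named files
        try:
            with urllib.request.urlopen(asset_url) as resp, \
                    open(os.path.join(out_dir, local_name), "wb") as fh:
                fh.write(resp.read())
        except OSError as exc:  # URLError is a subclass of OSError
            print(f"skipping {asset_url}: {exc}")
            continue
        # Naive rewrite: replace the quoted attribute value with the local file name.
        for quote in ('"', "'"):
            html = html.replace(f"{quote}{ref}{quote}", f"{quote}{local_name}{quote}")

    page_path = os.path.join(out_dir, "index.html")
    with open(page_path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return page_path
```

For example, `save_page("http://www.server.com/dir/page.html", "mirror")` would write `mirror/index.html` next to the fetched assets. This sketch does not handle `url(...)` references inside CSS, `srcset` attributes, or attribute values containing HTML entities. It also always saves the page as UTF-8, even if the page declares a different charset. A dedicated mirroring tool handles more of these cases.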

Answers:

What you’re looking for is a mirroring tool. If you want one in Python, PyPI lists spider.py, but I have no experience with it. Others might be better, but I don’t know them; I use wget, which can fetch the CSS and the images as well. The following probably does what you want (quoting from the manual):

Retrieve only one HTML page, but make sure that all the elements needed for the page to be displayed, such as inline images and external style sheets, are also downloaded. Also make sure the downloaded page references the downloaded links.

wget -p --convert-links http://www.server.com/dir/page.html
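To drive wget from a Python script, one option is to call it through subprocess. This is a suggestion, not something the wget manual prescribes. The sketch builds the same command as above and can optionally add wget's `-P` (`--directory-prefix`) option to choose the output folder. The helper names `wget_command` and `wget_page` are hypothetical, and wget must be installed and on PATH.

```python
import shutil
import subprocess


def wget_command(url, dest_dir=None):
    """Build the argument list: -p fetches page requisites, --convert-links rewrites links."""
    cmd = ["wget", "-p", "--convert-links"]
    if dest_dir:
        cmd += ["-P", dest_dir]  # save under dest_dir instead of the current directory
    cmd.append(url)
    return cmd


def wget_page(url, dest_dir=None):
    """Run wget; raises CalledProcessError if wget exits with a non-zero status."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget was not found on PATH")
    subprocess.run(wget_command(url, dest_dir), check=True)
```

Passing a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.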

Answers:

You can use urllib:

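import urllib.request

url = "http://stackoverflow.com/"
# urlopen() replaces the deprecated FancyURLopener; "with" closes the connection afterwards
with urllib.request.urlopen(url) as f:
    content = f.read()  # raw bytes of the HTML only; images and CSS are not downloaded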
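`content` here is just the raw bytes of the HTML. To keep a local copy, or to get text you can hand to a parser, a small follow-up sketch could write the bytes to disk unchanged and decode them with the charset the server declares. The fallback to UTF-8 when no charset is declared and the helper name `save_html` are assumptions.

```python
import urllib.request


def save_html(url, path):
    """Save the page's raw bytes to path and return the page decoded as text."""
    with urllib.request.urlopen(url) as f:
        data = f.read()
        # The charset comes from the Content-Type header, e.g. "text/html; charset=utf-8";
        # assume UTF-8 when the header does not declare one.
        charset = f.headers.get_content_charset() or "utf-8"
    with open(path, "wb") as out:
        out.write(data)  # bytes are written as received, so nothing is re-encoded
    return data.decode(charset, errors="replace")
```

Writing the bytes unchanged keeps any `<meta charset>` declaration in the page consistent with the file's actual encoding.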