Home » Html » Python check if website exists

Python check if website exists

Posted by: admin November 30, 2017 Leave a comment

Questions:

I wanted to check if a certain website exists, this is what I’m doing:

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent':user_agent }
link = "http://www.abc.com"
req = urllib2.Request(link, headers = headers)
page = urllib2.urlopen(req).read() - ERROR 402 generated here!

If the page doesn’t exist (error 402, or whatever other errors), what can I do in the page = ... line to make sure that the page I’m reading does exit?

Answers:

You can use HEAD request instead of GET. It will only download the header, but not the content. Then you can check the response status from the headers.

import httplib
c = httplib.HTTPConnection('www.example.com')
c.request("HEAD", '')
if c.getresponse().status == 200:
   print('web site exists')

or you can use urllib2

import urllib2
try:
    urllib2.urlopen('http://www.example.com/some_page')
except urllib2.HTTPError, e:
    print(e.code)
except urllib2.URLError, e:
    print(e.args)

or you can use requests

import requests
request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist') 

Questions:
Answers:

It’s better to check that status code is < 400, like it was done here. Here is what do status codes mean (taken from wikipedia):

  • 1xx – informational
  • 2xx – success
  • 3xx – redirection
  • 4xx – client error
  • 5xx – server error

If you want to check if page exists and don’t want to download the whole page, you should use Head Request:

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert int(resp[0]['status']) < 400

taken from this answer.

If you want to download the whole page, just make a normal request and check the status code. Example using requests:

import requests

response = requests.get('http://google.com')
assert response.status_code < 400

See also similar topics:

Hope that helps.

Questions:
Answers:
from urllib2 import Request, urlopen, HTTPError, URLError

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent':user_agent }
link = "http://www.abc.com/"
req = Request(link, headers = headers)
try:
        page_open = urlopen(req)
except HTTPError, e:
        print e.code
except URLError, e:
        print e.reason
else:
        print 'ok'

To answer the comment of unutbu:

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
Source

Questions:
Answers:

code:

a="http://www.example.com"
try:    
    print urllib.urlopen(a)
except:
    print a+"  site does not exist"

Questions:
Answers:
def isok(mypath):
    try:
        thepage = urllib.request.urlopen(mypath)
    except HTTPError as e:
        return 0
    except URLError as e:
        return 0
    else:
        return 1

Leave a Reply

Your email address will not be published. Required fields are marked *