Short answer (simplified). To get one file:
import urllib
# the local/ directory must already exist
urllib.urlretrieve("http://google.com/index.html", filename="local/index.html")
You can figure out how to loop that if necessary.
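Here is a minimal Python 3 sketch of that loop. It assumes each file should be named after the last component of its URL path. In Python 3, urlretrieve lives in urllib.request. download_all and dest_dir are hypothetical names, and two URLs that end in the same file name will overwrite each other.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def download_all(urls, dest_dir):
    """Fetch each URL in turn with urlretrieve; return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if missing
    saved = []
    for url in urls:
        # Assumption: name the local file after the last component of the URL path.
        name = os.path.basename(urlsplit(url).path) or "index.html"
        target = os.path.join(dest_dir, name)
        urllib.request.urlretrieve(url, filename=target)  # blocks until done
        saved.append(target)
    return saved
```

For example, download_all(["http://google.com/index.html"], "local") saves local/index.html and creates local/ if needed.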
If you use os.system() to spawn a process for wget, it will block until wget finishes the download (or quits with an error). So just call os.system('wget blah') in a loop until you’ve downloaded all of your files.
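A sketch of that loop in Python 3, assuming wget is on the PATH. wget_all and its run parameter are hypothetical; run defaults to os.system and exists only so the command runner can be swapped out. Each URL is quoted with shlex.quote so that characters such as & or ? in a URL are not interpreted by the shell.

```python
import os
import shlex


def wget_all(urls, run=os.system):
    """Run wget once per URL; os.system blocks until each wget exits.

    Returns the URLs whose wget command exited with a non-zero status.
    """
    failed = []
    for url in urls:
        # Quote the URL so shell metacharacters are passed through literally.
        status = run("wget " + shlex.quote(url))
        if status != 0:  # non-zero means wget (or the shell) reported an error
            failed.append(url)
    return failed
```

If you want to skip the shell entirely, subprocess.run(["wget", url]) takes the arguments as a list and needs no quoting.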
Alternatively, you can use httplib (http.client in Python 3). You’ll have to write a non-trivial amount of code, but you’ll get better performance, since you can reuse a single HTTP connection to download many files instead of opening a new connection for each file.
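A minimal Python 3 sketch of the connection-reuse idea, assuming every file is on the same host over plain HTTP. download_many is a hypothetical name. Each response body is read into memory in full, which http.client requires before the next request can go out on the same connection. If the server closes the connection anyway, HTTPConnection reopens it on the next request.

```python
import http.client
import os


def download_many(host, paths, dest_dir, port=None):
    """Download several paths from one host over a single HTTPConnection."""
    os.makedirs(dest_dir, exist_ok=True)
    conn = http.client.HTTPConnection(host, port, timeout=30)
    saved = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # Read the whole body before sending the next request on this connection.
            body = resp.read()
            if resp.status != 200:
                raise RuntimeError("GET %s failed: %d %s" % (path, resp.status, resp.reason))
            # Assumption: save under the last path component of the request path.
            target = os.path.join(dest_dir, os.path.basename(path) or "index.html")
            with open(target, "wb") as f:
                f.write(body)
            saved.append(target)
    finally:
        conn.close()
    return saved
```

For example, download_many("example.com", ["/a.txt", "/b.txt"], "downloads") fetches both files without opening a fresh connection for each one when the server supports keep-alive.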
No reason to use os.system. Avoid writing a shell script in Python and go with something like urllib.urlretrieve (urllib.request.urlretrieve in Python 3) or an equivalent.
Edit: to answer the second part of your question, you can set up a thread pool using the standard library Queue class. Since the work is mostly downloading, the GIL shouldn’t be a problem: a thread waiting on network I/O releases it, so the downloads can overlap. Generate a list of the URLs you wish to download and feed them to your work queue; each worker thread takes the next URL off the queue as soon as it is free.
I’m waiting for a database update to complete, so I put this together real quick.
#!/usr/bin/python
import logging
import sys
import threading
import urllib
from Queue import Queue


class Downloader(threading.Thread):
    def __init__(self, queue):
        super(Downloader, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            download_url, save_as = self.queue.get()
            # sentinel: (None, None) tells this worker to exit
            if not download_url:
                return
            try:
                urllib.urlretrieve(download_url, filename=save_as)
            except Exception as e:
                logging.warning("error downloading %s: %s", download_url, e)


if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in xrange(5):
        threads.append(Downloader(queue))
        threads[-1].start()

    for line in sys.stdin:
        url = line.strip()
        if not url:
            continue  # skip blank lines; an empty URL would act as a sentinel
        filename = url.split('/')[-1]
        print "Download %s as %s" % (url, filename)
        queue.put((url, filename))

    # if we get here, stdin has hit EOF (^D)
    print "Finishing current downloads"
    for i in xrange(5):
        queue.put((None, None))
    for t in threads:
        t.join()
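The script above targets Python 2 (the Queue module, xrange, the print statement). In Python 3, one option is to swap the hand-rolled Queue and sentinel workers for the standard library’s concurrent.futures.ThreadPoolExecutor, which starts the worker threads and waits for them on exit. This is a different technique from the explicit queue, sketched here under the same naming assumption as the script (file named after the last URL component). download_one and download_parallel are hypothetical names.

```python
import concurrent.futures
import logging
import os
import urllib.request
from urllib.parse import urlsplit


def download_one(url, dest_dir):
    # Assumption: name the file after the last component of the URL path.
    name = os.path.basename(urlsplit(url).path) or "index.html"
    target = os.path.join(dest_dir, name)
    urllib.request.urlretrieve(url, filename=target)
    return target


def download_parallel(urls, dest_dir, workers=5):
    """Download urls with a pool of threads; return {url: saved path or None}."""
    os.makedirs(dest_dir, exist_ok=True)
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, url, dest_dir): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as e:  # log and keep going, like the original worker
                logging.warning("error downloading %s: %s", url, e)
                results[url] = None
    return results
```

To mirror the script’s stdin input, collect the non-blank stripped lines of sys.stdin into a list and pass it to download_parallel(urls, "."). As with the original, two URLs ending in the same file name would write to the same local file.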
Install the wget package from PyPI: http://pypi.python.org/pypi/wget/0.3
pip install wget
Then run it, as documented:
python -m wget <url>
No reason to use Python. Avoid writing a shell script in Python and go with something like bash or an equivalent.