For a website like Reddit, with lots of up/down votes and lots of comments per topic, what should I go with?
Lighttpd/PHP or Lighttpd/CherryPy/Genshi/SQLAlchemy?
And for the database, which would scale better and be faster: MySQL (4.1 or 5?) or PostgreSQL?
I can’t speak to the MySQL/PostgreSQL question, as I have limited experience with Postgres, but my Master’s research project was about high-performance websites with CherryPy, and I don’t think you’ll be disappointed if you use CherryPy for your site. It can easily scale to thousands of simultaneous users on commodity hardware.
Of course, the same could be said for PHP, and I don’t know of any reasonable benchmarks comparing PHP and CherryPy performance. But if you were wondering whether CherryPy can handle a high-traffic site with a huge number of requests per second, the answer is definitely yes.
The ideal setup would be close to this:
In short, nginx is a fast, lightweight web server/front-end proxy with a dedicated module that lets it fetch responses directly from memcached’s RAM store, without touching the disk or any dynamic webapp. Of course, if the requested URL isn’t cached yet (or its entry has expired), the request proceeds to the webapp as usual. The clever part is that when the webapp generates the response, it also stores a copy in memcached, ready to be reused.
All of this applies not only to web pages but also to AJAX queries and responses.
In the article, the back-end servers speak HTTP; it specifically talks about Mongrel. It would be even better if the back end used FastCGI and some other (faster?) framework, but that’s much less critical, since the nginx/memcached pair absorbs most of the load.
Note that if your URL scheme for the AJAX traffic is well designed (REST is best, IMHO), you can put most of the DB right into memcached, and any POST (which WILL pass through to the app) can preemptively update the cache.
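A minimal sketch of that idea, assuming RESTful URLs where a POST to `/comments/<id>/vote` changes what a GET of `/comments/<id>` returns. Plain dicts stand in for both the database and memcached; `db`, `cache`, `get_comment` and `post_vote` are hypothetical names.

```python
import json

# Hypothetical stand-ins: one dict for the database, one for memcached.
db = {"42": {"id": "42", "text": "First!", "score": 0}}
cache = {}


def get_comment(comment_id):
    """GET /comments/<id>: in production nginx would answer hits straight from memcached."""
    key = f"/comments/{comment_id}"
    if key not in cache:
        cache[key] = json.dumps(db[comment_id])
    return cache[key]


def post_vote(comment_id, delta):
    """POST /comments/<id>/vote: always reaches the app, so it refreshes the cache."""
    if delta not in (-1, 1):
        raise ValueError("a vote must be +1 or -1")
    db[comment_id]["score"] += delta
    # Preemptively overwrite the cached GET response instead of just deleting it,
    # so the next reader gets a cache hit that already includes the new score.
    cache[f"/comments/{comment_id}"] = json.dumps(db[comment_id])
```

Overwriting rather than deleting the entry means the next GET never has to fall through to the app; the trade-off is that the POST handler must know which cached URLs its write affects, which is exactly what a clean REST scheme makes easy.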
On the DB question, I’d say PostgreSQL scales better and has better data integrity than MySQL. For a small site MySQL might be faster, but from what I’ve heard it slows significantly as the size of the database grows. (Note: I’ve never used MySQL for a large database, so you should probably get a second opinion about its scalability.) But PostgreSQL definitely scales well, and would be a good choice for a high traffic site.
You’re going to need more data. Jeff had a few articles on the same problem, and the answer was to wait until you hit a performance issue.
To start with: who is hosting, and what do they have available? What are your in-house team’s skill sets? Are you going to hire an outside firm? If so, what do they recommend? Is this a brand-new project with a team willing to learn a new framework?
The second thing is to do some mockups: how is the interface going to work? What data does it need to load and persist? The idea is to keep the traffic between the web and DB sides down, e.g. no chatty pages that run lots of queries.
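To illustrate what a “chatty” page means, here is a sketch using Python’s built-in `sqlite3` module and a hypothetical topic/comment schema. The first function issues one query per topic (the classic N+1 pattern); the second fetches the same counts in a single query. Against a networked database server, every extra round trip adds latency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        topic_id INTEGER NOT NULL REFERENCES topic(id),
        body TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO topic (id, title) VALUES (?, ?)",
                 [(1, "Lighttpd or nginx?"), (2, "MySQL or PostgreSQL?")])
conn.executemany("INSERT INTO comment (topic_id, body) VALUES (?, ?)",
                 [(1, "nginx"), (1, "lighttpd"), (2, "PostgreSQL")])


def comment_counts_chatty(conn):
    """One query for the topics, then one more query per topic (N+1 round trips)."""
    counts = {}
    for (topic_id,) in conn.execute("SELECT id FROM topic").fetchall():
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM comment WHERE topic_id = ?", (topic_id,)
        ).fetchone()
        counts[topic_id] = n
    return counts


def comment_counts_batched(conn):
    """The same data in a single round trip, using a join and GROUP BY."""
    rows = conn.execute("""
        SELECT t.id, COUNT(c.id)
        FROM topic t LEFT JOIN comment c ON c.topic_id = t.id
        GROUP BY t.id
    """).fetchall()
    return dict(rows)
```

Both functions return `{1: 2, 2: 1}` here; the difference is that the chatty version’s query count grows with the number of topics on the page.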
Once you have a better idea of the data requirements and flow, work on the database design. There are plenty of rules to follow, but one of the better ones is to follow normalization rules (yeah, I’m a DB guy, why?).
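As one possible illustration of normalization for a voting site, here is a sketch of a hypothetical schema in SQLite (via Python’s `sqlite3`): each vote is its own row keyed by (user, comment), so a user can vote on a comment at most once, and the score is derived from those rows rather than stored in a second place. Table and function names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when enabled
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES app_user(id),
        body TEXT NOT NULL
    );
    -- One row per (user, comment): a user can vote on a comment at most once.
    CREATE TABLE vote (
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        comment_id INTEGER NOT NULL REFERENCES comment(id),
        value INTEGER NOT NULL CHECK (value IN (-1, 1)),
        PRIMARY KEY (user_id, comment_id)
    );
""")
conn.executemany("INSERT INTO app_user (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.execute("INSERT INTO comment (id, author_id, body) VALUES (1, 1, 'Use PostgreSQL')")


def cast_vote(conn, user_id, comment_id, value):
    # Re-voting replaces the user's previous vote instead of adding another row.
    conn.execute(
        "INSERT OR REPLACE INTO vote (user_id, comment_id, value) VALUES (?, ?, ?)",
        (user_id, comment_id, value),
    )


def score(conn, comment_id):
    # The score is computed from the vote rows, not stored redundantly.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM vote WHERE comment_id = ?",
        (comment_id,),
    ).fetchone()
    return total
```

At very high traffic you might later cache or denormalize the computed score, but in the spirit of this answer that would be a change made after measuring, not up front.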
Once you have a couple of pages built, run your tests. Are you having a problem? If so, find out what it is: page serving or DB queries? Measure, then pick a course of action.
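Measuring doesn’t have to start with a full profiler. This sketch wraps the database phase and the rendering phase of a hypothetical page handler in a small timing context manager built on Python’s `time.perf_counter`, with an in-memory SQLite table as placeholder data, so you can see which side dominates. The handler, table and labels are assumptions for the example; real numbers should come from your own pages under realistic load.

```python
import sqlite3
import time
from contextlib import contextmanager
from html import escape

timings = {}


@contextmanager
def timed(label):
    """Accumulate wall-clock time spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, topic_id INTEGER, body TEXT)")
conn.executemany("INSERT INTO comment (topic_id, body) VALUES (?, ?)",
                 [(i % 10, f"comment {i}") for i in range(1000)])


def serve_topic_page(topic_id):
    """Hypothetical page handler split into a DB phase and a rendering phase."""
    with timed("db"):
        rows = conn.execute(
            "SELECT id, body FROM comment WHERE topic_id = ?", (topic_id,)
        ).fetchall()
    with timed("render"):
        items = "".join(f"<li id='c{cid}'>{escape(body)}</li>" for cid, body in rows)
    return f"<ul>{items}</ul>"


for _ in range(100):
    serve_topic_page(3)

for label, seconds in sorted(timings.items()):
    print(f"{label}: {seconds * 1000:.1f} ms total")
```

If the `db` total dominates, look at queries and indexes first; if `render` does, caching the generated pages (as in the nginx/memcached setup discussed earlier in the thread) is the more likely win.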
I would go with nginx + PHP + XCache + PostgreSQL.