A few minutes ago, I asked whether it was better to perform many queries at once at log in and save the data in sessions, or to query as needed. I was surprised by the answer, (to query as needed). Are there other good rules of thumb to follow when building PHP/MySQL multi-user apps that speed up performance?
I’m looking for specific ways to create the most efficient application possible.
know your hashes (arrays/tables/ordered maps/whatever you call them). a hash lookup is very fast, and sometimes, if you have O(n^2) loops, you may reduce them to O(n) by organizing them into an array (keyed by primary key) first and then processing them.
foreach ($results as $result) if (in_array($result->id, $other_results) $found++;
is slow –
in_array loops through the whole
$other_result, resulting in O(n^2).
foreach ($other_results as $other_result) $hash[$other_result->id] = true; foreach ($results as $result) if (isset($hash[$result->id])) $found++;
the second one is a lot faster (depending on the result sets – the bigger, the faster), because isset() is (almost) constant time. actually, this is not a very good example – you could do this even faster using built in php functions, but you get the idea.
mysql.conf: i don’t have any idea how much performance you can gain by optimizing your mysql configuration instead of leaving the default. but i’ve read you can ignore every postgresql benchmark that used the default configuration. afaik with configuration matters less with mysql, but why ignore it? rule of thumb: try to fit the whole database into memory 🙂
explain [query]: an obvious one, a lot of people get wrong. learn about indices. there are rules you can follow, you can benchmark it and you can make a huge difference. if you really want it all, learn about the different types of indices (btrees, hashes, …) and when to use them.
caching is hard, but if done right it makes the difference (not a difference). in my opinion: if you can live without caching, don’t do it. it often adds a lot of complexity and points of failures. google did a bit of proxy caching once (to make the intertubes faster), and some people saw private information of others.
in php, there are 4 different kinds of caching people regulary use:
query caching: almost always translates to memcached (sometimes to APC shared memory). store the result set of a certain query to a fast key/value (=hashing) storage engine. queries (now lookups) become very cheap.
output caching: store your generated html for later use (instead of regenerating it every time). this can result in the biggest speed-ups, but somewhat works against PHPs dynamic nature.
browser caching: what about etags and http responses? if done right you may avoid most of the work right at the beginning! most php programmers ignore this option because they have no idea what HTTP is.
opcode caching: APC, zend optimizer and so on. makes php code load faster. can help with big applications. got nothing to do with (slow) external datasources though, and the potential is somewhat limited.
sometimes it’s not possible to live without caches, e.g. if it comes to thumbnails. image resizing is very expensive, but fortunatley easy to control (most of the time).
xdebug shows you the bottlenecks of your application. if your app is too slow, it’s helpful to know why.
queries in loops
there are (php-)experts who do not know what a join is (and for every one you educate, two new ones without that knowledge will surface – and they will write frameworks, see schnalles law). sometimes, those queries-in-loops are not that obvious, e.g. if they come with libraries. count the queries – if they grow with the results shown, there is something wrong.
inexperienced developers do have a primal, insatiable urge to write frameworks and content management systems
Optimize your MySQL queries first, then the PHP that handles it, and then lastly cache the results of large queries and searches. MySQL is, by far, the most frequent bottleneck in an application. A poorly designed query can take two to three times longer than a well designed query that only selects needed information.
Therefore, if your queries are optimized before you cache them, you have saved a good deal of processing time.
However, on some shared hosts caching is file-system only thanks to a lack of Memcached. In this instance it may be better to run smaller queries than it is to cache them, as the seek time of the hard drive (and waiting for access due to other sites) can easily take longer than the query when your site is under load.
- Speedy indexed queries.