I’m helping maintain a program that’s essentially a friendly read-only front-end for a big and complicated MySQL database — the program builds ad-hoc SELECT queries from users’ input, sends the queries to the DB, gets the results, post-processes them, and displays them nicely back to the user.
I’d like to add some form of reasonable/heuristic prediction for the constructed query’s expected performance — sometimes users inadvertently make queries that are inevitably going to take a very long time (because they’ll return huge result sets, or because they’re “going against the grain” of the way the DB is indexed) and I’d like to be able to display to the user some “somewhat reliable” information/guess about how long the query is going to take. It doesn’t have to be perfect, as long as it doesn’t get so badly and frequently out of whack with reality as to cause a “cry wolf” effect where users learn to disregard it;-) Based on this info, a user might decide to go get a coffee (if the estimate is 5-10 minutes), go for lunch (if it’s 30-60 minutes), kill the query and try something else instead (maybe tighter limits on the info they’re requesting), etc, etc.
I’m not very familiar with MySQL’s EXPLAIN statement — I see a lot of information around on how to use it to optimize a query or a DB’s schema, indexing, etc, but not much on how to use it for my more limited purpose — simply make a prediction, taking the DB as a given (of course if the predictions are reliable enough I may eventually switch to using them also to choose between alternate forms a query could take, but, that’s for the future: for now, I’d be plenty happy just to show the performance guesstimates to the users for the above-mentioned purposes).
EXPLAIN won’t give you any indication of how long a query will take.
At best you could use it to guess which of two queries might be faster, but unless one of them is obviously badly written then even that is going to be very hard.
You should also be aware that if you’re using sub-queries, even running EXPLAIN can be slow (almost as slow as the query itself in some cases).
As far as I’m aware, MySQL doesn’t provide any way to estimate the time a query will take to run. Could you log the time each query takes to run, then build an estimate based on the history of past similar queries?
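To make the history idea a bit more concrete, here is a minimal sketch of what such a log could look like. Everything in it is an assumption for illustration: the names `query_shape` and `QueryTimeHistory` are hypothetical, "similar" is taken to mean "same SQL text once literals are replaced by placeholders", and the estimate is simply the median of past timings for that shape.

```python
import re
import statistics
from collections import defaultdict


def query_shape(sql):
    """Reduce a query to a rough 'shape' by replacing literals with '?'.

    This is a crude normalisation (an assumption of this sketch), not a SQL parser.
    """
    shape = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)    # quoted string literals
    shape = re.sub(r"\b\d+(?:\.\d+)?\b", "?", shape)   # numeric literals
    return re.sub(r"\s+", " ", shape).strip().lower()  # collapse whitespace


class QueryTimeHistory:
    """Remembers how long past queries took, grouped by query shape."""

    def __init__(self):
        self._durations = defaultdict(list)

    def record(self, sql, seconds):
        """Store the measured run time of a finished query."""
        self._durations[query_shape(sql)].append(seconds)

    def estimate(self, sql):
        """Return the median past run time for this shape, or None if unseen."""
        past = self._durations.get(query_shape(sql))
        if not past:
            return None
        return statistics.median(past)
```

The median is used rather than the mean so that one pathological run does not dominate the guess. Queries with the same shape can still differ wildly in cost depending on the literal values, so this is only a starting point.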
I think if you want to have a chance of building something reasonably reliable out of this, what you should do is build a statistical model out of table sizes and broken-down EXPLAIN result components correlated with query processing times. Trying to build a query execution time predictor based on thinking about the contents of an EXPLAIN is just going to spend way too long giving embarrassingly poor results before it gets refined to vague usefulness.
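As a rough sketch of that kind of model, the snippet below fits a straight line between one feature derived from EXPLAIN (the log of the product of the per-table `rows` estimates) and the log of the measured run time. The choice of feature, the log-log form, and the function names are all assumptions made for illustration; a real model would likely need more features (table sizes, join types, the Extra flags) and a lot of logged samples. The EXPLAIN rows are assumed to arrive as dictionaries keyed by column name, as a dictionary-style cursor would return them.

```python
import math


def explain_feature(explain_rows):
    """Log10 of the product of the 'rows' estimates across all EXPLAIN rows."""
    return sum(math.log10(max(int(r["rows"] or 1), 1)) for r in explain_rows)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Needs at least two distinct x values, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_model(samples):
    """samples: list of (explain_rows, measured_seconds) pairs from a query log."""
    xs = [explain_feature(rows) for rows, _ in samples]
    ys = [math.log10(seconds) for _, seconds in samples]
    return fit_line(xs, ys)


def predict_seconds(model, explain_rows):
    """Predict a run time in seconds from a fitted (slope, intercept) model."""
    slope, intercept = model
    return 10 ** (slope * explain_feature(explain_rows) + intercept)
```

In use, you would log `(explain_rows, seconds)` for every query the front-end runs, refit periodically, and show `predict_seconds(...)` only as a coarse bucket (seconds, minutes, "go for lunch") rather than as a precise number.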
MySQL's EXPLAIN output has a column called key. If there is something in this column, that is a very good indication: it means that the query will use an index.
Queries that use indices are generally safe to use, since they were likely thought out by the database designer when (s)he designed the database.
There is another column called Extra. This field sometimes contains the text "Using filesort". This is very, very bad. It means MySQL cannot read the rows in index order and has to sort them in an extra pass, and when the data does not fit in the sort buffer it will spill to temporary files on disk in order to sort it.
Instead of trying to predict the time a query takes, simply look at these two indicators. If a query shows "Using filesort", deny the user. And depending on how strict you want to be, if the query is not using any keys, you should also deny it.
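A minimal sketch of that check, assuming you have already run EXPLAIN on the constructed query and have its rows as dictionaries keyed by column name (for example from a dictionary-style cursor). The function name `explain_warnings` is hypothetical; the column names table, key and Extra and the text "Using filesort" are what MySQL's EXPLAIN output uses.

```python
def explain_warnings(explain_rows):
    """Return a list of human-readable warnings for one EXPLAIN result.

    explain_rows: iterable of dicts, one per EXPLAIN output row.
    An empty list means neither of the two warning signs was found.
    """
    warnings = []
    for row in explain_rows:
        table = row.get("table") or "?"
        # A NULL key means MySQL chose no index for this table.
        if row.get("key") is None:
            warnings.append(f"{table}: no index used")
        # Extra is free text and may be NULL, so guard before searching it.
        if "Using filesort" in (row.get("Extra") or ""):
            warnings.append(f"{table}: Using filesort")
    return warnings
```

How strict to be is then a policy choice in the front-end: reject the query when the list is non-empty, or only show the warnings and let the user decide.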
Read more about the resultset of the MySQL EXPLAIN statement