Looking at a query’s EXPLAIN plan, how does one determine where optimisations can best be made?
I appreciate that one of the first things to check is whether good indexes are being used, but beyond that I’m a little stumped. Through trial and error in the past I have sometimes found that the order in which joins are conducted can be a good source of improvement, but how can one determine that from looking at the execution plan?
Whilst I would very much like to gain a good general understanding of how to optimise queries (suggested reading much appreciated!), I also realise that it’s often easier to discuss concrete cases than talk in the abstract. Since I am currently banging my head against the wall with this one, your thoughts would be much appreciated:
id  select_type  table  type    possible_keys   key      key_len  ref                   rows  Extra
1   SIMPLE       S      const   PRIMARY,l,p,f4  PRIMARY  2        const                 1     Using temporary
1   SIMPLE       Q      ref     PRIMARY,S       S        2        const                 204   Using index
1   SIMPLE       V      ref     PRIMARY,n,Q     Q        5        const,db.Q.QID        6     Using where; Using index; Distinct
1   SIMPLE       R1     ref     PRIMARY,L       L        154      const,db.V.VID        447   Using index; Distinct
1   SIMPLE       W      eq_ref  PRIMARY,w       PRIMARY  5        const,db.R.RID,const  1     Using where; Distinct
1   SIMPLE       R2     eq_ref  PRIMARY,L       PRIMARY  156      const,db.W.RID,const  1     Using where; Distinct
Am I correct in interpreting the final row of the execution plan as follows:
- as it is fully matched on its primary key, only one row of R2 need be fetched per output row;
- however, such output rows are then filtered based on some criterion that applies to R2?
If so, my problem lies in the filtering which occurs in that final step. If the condition results in no filtering (e.g. WHERE `Col_1_to_3` IN (1,2,3)), the query runs extremely quickly (~50ms); however, if the condition restricts the rows selected (WHERE `Col_1_to_3` IN (1,2)), the query takes considerably longer (~5s). If the restriction is to a single match (WHERE `Col_1_to_3` IN (1)), the optimiser suggests an altogether different execution plan (which performs marginally better than 5s, but still a lot worse than 50ms). It doesn’t seem as though there’s a better index that can be used on that table (given it’s already fully using the primary key to return one row per result?).
How should one interpret all this information? Am I right in guessing that, because such output filtering is taking place on the final table to be joined, considerable effort is wasted versus joining the table earlier and filtering such rows sooner? If so, how does one determine when in the execution plan R2 ought to be joined?
Whilst I resisted including the query & schema in full here (as I would really like to know what to look for, not merely be told the answer), I understand it’s necessary to advance the discussion:
SELECT DISTINCT `Q`.`QID`
FROM `S`
  NATURAL JOIN `Q`
  NATURAL JOIN `V`
  NATURAL JOIN `R` AS `R1`
  NATURAL JOIN `W`
  JOIN `R` AS `R2` ON (
        `R2`.`SID` = `S`.`SID`
    AND `R2`.`RID` = `R1`.`RID`
    AND `R2`.`VID` = `S`.`V_id`
    AND `R2`.`Col_1_to_3` IN (1,2) -- this is where performance suffers!
  )
WHERE `S`.`SID` = @x
  AND `W`.`WID` = @y
;
The definition of table R:
CREATE TABLE `R` (
  `SID` smallint(6) unsigned NOT NULL,
  `RID` smallint(6) unsigned NOT NULL,
  `VID` varchar(50) NOT NULL DEFAULT '',
  `Col_1_to_3` smallint(1) DEFAULT NULL,
  `T` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`SID`,`RID`,`VID`),
  KEY `L` (`SID`,`VID`,`Col_1_to_3`),
  CONSTRAINT `R_f1` FOREIGN KEY (`SID`) REFERENCES `S` (`SID`),
  CONSTRAINT `R_f2` FOREIGN KEY (`SID`, `VID`) REFERENCES `V` (`SID`, `VID`),
  CONSTRAINT `R_f3` FOREIGN KEY (`SID`, `VID`, `Col_1_to_3`) REFERENCES `L` (`SID`, `VID`, `LID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Depends on what you’re going for and what the query is.
Generally, every line in EXPLAIN that has a Using where should also be using an index (check the possible_keys and key columns). These lines are your filters, and they include both WHERE and ON conditions. Having it say Using index is even better. It means there’s a covering index, and MySQL can retrieve the data right from the index rather than visiting the row in the table data.
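As a rough illustration of the difference, here are two statements against the R table from the question. This is only a sketch: the literal values (1 and 'abc') are placeholders, and the Extra text mentioned in the comments is what one would typically expect to see, not output captured from a real server.

```sql
-- Key `L` is (SID, VID, Col_1_to_3). Every column this query reads is in
-- that index, so the plan would typically show "Using index" in Extra,
-- meaning the table rows themselves need not be read.
EXPLAIN
SELECT `Col_1_to_3`
FROM `R`
WHERE `SID` = 1 AND `VID` = 'abc';

-- `T` is not part of any index, so MySQL has to visit the table row for
-- each match; "Using index" would not be expected here.
EXPLAIN
SELECT `T`
FROM `R`
WHERE `SID` = 1 AND `VID` = 'abc';
```

Comparing the Extra column of the two plans shows whether a query is being answered from the index alone.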
Lines that have no Using where and return a large number of rows should be looked at: they are returning values for every row in the table. I don’t know what your query is, so I don’t know whether to be alarmed here. Try filtering the result set to reduce the size and improve performance.
You generally should try to avoid seeing Using filesort or Using temporary, though those are only bad if you’re not expecting them.
Filesort usually appears with the ORDER BY clause. You generally want MySQL to use a covering index (Using index) so that the rows are returned already in order from the server. If they’re not, then MySQL must order them afterward, using filesort.
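To make that concrete, here is a small sketch using the R table from the question. The literal value 1 is a placeholder, and the behaviour described in the comments is what would usually be expected; the optimiser is free to choose a different key, so check the actual plan on your own data.

```sql
-- With SID fixed, entries in key `L` (SID, VID, Col_1_to_3) are already
-- ordered by VID, so no extra sort pass should be needed if the optimiser
-- picks that key.
EXPLAIN
SELECT `SID`, `VID`
FROM `R`
WHERE `SID` = 1
ORDER BY `VID`;

-- `T` is not indexed, so the matched rows must be sorted after retrieval;
-- this is the case where "Using filesort" usually appears in Extra.
EXPLAIN
SELECT `SID`, `VID`, `T`
FROM `R`
WHERE `SID` = 1
ORDER BY `T`;
```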
Using temporary can be bad when it refers to derived tables because they don’t have indexes. It seems that you’ve explicitly created a temporary table with indexes, so here, it’s not bad. Sometimes, your only choice is to use a derived table, and hence