SELECT
s and Other Queries
EXPLAIN
Syntax (Get Information About a SELECT
)
SELECT
Queries
WHERE
Clauses
IS NULL
DISTINCT
LEFT JOIN
and RIGHT JOIN
ORDER BY
LIMIT
INSERT
Queries
UPDATE
Queries
DELETE
Queries
EXPLAIN
Syntax (Get Information About a SELECT
)EXPLAIN tbl_name or EXPLAIN SELECT select_options
EXPLAIN tbl_name
is a synonym for DESCRIBE tbl_name
or
SHOW COLUMNS FROM tbl_name
.
When you precede a SELECT
statement with the keyword EXPLAIN
,
MySQL explains how it would process the SELECT
, providing
information about how tables are joined and in which order.
With the help of EXPLAIN
, you can see when you must add indexes
to tables to get a faster SELECT
that uses indexes to find the
records.
You should frequently run ANALYZE TABLE
to update table statistics
such as cardinality of keys which can affect the choices the optimiser
makes. See section 4.5.2 ANALYZE TABLE
Syntax.
You can also see if the optimiser joins the tables in an optimal
order. To force the optimiser to use a specific join order for a
SELECT
statement, add a STRAIGHT_JOIN
clause.
For non-simple joins, EXPLAIN
returns a row of information for each
table used in the SELECT
statement. The tables are listed in the order
they would be read. MySQL resolves all joins using a single-sweep
multi-join method. This means that MySQL reads a row from the first
table, then finds a matching row in the second table, then in the third table
and so on. When all tables are processed, it outputs the selected columns and
backtracks through the table list until a table is found for which there are
more matching rows. The next row is read from this table and the process
continues with the next table.
In MySQL version 4.1 the EXPLAIN
output was changed to work better
with constructs like UNION
s, subqueries and derived tables. Most
notable is the addition of two new columns: id
and select_type
.
Output from EXPLAIN
consists of the following columns:
id
SELECT
identifier, the sequential number of this SELECT
within the query.
select_type
SELECT
clause, which can be any of the following:
SIMPLE
SELECT
(without UNION
s or subqueries).
PRIMARY
SELECT
.
UNION
UNION
SELECT
s.
DEPENDENT UNION
UNION
SELECTS
s, dependent on outer
subquery.
SUBQUERY
SELECT
in subquery.
DEPENDENT SUBQUERY
SELECT
, dependent on outer subquery.
DERIVED
SELECT
.
table
type
system
const
join type.
const
const
tables are very fast as they are read only once!
const
is used when you compare all parts of a
PRIMARY
/UNIQUE
key with constants:
SELECT * FROM const_table WHERE primary_key=1; SELECT * FROM const_table WHERE primary_key_part1=1 and primary_key_part2=2;
eq_ref
const
types. It is used when all parts of an index are used by
the join and the index is UNIQUE
or a PRIMARY KEY
.
eq_ref
can be used for indexed columns that is compared with
=
. The compared item may be a constant or an expression that uses
columns from tables that are read before this table.
In the following examples, ref_table
will be able to use eq_ref
SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column; SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;
ref
ref
is used if the join
uses only a leftmost prefix of the key, or if the key is not UNIQUE
or a PRIMARY KEY
(in other words, if the join cannot select a single
row based on the key value). If the key that is used matches only a few rows,
this join type is good.
ref
can be used for indexed columns that is compared with =
.
In the following examples, ref_table
will be able to use ref
SELECT * FROM ref_table WHERE key_column=expr; SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column; SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;
ref_or_null
ref
, but with the addition that we will do an extra search for
rows with NULL
. See section 5.2.5 How MySQL Optimises IS NULL
.
SELECT * FROM ref_table WHERE key_column=expr OR key_column IS NULL;This optimization is new for MySQL 4.1.1 and is mostly used when resolving sub queries.
range
key
column indicates which index is used.
The key_len
contains the longest key part that was used.
The ref
column will be NULL
for this type.
range
can be used for when an key column is compared to a
constant with =
, <>
, >
, >=
, <
,
<=
, IS NULL
, <=>
, BETWEEN
and IN
.
SELECT * FROM range_table WHERE key_column = 10; SELECT * FROM range_table WHERE key_column BETWEEN 10 and 20; SELECT * FROM range_table WHERE key_column IN (10,20,30); SELECT * FROM range_table WHERE key_part1= 10 and key_part2 IN (10,20,30);
index
ALL
, except that only the index tree is
scanned. This is usually faster than ALL
, as the index file is usually
smaller than the datafile.
This can be used when the query only uses columns that are part of one index.
ALL
const
, and usually very bad in all other
cases. You normally can avoid ALL
by adding more indexes, so that
the row can be retrieved based on constant values or column values from
earlier tables.
possible_keys
possible_keys
column indicates which indexes MySQL
could use to find the rows in this table. Note that this column is
totally independent of the order of the tables. That means that some of
the keys in possible_keys
may not be usable in practice with the
generated table order.
If this column is empty, there are no relevant indexes. In this case,
you may be able to improve the performance of your query by examining
the WHERE
clause to see if it refers to some column or columns
that would be suitable for indexing. If so, create an appropriate index
and check the query with EXPLAIN
again. See section 6.5.4 ALTER TABLE
Syntax.
To see what indexes a table has, use SHOW INDEX FROM tbl_name
.
key
key
column indicates the key (index) that MySQL actually
decided to use. The key is NULL
if no index was chosen. To force
MySQL to use an key listed in the possible_keys
column, use
USE KEY/IGNORE KEY
in your query.
See section 6.4.1 SELECT
Syntax.
Also, running myisamchk --analyze
(see section 4.4.6.1 myisamchk
Invocation Syntax) or ANALYZE TABLE
(see section 4.5.2 ANALYZE TABLE
Syntax) on the table will help the
optimiser choose better indexes.
key_len
key_len
column indicates the length of the key that
MySQL decided to use. The length is NULL
if the
key
is NULL
. Note that this tells us how many parts of a
multi-part key MySQL will actually use.
ref
ref
column shows which columns or constants are used with the
key
to select rows from the table.
rows
rows
column indicates the number of rows MySQL
believes it must examine to execute the query.
Extra
Distinct
Not exists
LEFT JOIN
optimisation on the
query and will not examine more rows in this table for the previous row
combination after it finds one row that matches the LEFT JOIN
criteria.
Here is an example for this:
SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;Assume that
t2.id
is defined with NOT NULL
. In this case
MySQL will scan t1
and look up the rows in t2
through t1.id
. If MySQL finds a matching row in
t2
, it knows that t2.id
can never be NULL
, and will
not scan through the rest of the rows in t2
that has the same
id
. In other words, for each row in t1
, MySQL
only needs to do a single lookup in t2
, independent of how many
matching rows there are in t2
.
range checked for each record (index map: #)
Using filesort
join type
and storing the sort key + pointer to
the row for all rows that match the WHERE
. Then the keys are
sorted. Finally the rows are retrieved in sorted order.
Using index
Using temporary
ORDER BY
on a different column set than you did a GROUP
BY
on.
Using where
WHERE
clause will be used to restrict which rows will be
matched against the next table or sent to the client. If you don't have
this information and the table is of type ALL
or index
,
you may have something wrong in your query (if you don't intend to
fetch/examine all rows from the table).
Using filesort
and Using temporary
.
You can get a good indication of how good a join is by multiplying all values
in the rows
column of the EXPLAIN
output. This should tell you
roughly how many rows MySQL must examine to execute the query. This
number is also used when you restrict queries with the max_join_size
variable.
See section 5.5.2 Tuning Server Parameters.
The following example shows how a JOIN
can be optimised progressively
using the information provided by EXPLAIN
.
Suppose you have the SELECT
statement shown here, that you examine
using EXPLAIN
:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, tt.ProjectReference, tt.EstimatedShipDate, tt.ActualShipDate, tt.ClientID, tt.ServiceCodes, tt.RepetitiveID, tt.CurrentProcess, tt.CurrentDPPerson, tt.RecordVolume, tt.DPPrinted, et.COUNTRY, et_1.COUNTRY, do.CUSTNAME FROM tt, et, et AS et_1, do WHERE tt.SubmitTime IS NULL AND tt.ActualPC = et.EMPLOYID AND tt.AssignedPC = et_1.EMPLOYID AND tt.ClientID = do.CUSTNMBR;
For this example, assume that:
Table | Column | Column type |
tt | ActualPC | CHAR(10)
|
tt | AssignedPC | CHAR(10)
|
tt | ClientID | CHAR(10)
|
et | EMPLOYID | CHAR(15)
|
do | CUSTNMBR | CHAR(15)
|
Table | Index |
tt | ActualPC
|
tt | AssignedPC
|
tt | ClientID
|
et | EMPLOYID (primary key)
|
do | CUSTNMBR (primary key)
|
tt.ActualPC
values aren't evenly distributed.
Initially, before any optimisations have been performed, the EXPLAIN
statement produces the following information:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 do ALL PRIMARY NULL NULL NULL 2135 et_1 ALL PRIMARY NULL NULL NULL 74 tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 range checked for each record (key map: 35)
Because type
is ALL
for each table, this output indicates that
MySQL is doing a full join for all tables! This will take quite a
long time, as the product of the number of rows in each table must be
examined! For the case at hand, this is 74 * 2135 * 74 * 3872 =
45,268,558,720
rows. If the tables were bigger, you can only imagine how
long it would take.
One problem here is that MySQL can't (yet) use indexes on columns
efficiently if they are declared differently. In this context,
VARCHAR
and CHAR
are the same unless they are declared as
different lengths. Because tt.ActualPC
is declared as CHAR(10)
and et.EMPLOYID
is declared as CHAR(15)
, there is a length
mismatch.
To fix this disparity between column lengths, use ALTER TABLE
to
lengthen ActualPC
from 10 characters to 15 characters:
mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);
Now tt.ActualPC
and et.EMPLOYID
are both VARCHAR(15)
.
Executing the EXPLAIN
statement again produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 Using where do ALL PRIMARY NULL NULL NULL 2135 range checked for each record (key map: 1) et_1 ALL PRIMARY NULL NULL NULL 74 range checked for each record (key map: 1) et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
This is not perfect, but is much better (the product of the rows
values is now less by a factor of 74). This version is executed in a couple
of seconds.
A second alteration can be made to eliminate the column length mismatches
for the tt.AssignedPC = et_1.EMPLOYID
and tt.ClientID =
do.CUSTNMBR
comparisons:
mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), -> MODIFY ClientID VARCHAR(15);
Now EXPLAIN
produces the output shown here:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using where ClientID, ActualPC et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
This is almost as good as it can get.
The remaining problem is that, by default, MySQL assumes that values
in the tt.ActualPC
column are evenly distributed, and that isn't the
case for the tt
table. Fortunately, it is easy to tell MySQL
about this:
shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt shell> mysqladmin refresh
Now the join is perfect, and EXPLAIN
produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC NULL NULL NULL 3872 Using where ClientID, ActualPC et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
Note that the rows
column in the output from EXPLAIN
is an
educated guess from the MySQL join optimiser. To optimise a
query, you should check if the numbers are even close to the truth. If not,
you may get better performance by using STRAIGHT_JOIN
in your
SELECT
statement and trying to list the tables in a different order in
the FROM
clause.
Posted by John Manko on Tuesday July 8 2003, @5:26pm | [Delete] [Edit] |
I like the output that the EXPLAIN command produces; however, there doesn't seem to be functionality to print that to a file. I request the following feature:
EXPLAIN tbl_name [INTO file_name]
where, if the INTO clause is used, the command will output to file_name instead of stdout;
Posted by Partap Davis on Monday July 14 2003, @11:32am | [Delete] [Edit] |
You can do that from the command line easily enough:
shell> mysql my_database -e " explain select ..." > output_file
Posted by Andres on Monday July 14 2003, @2:45pm | [Delete] [Edit] |
I think you should say that if the query returns no data, the explain won't work, and will say 'Impossible WHERE noticed after reading const tables'.