SELECTs and Other Queries
EXPLAIN Syntax (Get Information About a SELECT)
SELECT Queries
WHERE Clauses
IS NULL
DISTINCT
LEFT JOIN and RIGHT JOIN
ORDER BY
LIMIT
INSERT Queries
UPDATE Queries
DELETE Queries
EXPLAIN Syntax (Get Information About a SELECT)
EXPLAIN tbl_name
or EXPLAIN SELECT select_options
EXPLAIN tbl_name is a synonym for DESCRIBE tbl_name or
SHOW COLUMNS FROM tbl_name.
When you precede a SELECT statement with the keyword EXPLAIN,
MySQL explains how it would process the SELECT, providing
information about how tables are joined and in which order.
With the help of EXPLAIN, you can see when you must add indexes
to tables to get a faster SELECT that uses indexes to find the
records.
You should frequently run ANALYZE TABLE to update table statistics
such as cardinality of keys which can affect the choices the optimiser
makes. See section 4.5.2 ANALYZE TABLE Syntax.
You can also see if the optimiser joins the tables in an optimal
order. To force the optimiser to use a specific join order for a
SELECT statement, add a STRAIGHT_JOIN clause.
For non-simple joins, EXPLAIN returns a row of information for each
table used in the SELECT statement. The tables are listed in the order
they would be read. MySQL resolves all joins using a single-sweep
multi-join method. This means that MySQL reads a row from the first
table, then finds a matching row in the second table, then in the third table
and so on. When all tables are processed, it outputs the selected columns and
backtracks through the table list until a table is found for which there are
more matching rows. The next row is read from this table and the process
continues with the next table.
In MySQL version 4.1 the EXPLAIN output was changed to work better
with constructs like UNIONs, subqueries and derived tables. Most
notable is the addition of two new columns: id and select_type.
Output from EXPLAIN consists of the following columns:
id
SELECT identifier, the sequential number of this SELECT
within the query.
select_type
SELECT clause, which can be any of the following:
SIMPLE
SELECT (without UNIONs or subqueries).
PRIMARY
SELECT.
UNION
UNION SELECTs.
DEPENDENT UNION
UNION SELECTSs, dependent on outer
subquery.
SUBQUERY
SELECT in subquery.
DEPENDENT SUBQUERY
SELECT, dependent on outer subquery.
DERIVED
SELECT.
table
type
system
const join type.
const
const tables are very fast as they are read only once!
const is used when you compare all parts of a
PRIMARY/UNIQUE key with constants:
SELECT * FROM const_table WHERE primary_key=1; SELECT * FROM const_table WHERE primary_key_part1=1 and primary_key_part2=2;
eq_ref
const types. It is used when all parts of an index are used by
the join and the index is UNIQUE or a PRIMARY KEY.
eq_ref can be used for indexed columns that is compared with
=. The compared item may be a constant or an expression that uses
columns from tables that are read before this table.
In the following examples, ref_table will be able to use eq_ref
SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column; SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;
ref
ref is used if the join
uses only a leftmost prefix of the key, or if the key is not UNIQUE
or a PRIMARY KEY (in other words, if the join cannot select a single
row based on the key value). If the key that is used matches only a few rows,
this join type is good.
ref can be used for indexed columns that is compared with =.
In the following examples, ref_table will be able to use ref
SELECT * FROM ref_table WHERE key_column=expr; SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column; SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;
ref_or_null
ref, but with the addition that we will do an extra search for
rows with NULL. See section 5.2.5 How MySQL Optimises IS NULL.
SELECT * FROM ref_table WHERE key_column=expr OR key_column IS NULL;This optimization is new for MySQL 4.1.1 and is mostly used when resolving sub queries.
range
key column indicates which index is used.
The key_len contains the longest key part that was used.
The ref column will be NULL for this type.
range can be used for when an key column is compared to a
constant with =, <>, >, >=, <,
<=, IS NULL, <=>, BETWEEN and IN.
SELECT * FROM range_table WHERE key_column = 10; SELECT * FROM range_table WHERE key_column BETWEEN 10 and 20; SELECT * FROM range_table WHERE key_column IN (10,20,30); SELECT * FROM range_table WHERE key_part1= 10 and key_part2 IN (10,20,30);
index
ALL, except that only the index tree is
scanned. This is usually faster than ALL, as the index file is usually
smaller than the datafile.
This can be used when the query only uses columns that are part of one index.
ALL
const, and usually very bad in all other
cases. You normally can avoid ALL by adding more indexes, so that
the row can be retrieved based on constant values or column values from
earlier tables.
possible_keys
possible_keys column indicates which indexes MySQL
could use to find the rows in this table. Note that this column is
totally independent of the order of the tables. That means that some of
the keys in possible_keys may not be usable in practice with the
generated table order.
If this column is empty, there are no relevant indexes. In this case,
you may be able to improve the performance of your query by examining
the WHERE clause to see if it refers to some column or columns
that would be suitable for indexing. If so, create an appropriate index
and check the query with EXPLAIN again. See section 6.5.4 ALTER TABLE Syntax.
To see what indexes a table has, use SHOW INDEX FROM tbl_name.
key
key column indicates the key (index) that MySQL actually
decided to use. The key is NULL if no index was chosen. To force
MySQL to use an key listed in the possible_keys column, use
USE KEY/IGNORE KEY in your query.
See section 6.4.1 SELECT Syntax.
Also, running myisamchk --analyze (see section 4.4.6.1 myisamchk Invocation Syntax) or ANALYZE TABLE
(see section 4.5.2 ANALYZE TABLE Syntax) on the table will help the
optimiser choose better indexes.
key_len
key_len column indicates the length of the key that
MySQL decided to use. The length is NULL if the
key is NULL. Note that this tells us how many parts of a
multi-part key MySQL will actually use.
ref
ref column shows which columns or constants are used with the
key to select rows from the table.
rows
rows column indicates the number of rows MySQL
believes it must examine to execute the query.
Extra
Distinct
Not exists
LEFT JOIN optimisation on the
query and will not examine more rows in this table for the previous row
combination after it finds one row that matches the LEFT JOIN criteria.
Here is an example for this:
SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;Assume that
t2.id is defined with NOT NULL. In this case
MySQL will scan t1 and look up the rows in t2
through t1.id. If MySQL finds a matching row in
t2, it knows that t2.id can never be NULL, and will
not scan through the rest of the rows in t2 that has the same
id. In other words, for each row in t1, MySQL
only needs to do a single lookup in t2, independent of how many
matching rows there are in t2.
range checked for each record (index map: #)
Using filesort
join type and storing the sort key + pointer to
the row for all rows that match the WHERE. Then the keys are
sorted. Finally the rows are retrieved in sorted order.
Using index
Using temporary
ORDER BY on a different column set than you did a GROUP
BY on.
Using where
WHERE clause will be used to restrict which rows will be
matched against the next table or sent to the client. If you don't have
this information and the table is of type ALL or index,
you may have something wrong in your query (if you don't intend to
fetch/examine all rows from the table).
Using filesort and Using temporary.
You can get a good indication of how good a join is by multiplying all values
in the rows column of the EXPLAIN output. This should tell you
roughly how many rows MySQL must examine to execute the query. This
number is also used when you restrict queries with the max_join_size
variable.
See section 5.5.2 Tuning Server Parameters.
The following example shows how a JOIN can be optimised progressively
using the information provided by EXPLAIN.
Suppose you have the SELECT statement shown here, that you examine
using EXPLAIN:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
tt.ProjectReference, tt.EstimatedShipDate,
tt.ActualShipDate, tt.ClientID,
tt.ServiceCodes, tt.RepetitiveID,
tt.CurrentProcess, tt.CurrentDPPerson,
tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
et_1.COUNTRY, do.CUSTNAME
FROM tt, et, et AS et_1, do
WHERE tt.SubmitTime IS NULL
AND tt.ActualPC = et.EMPLOYID
AND tt.AssignedPC = et_1.EMPLOYID
AND tt.ClientID = do.CUSTNMBR;
For this example, assume that:
| Table | Column | Column type |
tt | ActualPC | CHAR(10)
|
tt | AssignedPC | CHAR(10)
|
tt | ClientID | CHAR(10)
|
et | EMPLOYID | CHAR(15)
|
do | CUSTNMBR | CHAR(15)
|
| Table | Index |
tt | ActualPC
|
tt | AssignedPC
|
tt | ClientID
|
et | EMPLOYID (primary key)
|
do | CUSTNMBR (primary key)
|
tt.ActualPC values aren't evenly distributed.
Initially, before any optimisations have been performed, the EXPLAIN
statement produces the following information:
table type possible_keys key key_len ref rows Extra
et ALL PRIMARY NULL NULL NULL 74
do ALL PRIMARY NULL NULL NULL 2135
et_1 ALL PRIMARY NULL NULL NULL 74
tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872
range checked for each record (key map: 35)
Because type is ALL for each table, this output indicates that
MySQL is doing a full join for all tables! This will take quite a
long time, as the product of the number of rows in each table must be
examined! For the case at hand, this is 74 * 2135 * 74 * 3872 =
45,268,558,720 rows. If the tables were bigger, you can only imagine how
long it would take.
One problem here is that MySQL can't (yet) use indexes on columns
efficiently if they are declared differently. In this context,
VARCHAR and CHAR are the same unless they are declared as
different lengths. Because tt.ActualPC is declared as CHAR(10)
and et.EMPLOYID is declared as CHAR(15), there is a length
mismatch.
To fix this disparity between column lengths, use ALTER TABLE to
lengthen ActualPC from 10 characters to 15 characters:
mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);
Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15).
Executing the EXPLAIN statement again produces this result:
table type possible_keys key key_len ref rows Extra
tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 Using where
do ALL PRIMARY NULL NULL NULL 2135
range checked for each record (key map: 1)
et_1 ALL PRIMARY NULL NULL NULL 74
range checked for each record (key map: 1)
et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
This is not perfect, but is much better (the product of the rows
values is now less by a factor of 74). This version is executed in a couple
of seconds.
A second alteration can be made to eliminate the column length mismatches
for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID =
do.CUSTNMBR comparisons:
mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
-> MODIFY ClientID VARCHAR(15);
Now EXPLAIN produces the output shown here:
table type possible_keys key key_len ref rows Extra
et ALL PRIMARY NULL NULL NULL 74
tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using where
ClientID,
ActualPC
et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1
do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
This is almost as good as it can get.
The remaining problem is that, by default, MySQL assumes that values
in the tt.ActualPC column are evenly distributed, and that isn't the
case for the tt table. Fortunately, it is easy to tell MySQL
about this:
shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt shell> mysqladmin refresh
Now the join is perfect, and EXPLAIN produces this result:
table type possible_keys key key_len ref rows Extra
tt ALL AssignedPC NULL NULL NULL 3872 Using where
ClientID,
ActualPC
et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1
do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
Note that the rows column in the output from EXPLAIN is an
educated guess from the MySQL join optimiser. To optimise a
query, you should check if the numbers are even close to the truth. If not,
you may get better performance by using STRAIGHT_JOIN in your
SELECT statement and trying to list the tables in a different order in
the FROM clause.
| Posted by John Manko on Tuesday July 8 2003, @5:26pm | [Delete] [Edit] |
I like the output that the EXPLAIN command produces; however, there doesn't seem to be functionality to print that to a file. I request the following feature:
EXPLAIN tbl_name [INTO file_name]
where, if the INTO clause is used, the command will output to file_name instead of stdout;
| Posted by Partap Davis on Monday July 14 2003, @11:32am | [Delete] [Edit] |
You can do that from the command line easily enough:
shell> mysql my_database -e " explain select ..." > output_file
| Posted by Andres on Monday July 14 2003, @2:45pm | [Delete] [Edit] |
I think you should say that if the query returns no data, the explain won't work, and will say 'Impossible WHERE noticed after reading const tables'.