5.2.13 Other Optimisation Tips
Unsorted tips for faster systems:
- Use persistent connections to the database to avoid the connection overhead. If you can't use persistent connections and you are making a lot of new connections to the database, you may want to change the value of the thread_cache_size variable. See section 5.5.2 Tuning Server Parameters.
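As a sketch, one way this could be set with MySQL 3.23-style configuration in my.cnf (the value 16 is only an illustration; choose it based on how many new connections you get per second):

  [mysqld]
  set-variable = thread_cache_size=16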
- Always check that all your queries really use the indexes you have created in the tables. In MySQL you can do this with the EXPLAIN command. See section 5.2.1 EXPLAIN Syntax (Get Information About a SELECT).
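For example, with a hypothetical customer table that has an index on the name column, you can check whether the index is really used:

  EXPLAIN SELECT * FROM customer WHERE name='Smith';

If the key column in the output is NULL and the type column says ALL, the query is scanning the whole table instead of using the index.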
- Try to avoid complex SELECT queries on MyISAM tables that are updated a lot. This is to avoid problems with table locking.
- The new MyISAM tables can insert rows into a table without deleted rows at the same time other threads are reading from it. If this is important for you, you should consider methods where you don't have to delete rows, or run OPTIMIZE TABLE after you have deleted a lot of rows.
- Use ALTER TABLE ... ORDER BY expr1,expr2... if you mostly retrieve rows in expr1,expr2... order. By using this option after big changes to the table, you may be able to get higher performance.
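For example, if you mostly retrieve customers sorted by name (hypothetical table and column names):

  ALTER TABLE customer ORDER BY name;

Note that the rows are only sorted once; the order is not maintained for rows that are inserted or updated later.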
- In some cases it may make sense to introduce a column that is 'hashed' based on information from other columns. If this column is short and reasonably unique it may be much faster than a big index on many columns. In MySQL it's very easy to use this extra column:

  SELECT * FROM table_name
    WHERE hash=MD5(CONCAT('constant1','constant2'))
    AND col1='constant1' AND col2='constant2';
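The hash column must of course be kept up to date whenever the row changes. A sketch with hypothetical table and column names:

  CREATE TABLE table_name (
    col1 CHAR(200) NOT NULL,
    col2 CHAR(200) NOT NULL,
    hash CHAR(32) NOT NULL,
    INDEX (hash)
  );

  INSERT INTO table_name
    SET col1='constant1', col2='constant2',
        hash=MD5(CONCAT('constant1','constant2'));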
- For tables that change a lot you should try to avoid all VARCHAR or BLOB columns. You will get dynamic row length as soon as you are using a single VARCHAR or BLOB column. See section 7 MySQL Table Types.
- It's not normally useful to split a table into different tables just because the rows get 'big'. To access a row, the biggest performance hit is the disk seek to find the first byte of the row. After finding the data, most new disks can read the whole row fast enough for most applications. The only cases where it really matters to split up a table are if it's a dynamic row size table (see above) that you can change to a fixed row size, or if you very often need to scan the table and don't need most of the columns. See section 7 MySQL Table Types.
- If you very often need to calculate things based on information from a lot of rows (like counts of things), it's probably much better to introduce a new table and update the counter in real time. An update of type UPDATE table SET count=count+1 WHERE index_column=constant is very fast! This is really important when you use MySQL table types like MyISAM and ISAM that only have table locking (multiple readers / single writers). This will also give better performance with most databases, as the row locking manager in this case will have less to do.
- If you need to collect statistics from big log tables, use summary tables instead of scanning the whole table. Maintaining the summaries should be much faster than trying to do statistics 'live'. It's much faster to regenerate new summary tables from the logs when things change (depending on business decisions) than to have to change the running application!
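A sketch of the idea, assuming a hypothetical log table and a hits_per_day summary table:

  DELETE FROM hits_per_day;
  INSERT INTO hits_per_day (log_date, hits)
    SELECT log_date, COUNT(*) FROM log_table GROUP BY log_date;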
- If possible, one should classify reports as 'live' or 'statistical', where the data needed for statistical reports is generated only from summary tables that are in turn generated from the actual data.
- Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL needs to do and improves the insert speed.
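For example, with a hypothetical table where the status column has a default value:

  CREATE TABLE visit_log (
    page   CHAR(100) NOT NULL,
    status INT NOT NULL DEFAULT 200
  );

  # Only the non-default value needs to be given:
  INSERT INTO visit_log (page) VALUES ('/index.html');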
- In some cases it's convenient to pack and store data into a blob. In this case you have to add some extra code in your application to pack/unpack things in the blob, but this may save a lot of accesses at some stage. This is practical when you have data that doesn't conform to a static table structure.
- Normally you should try to keep all data non-redundant (what is called 3rd normal form in database theory), but you should not be afraid of duplicating things or creating summary tables if you need these to gain more speed.
- Stored procedures or UDFs (user-defined functions) may be a good way to get more performance. In this case you should, however, always have another (slower) way to do the same thing if you use some database that doesn't support these features.
- You can always gain something by caching queries/answers in your application and trying to do many inserts/updates at the same time. If your database supports lock tables (like MySQL and Oracle), this should help to ensure that the index cache is only flushed once after all updates.
- Use INSERT /*! DELAYED */ when you do not need to know when your data is written. This speeds things up because many records can be written with a single disk write.
- Use INSERT /*! LOW_PRIORITY */ when you want your selects to be more important.
- Use SELECT /*! HIGH_PRIORITY */ to get selects that jump the queue. That is, the select is done even if there is somebody waiting to do a write.
- Use the multi-line INSERT statement to store many rows with one SQL command (many SQL servers support this).
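For example (hypothetical table and values):

  INSERT INTO city (id, name) VALUES
    (1,'Oslo'), (2,'Bergen'), (3,'Tromso');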
- Use LOAD DATA INFILE to load bigger amounts of data. This is faster than normal inserts and will be even faster when myisamchk is integrated in mysqld.
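A sketch, assuming a tab-separated text file (the file name and table are only examples):

  LOAD DATA INFILE '/tmp/customer.txt' INTO TABLE customer
    FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';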
- Use AUTO_INCREMENT columns to make unique values.
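For example (hypothetical table):

  CREATE TABLE customer (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name CHAR(60) NOT NULL,
    PRIMARY KEY (id)
  );

  INSERT INTO customer (name) VALUES ('Anna');
  SELECT LAST_INSERT_ID();   # retrieves the id that was just generated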
- Use OPTIMIZE TABLE once in a while to avoid fragmentation when using a dynamic table format. See section 4.5.1 OPTIMIZE TABLE Syntax.
- Use HEAP tables to get more speed when possible. See section 7 MySQL Table Types.
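For example, an in-memory lookup table in MySQL 3.23 syntax (the table and column names are only illustrative):

  CREATE TABLE last_banner (
    user_id INT NOT NULL,
    banner  INT NOT NULL,
    PRIMARY KEY (user_id)
  ) TYPE=HEAP;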
- When using a normal web server setup, images should be stored as files. That is, store only a file reference in the database. The main reason for this is that a normal web server is much better at caching files than database contents, so it's much easier to get a fast system if you are using files.
- Use in-memory tables for non-critical data that are accessed often (like information about the last shown banner for users that don't have cookies).
- Columns with identical information in different tables should be declared identical and have identical names. Before Version 3.23 you got slow joins otherwise. Try to keep the names simple (use name instead of customer_name in the customer table). To make your names portable to other SQL servers you should keep them shorter than 18 characters.
- If you need really high speed, you should take a look at the low-level interfaces for data storage that the different SQL servers support! For example, by accessing the MySQL MyISAM tables directly, you could get a speed increase of 2-5 times compared to using the SQL interface. To be able to do this the data must be on the same server as the application, and usually it should only be accessed by one process (because external file locking is really slow). One could eliminate the above problems by introducing low-level MyISAM commands in the MySQL server (this could be one easy way to get more performance if needed). By carefully designing the database interface, it should be quite easy to support these types of optimisation.
- In many cases it's faster to access data from a database (using a live connection) than accessing a text file, just because the database is likely to be more compact than the text file (if you are using numerical data), and this will involve fewer disk accesses. You will also save code because you don't have to parse your text files to find line and column boundaries.
- You can also use replication to speed things up. See section 4.10 Replication in MySQL.
- Declaring a table with DELAY_KEY_WRITE=1 will make the updating of indexes faster, as these are not flushed to disk until the file is closed. The downside is that you should run myisamchk on these tables before you start mysqld, to ensure that they are okay if something killed mysqld in the middle. As the key information can always be generated from the data, you should not lose anything by using DELAY_KEY_WRITE.
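For example (hypothetical table):

  CREATE TABLE log_table (
    log_date DATE NOT NULL,
    message  CHAR(200) NOT NULL,
    INDEX (log_date)
  ) TYPE=MyISAM DELAY_KEY_WRITE=1;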