Metadata is data about the data. Anything that
describes the database, as opposed to being the contents of the
database, is metadata. Thus column names, database names, user
names, version names, and most of the string results from SHOW
are metadata.
All metadata must be in the same character set. (Otherwise, SHOW
wouldn't work properly because different rows in the same column
would be in different character sets.) On the other hand, metadata
must include all characters in all languages. (Otherwise, users
wouldn't be able to name columns and tables in their own
languages.) In order to allow for both of these objectives, MySQL
stores metadata in a Unicode character set, namely UTF8. This will
not cause any disruption if you never use accented characters. But
if you do, you should be aware that metadata is in UTF8.
This means that the USER(), CURRENT_USER, and VERSION()
functions will have the UTF8 character set by default.
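One way to check this on a running server is to ask for the character set of each result. This is a minimal sketch, assuming a server that provides the CHARSET() and COLLATION() functions; the exact character set name reported may differ between versions.

```sql
-- Report the character set of the metadata-returning functions.
-- Each is expected to report a UTF8 character set.
SELECT CHARSET(USER()), CHARSET(CURRENT_USER()), CHARSET(VERSION());

-- The collation of a metadata string can be inspected the same way.
SELECT COLLATION(USER());
```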
This does NOT mean that the headers of columns and the results
of DESCRIBE statements will be in the UTF8 character set by default.
(When you say SELECT column1 FROM t, the name column1 itself will
be returned from the server to the client in the client's character
set, as determined by the SET NAMES statement.)
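The client side of this can be sketched as follows. The example assumes that the client wants latin1, that the table t with column column1 from the text exists, and that the server exposes the character_set_results session variable.

```sql
-- Declare the character set the client uses for this connection.
SET NAMES latin1;

-- Inspect the character set the server will use for what it sends back
-- (assumes the character_set_results variable is available).
SHOW VARIABLES LIKE 'character_set_results';

-- The column header "column1" is returned in the client's character set.
SELECT column1 FROM t;
```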
If you want the server to pass metadata results back in a
non-UTF8 character set, then use SET CHARACTER SET to force the
server to convert (see section 8.3.6 Connection Character Sets and Collations),
or set the client to do the conversion. It is
always more efficient to set the client to do the conversion, but
this option will not be available for many clients until late in
the MySQL 4.x product cycle.
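A minimal sketch of the server-side option, assuming latin1 is the character set the client expects:

```sql
-- Ask the server to convert results to latin1 before sending them.
SET CHARACTER SET latin1;

-- Metadata strings such as the user name now reach the client
-- converted from UTF8 by the server.
SELECT USER();
```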
If you are just using, for example, the USER() function for
comparison or assignment within a single statement, don't worry.
MySQL will do some automatic conversion for you.
SELECT * FROM Table1 WHERE USER() = latin1_column;
This will work, because the contents of latin1_column are
automatically converted to UTF8 before the comparison.
INSERT INTO Table1 (latin1_column) SELECT USER();
This will work, because the contents of USER() are automatically
converted to latin1 before the assignment.
Automatic conversion is not fully implemented yet, but should work
correctly in a later version.
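Where automatic conversion cannot be relied on, the same two statements can be written with the conversion spelled out using CONVERT(... USING ...). This is only a sketch: the definition of Table1 below is an assumption made for illustration (the text does not give one), and whether the explicit form is needed depends on the server version.

```sql
-- Hypothetical definition of the table used in the examples above.
CREATE TABLE Table1 (latin1_column VARCHAR(80) CHARACTER SET latin1);

-- Comparison: convert the latin1 column to UTF8 explicitly,
-- so both operands are in the metadata character set.
SELECT * FROM Table1
WHERE USER() = CONVERT(latin1_column USING utf8);

-- Assignment: convert the UTF8 result of USER() to latin1 explicitly
-- before it is stored in the latin1 column.
INSERT INTO Table1 (latin1_column)
SELECT CONVERT(USER() USING latin1);
```

Converting the column to UTF8 for the comparison, rather than USER() to latin1, mirrors the direction of the automatic conversion described above and avoids losing characters that latin1 cannot represent.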
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a "subset" of Unicode. Since it is a well-known principle that "what applies to a superset can apply to a subset," we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.
VERSION 4.1.1 NOTE: The errmsg.txt files will all be in UTF8 after this point. Conversion to the client character set will be automatic, as for metadata. Also: We may change the default behaviour for passing back result set metadata in the near future.