4.6.3 Adding a New Character Set

To add another character set to MySQL, use the following procedure.

Decide if the set is simple or complex. If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple. If it needs either of those features, it is complex.

For example, latin1 and danish are simple character sets, while big5 and czech are complex character sets.
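
To illustrate why a simple character set needs no special collating routines, here is a minimal sketch of a comparison driven entirely by a 256-entry sort_order table. The function names compare_by_sort_order() and fill_case_insensitive_ascii() are hypothetical and are not part of the MySQL source; the table contents are only an example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a MySQL function): compare two byte strings
 * by looking up each byte's weight in a 256-entry sort_order table.
 * Returns a negative value, 0, or a positive value, like strcmp().
 */
int compare_by_sort_order(const unsigned char *a, size_t a_len,
                          const unsigned char *b, size_t b_len,
                          const unsigned char sort_order[256])
{
    size_t n = a_len < b_len ? a_len : b_len;
    size_t i;

    assert(a != NULL && b != NULL && sort_order != NULL);

    for (i = 0; i < n; i++) {
        int wa = sort_order[a[i]];
        int wb = sort_order[b[i]];
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    /* All shared positions are equal: the shorter string sorts first. */
    if (a_len == b_len)
        return 0;
    return a_len < b_len ? -1 : 1;
}

/*
 * Illustrative table only: map ASCII lowercase letters to uppercase so
 * that comparison is case-insensitive; every other byte maps to itself.
 */
void fill_case_insensitive_ascii(unsigned char sort_order[256])
{
    int c;

    for (c = 0; c < 256; c++)
        sort_order[c] = (unsigned char)((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
}
```

A character set whose ordering cannot be expressed as one weight per byte (for example, because of multi-byte characters) does not fit this scheme, which is what makes it complex.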

The following instructions assume that your character set is named MYSET.

For a simple character set do the following:

  1. Add MYSET to the end of the `sql/share/charsets/Index' file. Assign a unique number to it.
  2. Create the file `sql/share/charsets/MYSET.conf'. (You can use `sql/share/charsets/latin1.conf' as a base for this.) The syntax for the file is very simple:
    • Comments start with a '#' character and proceed to the end of the line.
    • Words are separated by arbitrary amounts of whitespace.
    • When defining the character set, every word must be a number in hexadecimal format.
    • The ctype array takes up the first 257 words. The to_lower[], to_upper[] and sort_order[] arrays take up 256 words each after that.
    See section 4.6.4 The Character Definition Arrays.
  3. Add the character set name to the CHARSETS_AVAILABLE and COMPILED_CHARSETS lists in configure.in.
  4. Reconfigure, recompile, and test.
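
The file syntax described in step 2 can be sketched as a small reader. This is not the parser MySQL itself uses; parse_charset_words() is a hypothetical helper, and the check that each word fits in one byte is an assumption. It may be handy for checking that a new MYSET.conf contains the expected 257 + 3 * 256 = 1025 words.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Word counts from the text: ctype takes 257 words, the other three arrays 256 each. */
enum {
    CTYPE_WORDS = 257,
    TABLE_WORDS = 256,
    TOTAL_WORDS = CTYPE_WORDS + 3 * TABLE_WORDS
};

/*
 * Hypothetical helper (not part of MySQL): read hexadecimal words from
 * `text` into `out`, skipping whitespace and '#' comments.
 * Returns the number of words read, or -1 if a word is not valid
 * hexadecimal, does not fit in one byte (an assumption of this sketch),
 * or more than `max` words are present.
 */
long parse_charset_words(const char *text, unsigned char *out, size_t max)
{
    size_t count = 0;
    const char *p = text;

    assert(text != NULL && out != NULL);

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                                /* words are separated by whitespace */
        } else if (*p == '#') {
            while (*p != '\0' && *p != '\n')    /* comment runs to end of line */
                p++;
        } else {
            char *end;
            unsigned long value;

            if (!isxdigit((unsigned char)*p))
                return -1;
            value = strtoul(p, &end, 16);
            /* The word must end at whitespace, a comment, or end of input. */
            if (value > 0xFF ||
                (*end != '\0' && *end != '#' && !isspace((unsigned char)*end)))
                return -1;
            if (count >= max)
                return -1;
            out[count++] = (unsigned char)value;
            p = end;
        }
    }
    return (long)count;
}
```

Reading a whole file into a buffer and comparing the return value against TOTAL_WORDS gives a quick sanity check before reconfiguring.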

For a complex character set do the following:

  1. Create the file `strings/ctype-MYSET.c' in the MySQL source distribution.
  2. Add MYSET to the end of the `sql/share/charsets/Index' file. Assign a unique number to it.
  3. Look at one of the existing `ctype-*.c' files to see what needs to be defined, for example `strings/ctype-big5.c'. Note that the arrays in your file must have names like ctype_MYSET, to_lower_MYSET, and so on. These correspond to the arrays for a simple character set. See section 4.6.4 The Character Definition Arrays.
  4. Near the top of the file, place a special comment like this:
    /*
     * This comment is parsed by configure to create ctype.c,
     * so don't change it unless you know what you are doing.
     *
     * .configure. number_MYSET=MYNUMBER
     * .configure. strxfrm_multiply_MYSET=N
     * .configure. mbmaxlen_MYSET=N
     */
    
    The configure program uses this comment to include the character set in the MySQL library automatically. The strxfrm_multiply and mbmaxlen lines are explained in the following sections. Include them only if you need the string collating functions or the multi-byte character set functions, respectively.
  5. You should then create some of the following functions:
    • my_strncoll_MYSET()
    • my_strcoll_MYSET()
    • my_strxfrm_MYSET()
    • my_like_range_MYSET()
    See section 4.6.5 String Collating Support.
  6. Add the character set name to the CHARSETS_AVAILABLE and COMPILED_CHARSETS lists in configure.in.
  7. Reconfigure, recompile, and test.
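
The `.configure.' lines in the special comment from step 4 are simple NAME=VALUE settings. The sketch below shows one way such a line could be picked apart. It only illustrates the format and is not how the configure script reads it; parse_configure_line() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not how configure itself works): extract NAME and
 * VALUE from a comment line of the form " * .configure. NAME=VALUE".
 * Returns 1 on success, 0 if the line holds no such setting or if one
 * of the output buffers is too small.
 */
int parse_configure_line(const char *line,
                         char *name, size_t name_size,
                         char *value, size_t value_size)
{
    static const char marker[] = ".configure. ";
    const char *p;
    const char *eq;
    size_t name_len;
    size_t value_len;

    assert(line != NULL && name != NULL && value != NULL);

    p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += sizeof(marker) - 1;            /* skip past the marker */

    eq = strchr(p, '=');
    if (eq == NULL || eq == p)
        return 0;
    name_len = (size_t)(eq - p);
    value_len = strcspn(eq + 1, " \t\r\n");   /* value ends at whitespace */
    if (value_len == 0 || name_len >= name_size || value_len >= value_size)
        return 0;

    memcpy(name, p, name_len);
    name[name_len] = '\0';
    memcpy(value, eq + 1, value_len);
    value[value_len] = '\0';
    return 1;
}
```

Given the line ` * .configure. number_MYSET=MYNUMBER', the helper yields the name number_MYSET and the value MYNUMBER; ordinary comment lines are rejected.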

The file `sql/share/charsets/README' includes some more instructions.

If you want to have the character set included in the MySQL distribution, mail a patch to internals@lists.mysql.com.

User Comments

Posted by [name withheld] on Wednesday April 9 2003, @11:35am

You can assign the value of language in my.cnf, for example:

[mysqld]
..
language=spanish
..
