Lesson 8 of the SQL Optimization Course: Best Practices for Managing MySQL Character Sets | SQLFlash

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.


1. Databases and Character Sets

1.1 Specifying Character Sets During Database Creation

When creating a database, explicitly specify the character set and collation. This ensures that the character_set_database and collation_database parameters are set to the specified values. For example, when creating the database ytt_new2 with the character set latin1 and collation latin1_bin, the corresponding system parameters are updated upon switching to this database:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
mysql> create database ytt_new2 default character set latin1 collate latin1_bin;
Query OK, 1 row affected (0.03 sec)

mysql> use ytt_new2
Database changed

mysql> select @@character_set_database,  @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| latin1                   | latin1_bin           |
+--------------------------+----------------------+
1 row in set (0.00 sec)

1.2 Modifying Character Sets During Database Alteration

Modifying the character set and collation of a database is similar to specifying them during creation. However, it’s important to note that existing database objects (such as tables, views, and stored procedures) will retain their original character set and collation settings.

For example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
DELIMITER $$

USE `ytt_new2`$$

DROP PROCEDURE IF EXISTS `sp_demo`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_demo`(
    IN f1 VARCHAR(10),
    IN f2 VARCHAR(10)
    )
BEGIN
    DECLARE v1 VARCHAR(20);
    SET v1 = CONCAT(f1,f2);
    SELECT v1 AS result;
    END$$

DELIMITER;

When you check the character set of this stored procedure, you’ll see that the collation is latin1_bin and the corresponding character set is latin1, consistent with the database ytt_new2.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
mysql> show create procedure sp_demo\G
*************************** 1. row ***************************
           Procedure: sp_demo
...
  Database Collation: latin1_bin
1 row in set (0.00 sec)

-- Then, change the database character set to UTF8
mysql> alter database ytt_new2 character set utf8 collate utf8_general_ci;
Query OK, 1 row affected, 2 warning (0.02 sec)

mysql> select @@character_set_database,  @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.00 sec)

-- Check the character set of the stored procedure sp_demo again, and it remains the same.
mysql> show create procedure sp_demo\G
*************************** 1. row ***************************
           Procedure: sp_demo
...
  Database Collation: latin1_bin
1 row in set (0.00 sec)

-- At this point, calling the stored procedure results in a character set mismatch error.

mysql> call sp_demo('A','I');
ERROR 1366 (HY000): Incorrect string value: '\xE6\x88\x91' for column 'f1' at row 1

-- To change the character set of the stored procedure, you need to drop and recreate it. After re-executing the stored procedure code, the call works correctly.

mysql> call sp_demo('A','I');
+--------+
| result |
+--------+
| AI     |
+--------+
1 row in set (0.00 sec)

Query OK, 0 rows affected, 2 warnings (0.01 sec)

1.3 Default Inheritance of Parameters

If the character_set_database and collation_database parameters are not explicitly specified, they will default to the server-level settings of character_set_server and collation_server, respectively.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
mysql> select @@character_set_server charset,  @@collation_server collation
    -> union all
    -> select @@character_set_database,  @@collation_database;
+---------+--------------------+
| charset | collation          |
+---------+--------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
| utf8mb4 | utf8mb4_0900_ai_ci |
+---------+--------------------+
2 rows in set (0.00 sec)

In this case, when creating or altering a database without specifying a specific character set and collation, these parameters will inherit the server-level settings of character_set_server and collation_server by default.

1
2
3
4
5
mysql> show create database ytt_new3\G
*************************** 1. row ***************************
       Database: ytt_new3
Create Database: CREATE DATABASE `ytt_new3` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */
1 row in set (0.00 sec)

2. Tables and Character Sets

2.1 Specifying Character Sets During Table Creation (Explicit Setting)

Similar to creating a database, explicitly specifying the character set and collation when creating a table has the highest priority and will be used as the definitive settings.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
-- create new database ytt_new4
mysql> create database ytt_new4;
Query OK, 1 row affected (0.02 sec)

mysql> use ytt_new4;
Database changed

-- create new table t1
mysql> create table t1(a1 int) charset latin1 collate latin1_bin;
Query OK, 0 rows affected (0.05 sec)

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a1` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin
1 row in set (0.00 sec)

2.2 Inheriting Settings (Implicit Conversion)

By default, tables inherit the character set and collation from the database they belong to, not the current database. It is important to note that this is based on the database where the table resides, not the database currently in use.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
mysql> use ytt_new4;
Database changed

mysql> create table t2(a1 int);
Query OK, 0 rows affected (0.05 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `a1` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> create database ytt_new5 character set gbk;
Query OK, 1 row affected (0.03 sec)

mysql> create table ytt_new5.t3 (id int);
Query OK, 0 rows affected (0.04 sec)

mysql> show create table ytt_new5.t3\G
*************************** 1. row ***************************
       Table: t3
Create Table: CREATE TABLE `t3` (
  `id` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=gbk
1 row in set (0.01 sec)

2.3 Views

A view is essentially a virtual table, and its character set is determined by the client’s character set settings at the time of creation. This means that the character set of a view is entirely dependent on the client connection settings when the view is defined.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
mysql> set names gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> create view v_t3 as select * from t3;
Query OK, 0 rows affected (0.01 sec)

mysql> set names gb18030;
Query OK, 0 rows affected (0.00 sec)

mysql> create view v_t3_1 as select * from t3;
Query OK, 0 rows affected (0.01 sec)

mysql> show create view v_t3\G
*************************** 1. row ***************************
                View: v_t3
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v_t3` AS select `t3`.`a1` AS `a1` from `t3`
character_set_client: gbk
collation_connection: gbk_chinese_ci
1 row in set (0.01 sec)

mysql> show create view v_t3_1\G
*************************** 1. row ***************************
                View: v_t3_1
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v_t3_1` AS select `t3`.`a1` AS `a1` from `t3`
character_set_client: gb18030
collation_connection: gb18030_chinese_ci
1 row in set (0.00 sec)

3.4 Triggers

Triggers are similar to views in that their character set is also determined by the client settings at the time of creation. This means that the character set of a trigger will match the client’s character set when it is defined.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create trigger tr_after_insert_t3 after insert on t3 for each row  insert into t4 values(new.a1);
Query OK, 0 rows affected (0.01 sec)

mysql> show create trigger tr_after_insert_t3\G
*************************** 1. row ***************************
               Trigger: tr_after_insert_t3
              sql_mode:
SQL Original Statement: CREATE DEFINER=`root`@`localhost` TRIGGER `tr_after_insert_t3` AFTER INSERT ON `t3` FOR EACH ROW insert into t4 values(new.a1)
  character_set_client: utf8
  collation_connection: utf8_general_ci
    Database Collation: utf8mb4_0900_ai_ci
               Created: 2020-03-09 11:32:23.94
1 row in set (0.00 sec)

3. Columns and Character Sets

3.1 Explicit Specification

Like databases and tables, columns can also explicitly specify a particular character set and collation. Although this is possible, it is highly discouraged. The main reason is that having different character sets for each column can complicate data insertion and retrieval, requiring additional SQL statements.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
mysql> create database ytt_new6 character set latin1;
Query OK, 1 row affected (0.02 sec)

mysql> use ytt_new6;
Database changed

mysql> create table t1( a1 char(10) charset latin1 collate latin1_bin,
                       a2 char(10) charset gbk collate gbk_bin,
                       a3 char(10) charset utf8mb4 collate utf8mb4_bin);
Query OK, 0 rows affected (0.05 sec)

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a1` char(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `a2` char(10) CHARACTER SET gbk COLLATE gbk_bin DEFAULT NULL,
  `a3` char(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

To insert a simple record next, you need to add an introducer for each column; otherwise, an error will occur.

The most troublesome part is querying, as the result of each query can only have one character set. Columns with compatible character sets can be retrieved together, while those with incompatible character sets must be retrieved separately. For example, I wanted to simply use SELECT * to retrieve all records, but found that one of the fields displayed incorrectly.

1
2
3
4
5
6
7
mysql> select * from t1;
+-----------+------------------+--------------+
| a1        | a2               | a3           |
+-----------+------------------+--------------+
| character |  (U+FFFD)      | collation     |
+-----------+------------------+--------------+
1 row in set (0.00 sec)

So, how should data be retrieved in such cases? It is necessary to retrieve individual fields or retrieve compatible columns together.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
mysql> set names gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> select a1,a2 from t1;
+-----------+--------------+
| a1        | a2           |
+-----------+--------------+
| character | collation    |
+-----------+--------------+
1 row in set (0.00 sec)

mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> select a3 from t1;
+--------------+
| a3           |
+--------------+
| collation    |
+--------------+
1 row in set (0.00 sec)

3.2 Implicit Conversion

This approach is the most recommended and the most common, where all columns inherit the table’s character set without explicit specification.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
mysql> create table t2(a1 varchar(10),a2 varchar(10)) charset gbk;
Query OK, 0 rows affected (0.05 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `a1` varchar(10) DEFAULT NULL,
  `a2` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=gbk
1 row in set (0.01 sec)

However, there is an important point to note: if you change the character set of the table at this time, the columns of the table will still retain their original character set.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql> alter table t2 charset utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `a1` varchar(10) CHARACTER SET gbk DEFAULT NULL,
  `a2` varchar(10) CHARACTER SET gbk DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

If you want to change the character set of the fields in the table, you should use the following statement to change the table’s character set.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql> alter table t2 convert to character set utf8;
Query OK, 0 rows affected, 1 warning (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `a1` varchar(10) DEFAULT NULL,
  `a2` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

In summary, this article has detailed the considerations for handling character sets in MySQL databases, tables, and columns, with examples provided. I hope this information is helpful.

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.