Lesson 7 of the SQL Optimization Course: SQL Statements Related to Character Sets in MySQL | SQLFlash

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

πŸ§‘β€πŸ’» Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.


This lesson focuses on the usage of SQL statements related to character sets.

1. Character Introducer

That is, for a character string, an explicit character encoding and collation rule are given, which are not affected by system parameters.

Syntax:

1
[_charset_name] 'string' [COLLATE collation_name]

Summary of Introducer Usage Rules:

Character SetCollation RuleCharacter - String Confirmation Rule
SpecifiedSpecifiedConfirm according to the specification
SpecifiedAbsentConfirm according to the default collation rule corresponding to the specified character set
AbsentSpecifiedConfirm according to the value of @@character_set_connection; an error will be reported if the collation rule does not match the character set
AbsentAbsentConfirm according to the value of @@character_set_connection

2. Character - Set Conversion Functions

2.1 convert Function

The convert function is similar to the introducer, but it can only specify the character set.

For example, use the convert function to convert the encoding of the character string “Go, Beijing!” to utf8mb4. However, the prerequisite is that the character sets before and after the conversion must be compatible.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
-- Correct conversion

mysql> select convert("δΈ­ζ–‡θ‹±ζ–‡!" using utf8mb4);
+-------------------------------------------+
| convert("δΈ­ζ–‡θ‹±ζ–‡!" using utf8mb4)    |
+-------------------------------------------+
| δΈ­ζ–‡θ‹±ζ–‡!                             |
+-------------------------------------------+
1 row in set (0.00 sec)

-- Incorrect conversion, character - set encoding is not compatible.

mysql> select convert("δΈ­ζ–‡θ‹±ζ–‡!" using latin1);
+------------------------------------------+
| convert("δΈ­ζ–‡θ‹±ζ–‡!" using latin1)    |
+------------------------------------------+
| ?????!                                  |
+------------------------------------------+
1 row in set (0.00 sec)

2.2 charset Function

It is used to detect the character set of a string. It can detect the character set of the current string in the current session.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
mysql> set @a="δΈ­ζ–‡θ‹±ζ–‡!";
Query OK, 0 rows affected (0.00 sec)

mysql> select charset(@a);
+-------------+
| charset(@a) |
+-------------+
| utf8        |
+-------------+
1 row in set (0.00 sec)

2.3 set names Statement

Syntax:

1
2
SET NAMES {'charset_name'
[COLLATE 'collation_name'] | DEFAULT}
This statement is the most commonly used, but it is also the most easily misused. For example, the statement:
1
set names latin1 collate latin1_bin;
After execution, a series of statements will be executed by default, which means that the relevant non - server parameters will be reset.
1
2
3
4
set session character_set_results = latin1;
set session character_set_client = latin1;
set session character_set_connection = latin1;
set session collation_connection = latin1_bin;

Now, let’s execute it again for confirmation. Similar to the introducer, if the collate statement is not specified, the default collation rule corresponding to the character set will be used.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

-- Here we can see that all relevant parameter values have been changed.

mysql> select * from performance_schema.session_variables where variable_name in ('character_set_connection', 'collation_connection', 'character_set_results', 'character_set_client');
+--------------------------+-------------------+
| VARIABLE_NAME            | VARIABLE_VALUE    |
+--------------------------+-------------------+
| character_set_client     | latin1            |
| character_set_connection | latin1            |
| character_set_results    | latin1            |
| collation_connection     | latin1_swedish_ci |
+--------------------------+-------------------+

If you want to change it back to the default value, simply execute:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
mysql> set names default;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from performance_schema.session_variables where variable_name in ('character_set_connection', 'collation_connection', 'character_set_results', 'character_set_client');
+--------------------------+--------------------+
| VARIABLE_NAME            | VARIABLE_VALUE     |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_results    | utf8mb4            |
| collation_connection     | utf8mb4_0900_ai_ci |
+--------------------------+--------------------+
4 rows in set (0.00 sec)

It should be noted that not all character sets are applicable to this statement. For example, if you set the fixed - length character set utf32, an error will be reported because the variable @@character_set_client does not support this character set.

1
2
mysql> set names utf32;
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf32'

2.4 set character set Statement

Syntax:

1
2
SET {CHARACTER SET | CHARSET}
{'charset_name' | DEFAULT}
Similar to the set names statement, it also sets the following three session parameters:

  • character_set_results
  • character_set_client
  • character_set_connection

It can also restore the default values and has the same restriction rules.

However, there are two differences:

  1. The value of the parameter character_set_connection will not be set to the specified character set. Instead, it will inherit the character set set by the parameter character_set_database.

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
mysql> set character set latin1;
Query OK, 0 rows affected (0.01 sec)

-- The query results show that the value of the parameter character_set_connection is the same as that of character_set_database.
mysql> select * from performance_schema.session_variables where variable_name in ('character_set_connection', 'collation_connection', 'character_set_database', 'character_set_results', 'character_set_client');
+--------------------------+--------------------+
| VARIABLE_NAME            | VARIABLE_VALUE     |
+--------------------------+--------------------+
| character_set_client     | latin1             |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_results    | latin1             |
| collation_connection     | utf8mb4_0900_ai_ci |
+--------------------------+--------------------+
5 rows in set (0.00 sec)
2. It is only used to set the character set and cannot define the specific collation rule name. That is, the collation rule name is the default collation rule name corresponding to the character set. This can be seen from the above example.

2.5 collate Clause

The collate statement is used to force - specify the collation rule and has the highest priority. That is, explicitly specifying collate will override the existing collation rule.

This involves the collation rules for single character strings and character - string concatenation.

Explicitly specify the sorting method.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- Example table `c1`
mysql> create table c1 (n char(1));
Query OK, 0 rows affected (0.06 sec)

-- Insert sample data, inserting English uppercase and lowercase letters in a random order
mysql> insert into c1 with recursive a(x,y) as
(
   select 65,97 union all select x+1,y+1 from a where x<90
)
select char(x using ascii) x from a
union all
select char(y using ascii) y from a order by rand();

Query OK, 52 rows affected (0.02 sec)
Records: 52  Duplicates: 0  Warnings: 0

Change the collation rule of order by.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Original sorting result
mysql> select n from c1 order by n  desc limit 6;
+------+
| n    |
+------+
| Z    |
| z    |
| y    |
| Y    |
| x    |
| X    |
+------+
6 rows in set (0.00 sec)

-- Sorting result after explicitly specifying `collate`

mysql> select n from c1 order by n  collate utf8mb4_0900_bin desc limit 6;
+------+
| n    |
+------+
| z    |
| y    |
| x    |
| w    |
| v    |
| u    |
+------+
6 rows in set (0.00 sec)

Used for specific column aliases.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql> select n collate utf8mb4_bin as n from c1 order by n  desc limit 6;
+------+
| n    |
+------+
| z    |
| y    |
| x    |
| w    |
| v    |
| u    |
+------+
6 rows in set (0.01 sec)

Used for aggregate functions.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
mysql> select max(n) n from c1;
+------+
| n    |
+------+
| Z    |
+------+
1 row in set (0.00 sec)

-- Forced collate result
mysql> select max(n collate utf8mb4_bin) n from c1;
+------+
| n    |
+------+
| z    |
+------+
1 row in set (0.00 sec)
It can also be used in where, group by, having, etc.

Used for the mandatory indicators of the sorting method

For example, when filtering a character string in the where condition, such as where a = 'a', the question then arises: should the collation rule of a be used, or the collation rule of the character 'a'? Here, it involves a mandatory indicator of the collation rule.

The values of the mandatory indicators of the collation rules in MySQL range from 0 to 6, a total of 7 levels. The smaller the numerical indicator, the higher the priority. The following is an explanation of the indicator values:

ValueExplanation
0Explicitly specify the collate clause, with the highest priority
1When concatenating two character strings with different collation rules, the indicator value is 1
2The indicator value for two different columns, stored - procedure parameters, or local variables is 2
3The indicator value for system constants is 3. For example, functions like user(), version() etc.
4For simple character - string literals, the indicator value is 4
5The indicator value for numbers, dates, etc. is 5
6The indicator values for NULL, expressions containing NULL, or results that are NULL are all 6

At the same time, to facilitate understanding, MySQL provides the coercibility function to detect this indicator.

Here are some examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- The `collate` clause indicator is 0
mysql> select coercibility('a' collate utf8mb4_bin) as 'coercibility';
+--------------+
| coercibility |
+--------------+
|            0 |
+--------------+
1 row in set (0.00 sec)

-- The indicator for a simple text is 4

mysql> select coercibility('a') as 'coercibility';
+--------------+
| coercibility |
+--------------+
|            4 |
+--------------+
1 row in set (0.00 sec)

-- The indicator for a date is 5

mysql> select coercibility(now()) as 'coercibility';
+--------------+
| coercibility |
+--------------+
|            5 |
+--------------+
1 row in set (0.00 sec)

-- The indicator for a system constant is 3

mysql> select coercibility(@@server_uuid) as 'coercibility';
+--------------+
| coercibility |
+--------------+
|            3 |
+--------------+
1 row in set (0.00 sec)

Summary

That’s almost all for the introduction of SQL related to character - set control. We mainly introduced, with examples, SQL statements for handling character sets in MySQL. For example, statements for setting client - related encodings: SET NAMES/SET CHARSET; statements for setting collation rules: COLLATE; and the introducer that sends signals to MySQL, etc. Hope it is helpful to everyone.

πŸ‘‹ See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.