For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.
This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.
π§βπ» Target audience:
DBA
Database developers
Students
We will use MySQL as the demonstration database.
This lesson focuses on the usage of SQL statements related to character sets.
1. Character Introducer
That is, for a character string, an explicit character encoding and collation rule are given, which are not affected by system parameters.
Syntax:
1
[_charset_name] 'string' [COLLATE collation_name]
Summary of Introducer Usage Rules:
Character Set
Collation Rule
Character - String Confirmation Rule
Specified
Specified
Confirm according to the specification
Specified
Absent
Confirm according to the default collation rule corresponding to the specified character set
Absent
Specified
Confirm according to the value of @@character_set_connection; an error will be reported if the collation rule does not match the character set
Absent
Absent
Confirm according to the value of @@character_set_connection
2. Character - Set Conversion Functions
2.1 convert Function
The convert function is similar to the introducer, but it can only specify the character set.
For example, use the convert function to convert the encoding of the character string “Go, Beijing!” to utf8mb4. However, the prerequisite is that the character sets before and after the conversion must be compatible.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Correct conversion
mysql>selectconvert("δΈζθ±ζ!"usingutf8mb4);+-------------------------------------------+
|convert("δΈζθ±ζ!"usingutf8mb4)|+-------------------------------------------+
|δΈζθ±ζ!|+-------------------------------------------+
1rowinset(0.00sec)-- Incorrect conversion, character - set encoding is not compatible.
mysql>selectconvert("δΈζθ±ζ!"usinglatin1);+------------------------------------------+
|convert("δΈζθ±ζ!"usinglatin1)|+------------------------------------------+
|?????!|+------------------------------------------+
1rowinset(0.00sec)
2.2 charset Function
It is used to detect the character set of a string. It can detect the character set of the current string in the current session.
Now, let’s execute it again for confirmation. Similar to the introducer, if the collate statement is not specified, the default collation rule corresponding to the character set will be used.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql>setnameslatin1;QueryOK,0rowsaffected(0.00sec)-- Here we can see that all relevant parameter values have been changed.
mysql>select*fromperformance_schema.session_variableswherevariable_namein('character_set_connection','collation_connection','character_set_results','character_set_client');+--------------------------+-------------------+
|VARIABLE_NAME|VARIABLE_VALUE|+--------------------------+-------------------+
|character_set_client|latin1||character_set_connection|latin1||character_set_results|latin1||collation_connection|latin1_swedish_ci|+--------------------------+-------------------+
If you want to change it back to the default value, simply execute:
It should be noted that not all character sets are applicable to this statement. For example, if you set the fixed - length character set utf32, an error will be reported because the variable @@character_set_client does not support this character set.
1
2
mysql>setnamesutf32;ERROR1231(42000):Variable'character_set_client'can't be set to the value of 'utf32'
2.4 set character set Statement
Syntax:
1
2
SET{CHARACTERSET|CHARSET}{'charset_name'|DEFAULT}
Similar to the set names statement, it also sets the following three session parameters:
character_set_results
character_set_client
character_set_connection
It can also restore the default values and has the same restriction rules.
However, there are two differences:
The value of the parameter character_set_connection will not be set to the specified character set. Instead, it will inherit the character set set by the parameter character_set_database.
Example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql>setcharactersetlatin1;QueryOK,0rowsaffected(0.01sec)-- The query results show that the value of the parameter character_set_connection is the same as that of character_set_database.
mysql>select*fromperformance_schema.session_variableswherevariable_namein('character_set_connection','collation_connection','character_set_database','character_set_results','character_set_client');+--------------------------+--------------------+
|VARIABLE_NAME|VARIABLE_VALUE|+--------------------------+--------------------+
|character_set_client|latin1||character_set_connection|utf8mb4||character_set_database|utf8mb4||character_set_results|latin1||collation_connection|utf8mb4_0900_ai_ci|+--------------------------+--------------------+
5rowsinset(0.00sec)
2. It is only used to set the character set and cannot define the specific collation rule name. That is, the collation rule name is the default collation rule name corresponding to the character set. This can be seen from the above example.
2.5 collate Clause
The collate statement is used to force - specify the collation rule and has the highest priority. That is, explicitly specifying collate will override the existing collation rule.
This involves the collation rules for single character strings and character - string concatenation.
Explicitly specify the sorting method.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Example table `c1`
mysql>createtablec1(nchar(1));QueryOK,0rowsaffected(0.06sec)-- Insert sample data, inserting English uppercase and lowercase letters in a random order
mysql>insertintoc1withrecursivea(x,y)as(select65,97unionallselectx+1,y+1fromawherex<90)selectchar(xusingascii)xfromaunionallselectchar(yusingascii)yfromaorderbyrand();QueryOK,52rowsaffected(0.02sec)Records:52Duplicates:0Warnings:0
-- Original sorting result
mysql>selectnfromc1orderbyndesclimit6;+------+
|n|+------+
|Z||z||y||Y||x||X|+------+
6rowsinset(0.00sec)-- Sorting result after explicitly specifying `collate`
mysql>selectnfromc1orderbyncollateutf8mb4_0900_bindesclimit6;+------+
|n|+------+
|z||y||x||w||v||u|+------+
6rowsinset(0.00sec)
It can also be used in where, group by, having, etc.
Used for the mandatory indicators of the sorting method
For example, when filtering a character string in the where condition, such as where a = 'a', the question then arises: should the collation rule of a be used, or the collation rule of the character 'a'? Here, it involves a mandatory indicator of the collation rule.
The values of the mandatory indicators of the collation rules in MySQL range from 0 to 6, a total of 7 levels. The smaller the numerical indicator, the higher the priority. The following is an explanation of the indicator values:
Value
Explanation
0
Explicitly specify the collate clause, with the highest priority
1
When concatenating two character strings with different collation rules, the indicator value is 1
2
The indicator value for two different columns, stored - procedure parameters, or local variables is 2
3
The indicator value for system constants is 3. For example, functions like user(), version() etc.
4
For simple character - string literals, the indicator value is 4
5
The indicator value for numbers, dates, etc. is 5
6
The indicator values for NULL, expressions containing NULL, or results that are NULL are all 6
At the same time, to facilitate understanding, MySQL provides the coercibility function to detect this indicator.
-- The `collate` clause indicator is 0
mysql>selectcoercibility('a'collateutf8mb4_bin)as'coercibility';+--------------+
|coercibility|+--------------+
|0|+--------------+
1rowinset(0.00sec)-- The indicator for a simple text is 4
mysql>selectcoercibility('a')as'coercibility';+--------------+
|coercibility|+--------------+
|4|+--------------+
1rowinset(0.00sec)-- The indicator for a date is 5
mysql>selectcoercibility(now())as'coercibility';+--------------+
|coercibility|+--------------+
|5|+--------------+
1rowinset(0.00sec)-- The indicator for a system constant is 3
mysql>selectcoercibility(@@server_uuid)as'coercibility';+--------------+
|coercibility|+--------------+
|3|+--------------+
1rowinset(0.00sec)
Summary
That’s almost all for the introduction of SQL related to character - set control. We mainly introduced, with examples, SQL statements for handling character sets in MySQL. For example, statements for setting client - related encodings: SET NAMES/SET CHARSET; statements for setting collation rules: COLLATE; and the introducer that sends signals to MySQL, etc. Hope it is helpful to everyone.
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.