Introduction
For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.
This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.
🧑💻 Target audience:
- DBA
- Database developers
- Students
We will use MySQL as the demonstration database.
This article explores the concept of MySQL prefix indexes and how they can optimize query performance and storage efficiency.
1. Understanding Prefix Indexes
A prefix index is created by indexing a specified number of characters or bytes from the beginning of a field. MySQL supports prefix indexes for character and binary data types like CHAR/VARCHAR, TEXT/BLOB, and BINARY/VARBINARY.
- For character types, specify the prefix length in characters (e.g., f1(10) indexes the first 10 characters)
- For binary types, specify the prefix length in bytes (e.g., f1(10) indexes the first 10 bytes)
- TEXT/BLOB types only support prefix indexes, not full-field indexing
Consider table t1 with two indexes on field r1: a regular secondary index and a prefix index:
1
2
3
4
5
6
7
8
9
10
11
| <localhost|mysql>show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` varchar(300) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_r1` (`r1`),
KEY `idx_r1_p` (`r1`(6))
) ENGINE=InnoDB AUTO_INCREMENT=32755 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
|
2. Storage Efficiency of Prefix Indexes
The difference in storage requirements between full-field indexes and prefix indexes is significant:
1
2
3
4
5
6
7
| // idx_r1 (full-field index)
root@debian-ytt1:/var/lib/mysql/3306/ytt# du -sh
26M .
// idx_r1_p (prefix index)
root@debian-ytt1:/var/lib/myzsql/3306/ytt# du -sh
20M .
|
When querying records starting with ‘sample’, MySQL chooses the smaller prefix index idx_r1_p:
1
2
3
4
5
6
7
| <localhost|mysql>select count(*) from t1 where r1 like 'sample%';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)
|
The execution plan confirms the prefix index is used:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| <localhost|mysql>explain select count(*) from t1 where r1 like 'sample%'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: range
possible_keys: idx_r1,idx_r1_p
key: idx_r1_p
key_len: 27
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)
|
4. Determining Optimal Prefix Length
To find the best prefix length, calculate the selectivity of each prefix compared to the entire field. This involves:
- Calculating the total selectivity of the field
- Determining the selectivity for different prefix lengths
- Choosing the prefix length that most closely matches the total selectivity
First, calculate the total selectivity of field r1:
1
2
3
4
5
6
7
| <localhost|mysql>SELECT TRUNCATE(COUNT(DISTINCT r1) / COUNT(r1),4) 'total_pct' FROM t1;
+------------+
| total_pct |
+------------+
| 0.0971 |
+------------+
1 row in set (0.13 sec)
|
Then, use a custom function to find the optimal prefix length:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
| DELIMITER $$
USE `ytt`$$
DROP FUNCTION IF EXISTS `func_calc_prefix_length`$$
CREATE DEFINER=`ytt`@`%` FUNCTION `func_calc_prefix_length`() RETURNS JSON
BEGIN
DECLARE v_total_pct DECIMAL(20,4);
DECLARE v_prefix_pct DECIMAL(20,4);
DECLARE v_result JSON DEFAULT '[]';
DECLARE i TINYINT DEFAULT 1;
SELECT TRUNCATE(COUNT(DISTINCT r1) / COUNT(r1),4) INTO v_total_pct FROM t1;
label1:LOOP
SELECT TRUNCATE(COUNT(DISTINCT LEFT(r1,i)) / COUNT(r1),4) INTO v_prefix_pct FROM t1;
SET v_result = JSON_ARRAY_APPEND(v_result,'$',JSON_OBJECT(i,v_prefix_pct));
IF v_prefix_pct >= v_total_pct THEN
LEAVE label1;
END IF;
SET i = i + 1;
END LOOP;
RETURN v_result;
END$$
DELIMITER ;
|
Calling this function reveals the optimal prefix length of 6:
1
2
3
4
| <localhost|mysql>SELECT func_calc_prefix_length() AS prefix_length\G
*************************** 1. row ***************************
prefix_length: [{"1": 0.0003}, {"2": 0.0005}, {"3": 0.0008}, {"4": 0.0013}, {"5": 0.0093}, {"6": 0.0971}]
1 row in set (0.32 sec)
|
5. Handling Suffix-Based Queries
For queries with wildcards at the beginning (e.g., LIKE '%sample'
), two optimization strategies exist:
5.1 Strategy 1: Suffix Field Indexing
Create a new table with a suffix field and index it:
1
2
3
4
5
6
7
8
9
| CREATE TABLE `t3` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` varchar(300) DEFAULT NULL,
`suffix_r1` varchar(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_suffix_r1` (`suffix_r1`)
) ENGINE=InnoDB
<localhost|mysql>insert into t3 select id,r1,right(r1,6) from t2;
|
5.2 Strategy 2: Reverse Data Indexing
Create a reversed copy of the data and query using the reversed keyword:
1
| <localhost|mysql>insert into t4 select id,reverse(r1) from t2;
|
Query with reversed keyword:
1
2
3
4
5
6
7
| <localhost|mysql>select count(*) from t4 where r1 like 'elpmas%';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)
|
The execution plan confirms the prefix index is used:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| <localhost|mysql>explain select count(*) from t4 where r1 like 'elpmas%'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t4
partitions: NULL
type: range
possible_keys: idx_r1_p
key: idx_r1_p
key_len: 27
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)
|
By mastering these prefix index techniques, you can significantly enhance your database performance and query efficiency.
👋 See you in the next lesson.
Recommended reading
What is SQLFlash?
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
How to use SQLFlash in a database?