Introduction
For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.
This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.
🧑💻 Target audience:
- DBA
- Database developers
- Students
We will use MySQL as the demonstration database.
Problem & Background
I’ve previously written implementation and detailed articles on time-based partitioning. Today I’ll share a real case study of time-based partitioning: optimizing a database table for an internet company.
Challenges:
- Single table grew too large: 100,000 records daily (36.5 million annually)
- 95% of queries filtered data within specific days or date ranges (≤1 month)
- Examples: 2019-03-01, 2019-04-20, or between 2019-05-01 and 2019-05-05
- Rare cross-month/year queries
- 10-year retention policy (3.6 billion records total)
- Poor query performance
- Slow historical data deletion
Analysis Conclusion:
- Query patterns concentrated on recent data
- Need efficient solution for expired data cleanup
Solution Approach:
- Split into 10 tables:
- 1 “current” table for recent data (≤2 years)
- 9 historical archive tables
- Year-end migration of old data to historical tables
- Implement MySQL partitioning for 95% of query scenarios
- Use partition exchange to move data to historical tables
Partitioning Benefits:
- Query performance improvement
- Efficient expired data management
- Application transparency (no code changes needed)
Optimization Process
Simplified Table Structure
Due to privacy concerns, using a simplified example table pt_old
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
| (localhost:ytt) SHOW CREATE TABLE pt_old\G
*************************** 1. row ***************************
Table: pt_old
Create Table: CREATE TABLE `pt_old` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` int DEFAULT NULL,
`log_date` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_log_date` (`log_date`)
) ENGINE=InnoDB AUTO_INCREMENT=64306811 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
(localhost:ytt) SELECT MIN(log_date), MAX(log_date), COUNT(*) FROM pt_old;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2011-01-01 | 2020-12-31 | 36500000 |
+---------------+---------------+----------+
1 row in set (21.14 sec)
|
(Note: Original table had 30 fields and 360 million records; simplified to 3 fields and 36.5 million records)
Data Export Script
Export yearly data for import to partitioned tables:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| root@ytt-unbuntu:/home/ytt/scripts# cat pt_export
#!/bin/sh
for i in `seq 2011 2020`
do
{
mysql -D ytt -e "SELECT * FROM pt_old WHERE log_date BETWEEN '$i-01-01' AND '$i-12-31' INTO OUTFILE '/var/lib/mysql-files/pt_$i.csv' FIELDS TERMINATED BY ','"
} &
done
wait
root@ytt-unbuntu:/var/lib/mysql-files# ls -sihl
total 788M
5767677 79M -rw-r----- 1 mysql mysql 79M 2月 4 15:39 pt_2011.csv
...
5775331 79M -rw-r----- 1 mysql mysql 79M 2月 4 15:42 pt_2020.csv
|
Table Creation & Partitioning
Create yearly tables with pt_2020
as partitioned table:
1
| root@ytt-unbuntu:/home/ytt/scripts# for i in `seq 2011 2020`; do mysql -e "USE ytt; CREATE TABLE pt_$i LIKE pt_old;"; done;
|
Key Modification:
Add log_date
to primary key (MySQL partitioning requirement):
1
2
3
| (localhost:ytt) ALTER TABLE pt_2020 DROP PRIMARY KEY, ADD PRIMARY KEY (id, log_date);
Query OK, 0 rows affected (0.29 sec)
Records: 0 Duplicates: 0 Warnings: 0
|
Partitioning Implementation:
Stored procedure for daily partitioning (covers current + previous year):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
| DELIMITER $$
USE `ytt`$$
DROP PROCEDURE IF EXISTS `sp_add_partition_pt_current`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_pt_current`(
IN f_year_start YEAR,
IN f_year_end YEAR,
IN f_tbname VARCHAR(64)
)
BEGIN
DECLARE v_days INT UNSIGNED DEFAULT 365;
DECLARE v_year DATE DEFAULT '2011-01-01';
DECLARE v_partition_name VARCHAR(64) DEFAULT '';
DECLARE v_log_date DATE;
DECLARE i,j INT UNSIGNED DEFAULT 1;
SET @stmt = '';
SET @stmt_begin = CONCAT('ALTER TABLE ',f_tbname,' PARTITION BY RANGE COLUMNS(log_date)(');
SET i = f_year_start;
WHILE i <= f_year_end DO
SET v_year = CONCAT(i,'-01-01');
SET v_days = DATEDIFF(DATE_ADD(v_year,INTERVAL 1 YEAR),v_year);
SET j = 1;
WHILE j <= v_days DO
SET v_log_date = DATE_ADD(v_year,INTERVAL j DAY);
SET v_partition_name = CONCAT('p',i,'_',LPAD(j,3,'0'));
SET @stmt = CONCAT(@stmt,'PARTITION ',v_partition_name,' VALUES LESS THAN(''',v_log_date,'''),');
SET j = j + 1;
END WHILE;
SET i = i + 1;
END WHILE;
SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue))';
SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
PREPARE s1 FROM @stmt;
EXECUTE s1;
DROP PREPARE s1;
SELECT NULL,NULL,NULL INTO @stmt,@stmt_begin,@stmt_end;
END$$
DELIMITER ;
(localhost:ytt) CALL sp_add_partition_pt_current(2020,2021,'pt_2020');
Query OK, 1 row affected (42.11 sec)
|
Data Import & Table Setup
Import data to respective tables:
1
2
3
4
5
6
7
8
9
| root@ytt-unbuntu:/home/ytt/scripts# cat pt_import
#!/bin/sh
for i in `seq 2011 2020`
do
{
mysql -D ytt -e "LOAD DATA INFILE '/var/lib/mysql-files/pt_$i.csv' INTO TABLE pt_$i FIELDS TERMINATED BY ','"
} &
done
wait
|
Set pt_2020
as current table:
1
2
| (localhost:ytt) ALTER TABLE pt_2020 RENAME TO pt_current;
Query OK, 0 rows affected (0.12 sec)
|
Test 1: Single-day query (2020-03-01)
1
2
3
4
5
6
7
8
9
| -- Original table (pt_old)
(localhost:ytt) SELECT * FROM pt_old WHERE log_date = '2020-03-01';
9593 rows in set (1 min 1.70 sec) -- First execution
9593 rows in set (0.03 sec) -- Second execution
-- Partitioned table (pt_current)
(localhost:ytt) SELECT * FROM pt_current WHERE log_date = '2020-03-01';
9593 rows in set (0.02 sec) -- First execution
9593 rows in set (0.01 sec) -- Second execution
|
Test 2: 5-day range query (2020-12-27 to 2020-12-31)
1
2
3
4
5
6
7
8
9
| -- Original table (pt_old)
(localhost:ytt) SELECT * FROM pt_old WHERE log_date IN ('2020-12-27','2020-12-28','2020-12-29','2020-12-30','2020-12-31');
30097 rows in set (2 min 42.21 sec) -- First execution
30097 rows in set (0.13 sec) -- Second execution
-- Partitioned table (pt_current)
(localhost:ytt) SELECT * FROM pt_current WHERE log_date IN ('2020-12-27','2020-12-28','2020-12-29','2020-12-30','2020-12-31');
30097 rows in set (0.07 sec) -- First execution
30097 rows in set (0.01 sec) -- Second execution
|
Maintenance Optimization
Partition Management
Current Partition Status:
1
2
3
4
5
6
7
8
9
10
11
12
| (localhost:ytt) SELECT LEFT(partition_name,5) p, SUM(table_rows) cnt
FROM information_schema.partitions
WHERE table_name = 'pt_current'
GROUP BY LEFT(partition_name,5);
+-------+---------+
| p | cnt |
+-------+---------+
| p2020 | 3641722 |
| p2021 | 0 |
| p_max | 0 |
+-------+---------+
3 rows in set (0.02 sec)
|
Automated Partition Expansion:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
| DELIMITER $$
USE `ytt`$$
DROP PROCEDURE IF EXISTS `sp_autoextend_partition_pt_current`$$
CREATE DEFINER=`root`@`%` PROCEDURE `sp_autoextend_partition_pt_current`(
IN f_year YEAR
)
BEGIN
DECLARE v_days INT UNSIGNED DEFAULT 365;
DECLARE v_days_interval DATE DEFAULT '2018-12-31';
DECLARE i INT UNSIGNED DEFAULT 1;
SET @stmt = '';
SET v_days = DATEDIFF(CONCAT(f_year+1,'-01-01'),CONCAT(f_year,'-01-01'));
SET @stmt_begin = 'ALTER TABLE pt_current REORGANIZE PARTITION p_max into(';
WHILE i <= v_days DO
SET v_days_interval = DATE_ADD(CONCAT(f_year,'-01-01'),INTERVAL i DAY);
SET @stmt = CONCAT(@stmt,'PARTITION p',f_year,'_',LPAD(i,3,"0"),' VALUES LESS THAN (''',v_days_interval,'''),');
SET i = i + 1;
END WHILE;
SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue))';
SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
PREPARE s1 FROM @stmt;
EXECUTE s1;
DROP PREPARE s1;
SET @stmt = NULL;
SET @stmt_begin = NULL;
SET @stmt_end = NULL;
END$$
DELIMITER ;
(localhost:ytt) CALL sp_autoextend_partition_pt_current(2022);
Query OK, 0 rows affected (14.55 sec)
|
Year-End Maintenance (2022 Example)
Add test data for 2021:
1
2
3
4
| (localhost:ytt) INSERT INTO pt_current (r1,log_date)
SELECT r1, DATE_ADD(log_date, INTERVAL 1 YEAR)
FROM pt_current;
Query OK, 3641722 rows affected (2 min 28.75 sec)
|
Migrate 2020 data to historical table:
1
2
3
4
5
6
7
| (localhost:ytt) CREATE TABLE pt_2020 LIKE pt_old;
Query OK, 0 rows affected (0.05 sec)
(localhost:ytt) INSERT INTO pt_2020
SELECT * FROM pt_current
WHERE log_date BETWEEN '2020-01-01' AND '2020-12-31';
Query OK, 3641722 rows affected (1 min 12.54 sec)
|
Clean expired partitions:
1
2
3
4
5
6
7
8
9
10
11
| (localhost:ytt) SELECT CONCAT('ALTER TABLE ytt.pt_current DROP PARTITION ',partition_name,';')
FROM information_schema.`PARTITIONS`
WHERE table_schema = 'ytt'
AND table_name = 'pt_current'
AND partition_name LIKE 'p2020%'
INTO OUTFILE '/var/lib/mysql-files/drop_expire_partition_2020.sql';
Query OK, 366 rows affected (0.00 sec)
mysql> \. /var/lib/mysql-files/drop_expire_partition_2020.sql
Query OK, 0 rows affected (0.83 sec) -- Multiple similar outputs
...
|
Important Note: Maintain consistent partition naming conventions for efficient cleanup.
👋 See you in the next lesson.
Recommended reading
What is SQLFlash?
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
How to use SQLFlash in a database?