Lesson 16 of the SQL Optimization Course: Primary Key Design | SQLFlash

Introduction

For relational databases, table design and the way SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that together they account for 90% of performance. So this series targets these two major knowledge areas and analyzes them in detail, peeling back the layers one by one.

This series uses plain, easy-to-understand language and a large number of examples to explain the subtleties.

🧑‍💻 Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.


1. Understanding Primary Keys in MySQL

A primary key is a unique identifier for each row in a table. In MySQL, the choice of primary key can significantly impact performance and storage efficiency.

2. Key Considerations for Primary Key Design

2.1 Space Efficiency

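The size of the primary key directly affects the amount of storage required. InnoDB stores rows in a clustered index ordered by the primary key, and every secondary index entry also contains a copy of the primary key, so a wide key is paid for many times over. Smaller data types, such as a 4-byte INT (a 32-bit integer, INT32), are preferable because more keys fit on each index page, which reduces the amount of data that has to be loaded into memory.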
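To get a feel for the numbers, the sketch below estimates how many primary key entries fit on one 16 KB InnoDB index page (the default innodb_page_size) for several key types. The 9-byte per-entry overhead (a record header plus a child-page pointer) is a simplifying assumption, and page headers and fill factor are ignored, so treat the results as rough proportions rather than exact InnoDB figures.

```python
# Rough estimate of index entries per InnoDB page for different key widths.
# Assumption: ~9 bytes of per-entry overhead (record header + child page
# pointer); page headers, free space and fill factor are ignored.
PAGE_SIZE = 16 * 1024  # default innodb_page_size
ASSUMED_OVERHEAD = 9

KEY_BYTES = {
    "INT": 4,
    "BIGINT": 8,
    "BINARY(16) UUID": 16,
    "CHAR(36) UUID": 36,  # assumes a single-byte character set
}


def entries_per_page(key_bytes: int, overhead: int = ASSUMED_OVERHEAD,
                     page_size: int = PAGE_SIZE) -> int:
    """Approximate number of key entries that fit on one index page."""
    return page_size // (key_bytes + overhead)


if __name__ == "__main__":
    for name, size in KEY_BYTES.items():
        print(f"{name:>16}: ~{entries_per_page(size)} entries per page")
```

Under these assumptions an INT key fits roughly three and a half times as many entries per page as a CHAR(36) key (1260 vs. 364), so the same index needs far fewer pages.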

2.2 Ordering Properties

Ideally, primary key values should be ordered. With an auto-incrementing integer primary key, each new key is larger than every existing key, so InnoDB can append the new record to the end of the last data page, or allocate a new page when that one is full. This ordered insertion avoids splitting pages in the middle of the index, minimizes random I/O, and improves write performance.
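The toy model below illustrates why ordered keys help. It is not InnoDB's real algorithm (InnoDB has its own heuristics for sequential inserts); it simply keeps sorted leaf pages of a fixed capacity, starts a fresh page when a key lands past the end of the full last page, and otherwise splits a full page in half. The function name and the page capacity are illustrative.

```python
import bisect
import random


def simulate_inserts(keys, capacity):
    """Insert keys into a toy list of sorted, fixed-capacity leaf pages.

    Returns (number_of_page_splits, pages).
    """
    pages = []
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Find the page whose key range should hold this key.
        first_keys = [page[0] for page in pages]
        i = max(bisect.bisect_right(first_keys, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Key lies beyond the end of the full last page: append a new page.
            pages.append([key])
        else:
            # Full page in the middle of the key range: split it in half.
            bisect.insort(page, key)
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
    return splits, pages


if __name__ == "__main__":
    keys = list(range(1, 10_001))
    shuffled = keys[:]
    random.Random(42).shuffle(shuffled)
    for label, order in (("sequential", keys), ("random", shuffled)):
        splits, pages = simulate_inserts(order, capacity=100)
        print(f"{label:>10}: {splits} splits, {len(pages)} pages")
```

Sequential keys only ever fill the last page and open a new one, while random keys repeatedly split pages and leave them partly empty, which mirrors the extra I/O described above.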

2.3 Data Type Selection

Integer data types are highly recommended for primary keys because they are both small and easy to insert in order. For most scenarios, INT UNSIGNED (INT32) is sufficient: it supports up to about 4.29 billion (2^32 - 1) rows in a single table, while a signed INT stops at about 2.1 billion. However, if you anticipate sharding databases or tables in the future, consider BIGINT (INT64).
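The capacity figures above are easy to check. The sketch below computes the maximum values of MySQL's signed INT, INT UNSIGNED, and BIGINT UNSIGNED, and how long each would last at a hypothetical sustained rate of 1,000 inserts per second; the rate is an assumption chosen only for illustration.

```python
# Maximum values of MySQL integer types commonly used for primary keys.
INT_MAX = 2**31 - 1            # signed INT
INT_UNSIGNED_MAX = 2**32 - 1   # INT UNSIGNED
BIGINT_UNSIGNED_MAX = 2**64 - 1

SECONDS_PER_DAY = 86_400


def days_until_exhausted(max_value: int, inserts_per_second: float) -> float:
    """Approximate days until an auto-increment key starting at 1 reaches max_value."""
    return max_value / inserts_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = 1_000  # hypothetical sustained insert rate
    for name, limit in (("INT", INT_MAX),
                        ("INT UNSIGNED", INT_UNSIGNED_MAX),
                        ("BIGINT UNSIGNED", BIGINT_UNSIGNED_MAX)):
        days = days_until_exhausted(limit, rate)
        print(f"{name:>15}: {limit:,} values, ~{days / 365.25:,.1f} years at {rate}/s")
```

At that rate INT UNSIGNED would run out in about 50 days, while BIGINT UNSIGNED would last hundreds of millions of years; at the insert rates of most tables, INT UNSIGNED is usually enough.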

3. Handling Sharding Scenarios

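When designing for sharding, MySQL natively supports auto-increment primary keys that do not collide across shards. Set auto_increment_increment (the step) to the number of shards and give each shard a different auto_increment_offset (the starting value); the @@ form used below sets them for the current session. In the following session, which assumes a table tmp with a single auto-increment column id, a step of 2 and an offset of 1 produce 1, 3, 5; a second shard configured with an offset of 2 would produce 2, 4, 6.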

mysql> set @@auto_increment_increment=2;
Query OK, 0 rows affected (0.00 sec)
mysql> set @@auto_increment_offset=1;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into tmp values(null),(null),(null);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from tmp;
+----+
| id |
+----+
|  1 |
|  3 |
|  5 |
+----+
3 rows in set (0.00 sec)
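The sequence produced by auto_increment_increment and auto_increment_offset can be modeled in a few lines. Following the rule in the MySQL manual, the next value is the smallest member of the series offset + k * increment (k = 0, 1, 2, ...) that is larger than the current maximum. The helper names below are illustrative, not MySQL APIs, and the sketch only models the case 1 <= offset <= increment.

```python
def next_auto_increment(current_max: int, increment: int, offset: int) -> int:
    """Smallest value offset + k*increment (k >= 0) greater than current_max."""
    if not 1 <= offset <= increment:
        # This sketch only models the documented case offset <= increment.
        raise ValueError("expected 1 <= offset <= increment")
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


def generate_ids(n: int, increment: int, offset: int, current_max: int = 0):
    """IDs assigned to n consecutive inserts into one table."""
    ids = []
    for _ in range(n):
        current_max = next_auto_increment(current_max, increment, offset)
        ids.append(current_max)
    return ids


def ids_available(max_value: int, increment: int, offset: int) -> int:
    """How many IDs one shard can allocate before reaching max_value."""
    return (max_value - offset) // increment + 1


if __name__ == "__main__":
    print(generate_ids(3, increment=2, offset=1))  # [1, 3, 5], as in the session above
    print(generate_ids(3, increment=2, offset=2))  # [2, 4, 6] on a second shard
    print(ids_available(2**32 - 1, increment=2, offset=1))  # 2147483648
```

The last line shows a side effect worth planning for: with a step of 2, each shard can hand out only half of the INT UNSIGNED range, which is one reason to prefer BIGINT when sharding.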

4. Merging Data from Legacy Systems

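When merging data from legacy systems, you may encounter primary key conflicts. To address this, you can create a new table with a new auto-incrementing primary key and retain the original ID as a separate column. In the sessions below, the legacy tables n1 and n2 each have a single id column holding the values 1, 2, and 3, so their keys collide.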

mysql> create table n3 (id int auto_increment primary key, old_id int);
Query OK, 0 rows affected (0.07 sec)
mysql> insert into n3 (old_id) select * from n1 union all select * from n2;
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from n3;
+----+--------+
| id | old_id |
+----+--------+
|  1 |      1 |
|  2 |      2 |
|  3 |      3 |
|  4 |      1 |
|  5 |      2 |
|  6 |      3 |
+----+--------+
6 rows in set (0.00 sec)

To keep a mapping between each original ID and the table it came from, you can define a composite (multi-column) primary key on the original ID and the original table name.

mysql> create table n4(old_id int, old_name varchar(64),primary key(old_id,old_name));
Query OK, 0 rows affected (0.05 sec)
mysql> insert into n4 select id ,'n1' from n1 union all select id,'n2' from n2;
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from n4;
+--------+----------+
| old_id | old_name |
+--------+----------+
|      1 | n1       |
|      1 | n2       |
|      2 | n1       |
|      2 | n2       |
|      3 | n1       |
|      3 | n2       |
+--------+----------+
6 rows in set (0.00 sec)

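The final table structure combines a new auto-increment primary key with the original table ID and the original table name. A unique key on (old_id, old_name) guarantees that each legacy row appears only once and can still be looked up by its original identity.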

mysql> create table n5(id int unsigned auto_increment primary key,old_id int,old_name varchar(64),unique key udx_old_id_old_name (old_id,old_name));
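The merge and mapping steps above can be mimicked outside the database to check the logic. This is a minimal Python sketch, assuming the legacy tables are given as lists of IDs keyed by table name; the function name is hypothetical. It assigns new sequential IDs, keeps old_id and old_name, and rejects a repeated (old_id, old_name) pair the way the unique key udx_old_id_old_name would.

```python
def merge_legacy_tables(tables):
    """Assign new sequential ids to rows from several legacy tables.

    tables maps a legacy table name to the list of its original ids.
    Returns (rows, lookup): rows are (id, old_id, old_name) tuples, and
    lookup maps (old_id, old_name) to the new id.
    """
    rows = []
    lookup = {}
    next_id = 1
    for old_name, old_ids in tables.items():
        for old_id in old_ids:
            key = (old_id, old_name)
            if key in lookup:
                # Mirrors the unique key on (old_id, old_name).
                raise ValueError(f"duplicate entry {key}")
            lookup[key] = next_id
            rows.append((next_id, old_id, old_name))
            next_id += 1
    return rows, lookup


if __name__ == "__main__":
    rows, lookup = merge_legacy_tables({"n1": [1, 2, 3], "n2": [1, 2, 3]})
    for row in rows:
        print(row)
    print(lookup[(2, "n2")])  # new id of row 2 from n2
```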

5. UUID and Its Alternatives

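UUIDs are globally unique, but they are inherently unordered and take more space: 36 characters as text, or 16 bytes in binary form, compared with 4 bytes for an INT. Storing UUIDs as CHAR(36) primary keys in MySQL is not recommended because InnoDB clusters rows by primary key: the wide key is copied into every secondary index, and non-sequential values scatter inserts across the index, causing page splits and random I/O.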

5.1 Optimizing UUID Usage

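MySQL 8.0 provides UUID_TO_BIN() to convert a UUID string into a compact 16-byte binary value, and BIN_TO_UUID() to convert it back. UUID_TO_BIN() also accepts an optional second argument, swap_flag; when it is 1, the time-low and time-high groups are swapped so that values generated over time are stored in ascending order.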

mysql> create table t_binary(id varbinary(16) primary key, r1 int, key idx_r1(r1));
Query OK, 0 rows affected (0.07 sec)
mysql> insert into t_binary values (uuid_to_bin(uuid()),1),(uuid_to_bin(uuid()),2);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from t_binary;
+------------------------------------+------+
| id | r1 |
+------------------------------------+------+
| 0x412234A77DEF11EA9AF9080027C52750 | 1 |
| 0x412236E27DEF11EA9AF9080027C52750 | 2 |
+------------------------------------+------+
2 rows in set (0.00 sec)
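UUID_TO_BIN() takes an optional swap_flag that, when 1, moves the time-high group in front of the time-low group. To see why that matters, the sketch below re-implements the byte layout of UUID_TO_BIN() and BIN_TO_UUID() in Python as described in the MySQL documentation; these functions are illustrative stand-ins, not MySQL code. It uses a UUID from the session above, plus two made-up UUIDs on either side of a time_low rollover to show that only the swapped layout keeps the later UUID sorting after the earlier one.

```python
import uuid


def uuid_to_bin(uuid_str: str, swap_flag: int = 0) -> bytes:
    """16-byte form of a UUID string, optionally with time-low/time-high swapped."""
    # 32 hex digits: time_low(8) time_mid(4) time_hi(4) clock_seq+node(16)
    h = uuid.UUID(uuid_str).hex
    if swap_flag:
        h = h[12:16] + h[8:12] + h[0:8] + h[16:]
    return bytes.fromhex(h)


def bin_to_uuid(data: bytes, swap_flag: int = 0) -> str:
    """Inverse of uuid_to_bin(); swap_flag must match the one used to encode."""
    h = data.hex()
    if swap_flag:
        h = h[8:16] + h[4:8] + h[0:4] + h[16:]
    return str(uuid.UUID(hex=h))


if __name__ == "__main__":
    first = "412234a7-7def-11ea-9af9-080027c52750"  # from the session above
    print(len(first), len(uuid_to_bin(first)))      # 36 characters vs. 16 bytes
    print(uuid_to_bin(first).hex().upper())         # plain layout, as stored in t_binary
    print(uuid_to_bin(first, 1).hex().upper())      # time-high group first

    # Hypothetical v1 UUIDs just before and after time_low wraps around.
    before = "ffffff00-7def-11ea-9af9-080027c52750"
    after = "00000100-7df0-11ea-9af9-080027c52750"
    print(uuid_to_bin(after) > uuid_to_bin(before))        # False: out of order
    print(uuid_to_bin(after, 1) > uuid_to_bin(before, 1))  # True: time-ordered
```

A value stored with the swap must be decoded with the same flag, e.g. BIN_TO_UUID(id, 1) in SQL; decoding with the wrong flag yields a different, valid-looking UUID.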

5.2 uuid_short() Function

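The uuid_short() function returns a 64-bit unsigned integer (BIGINT UNSIGNED), which is half the size of a binary UUID and much smaller than a CHAR(36) one. It combines the server ID, the server startup time, and a counter that is incremented on each call, so consecutive values from the same server increase. According to the MySQL documentation, the values are globally unique only if each server has a distinct server_id between 0 and 255, the system clock is not set back between restarts, and the function is called on average fewer than 16 million times per second between restarts.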

mysql> create table t_uuid_short (id bigint unsigned primary key, r1 int, key idx_r1(r1));
Query OK, 0 rows affected (0.06 sec)
mysql> insert into t_uuid_short values(uuid_short(),1),(uuid_short(),2);
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from t_uuid_short;
+----------------------+------+
| id                   | r1   |
+----------------------+------+
| 16743984358464946177 |    1 |
| 16743984358464946178 |    2 |
+----------------------+------+
2 rows in set (0.00 sec)
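The MySQL reference manual describes the UUID_SHORT() value as (server_id & 255) << 56, plus the server startup time in seconds shifted left by 24 bits, plus a counter incremented on every call. The class below is a minimal sketch of that formula; the class name, the decode helper, and the example server_id and startup time are hypothetical.

```python
class UuidShortGenerator:
    """Sketch of the UUID_SHORT() layout described in the MySQL manual."""

    def __init__(self, server_id: int, startup_time: int):
        # Top 8 bits: server_id; below that, the startup time (seconds) << 24.
        self._next = ((server_id & 255) << 56) + (startup_time << 24)

    def __call__(self) -> int:
        # Return the current value and bump the counter for the next call.
        value = self._next
        self._next += 1
        return value


def decode(value: int):
    """Split a value into (server_id, startup_time, counter).

    Only meaningful while startup_time < 2**32 and the counter has not
    overflowed its 24 bits.
    """
    return value >> 56, (value >> 24) & 0xFFFFFFFF, value & 0xFFFFFF


if __name__ == "__main__":
    gen = UuidShortGenerator(server_id=1, startup_time=1_600_000_000)
    a, b = gen(), gen()
    print(a, b, b - a)
    print(decode(b))  # (1, 1600000000, 1)
```

Because the counter sits in the low 24 bits, about 16.7 million calls use up one second's worth of values, which appears to be where the documented "fewer than 16 million calls per second" condition comes from. It also shows why uuid_short() keys arrive in ascending order on a single server, much like an auto-increment key.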

6. Performance Comparison

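A simple experiment compared the write performance of the different primary key types. The stored procedure below inserts f_number rows into one of four tables (t_uuid, t_binary, t_uuid_short, or t_id), committing every 50 rows. t_binary and t_uuid_short are defined above; the definitions of t_uuid and t_id are not shown, but judging from the INSERT statements, t_uuid uses the uuid() string itself as the primary key and t_id uses an auto-increment integer key.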

DELIMITER $$
CREATE PROCEDURE `ytt`.`sp_insert_data`(f_tbname VARCHAR(64), f_number INT UNSIGNED)
BEGIN
  DECLARE i INT UNSIGNED DEFAULT 0;

  -- Choose the INSERT statement for the table under test
  IF f_tbname = 't_uuid' THEN
    SET @stmt = 'insert into t_uuid values (uuid(), ceil(rand()*100))';
  ELSEIF f_tbname = 't_binary' THEN
    SET @stmt = 'insert into t_binary values (uuid_to_bin(uuid()), ceil(rand()*100))';
  ELSEIF f_tbname = 't_uuid_short' THEN
    SET @stmt = 'insert into t_uuid_short values (uuid_short(), ceil(rand()*100))';
  ELSEIF f_tbname = 't_id' THEN
    SET @stmt = 'insert into t_id(r1) values (ceil(rand()*100))';
  ELSE
    -- Fail fast instead of silently reusing a stale @stmt
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'sp_insert_data: unsupported table name';
  END IF;

  -- Group the inserts into explicit transactions
  SET @@autocommit = 0;
  WHILE i < f_number DO
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    SET i = i + 1;
    -- Commit every 50 rows
    IF MOD(i, 50) = 0 THEN
      COMMIT;
    END IF;
  END WHILE;
  COMMIT;
  DEALLOCATE PREPARE s1;
  SET @@autocommit = 1;
END$$
DELIMITER ;

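In this test, the UUID string key (t_uuid) had the poorest write performance, taking 5 min 23 s to insert 300,000 rows, followed by the binary UUID (4 min 49 s) and the auto-increment integer (3 min 40 s); uuid_short() performed best at 3 min 10 s.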

mysql> call sp_insert_data('t_uuid',300000);
Query OK, 0 rows affected (5 min 23.33 sec)
mysql> call sp_insert_data('t_binary',300000);
Query OK, 0 rows affected (4 min 48.92 sec)
mysql> call sp_insert_data('t_id',300000);
Query OK, 0 rows affected (3 min 40.38 sec)
mysql> call sp_insert_data('t_uuid_short',300000);
Query OK, 0 rows affected (3 min 9.94 sec)
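The timings above can be turned into throughput with a little arithmetic. The sketch parses the durations reported by the mysql client (the parse_duration helper and the "M min S sec" format it accepts are assumptions based on the output shown) and computes rows per second and the speed-up relative to t_uuid.

```python
import re

ROWS = 300_000
TIMINGS = {
    "t_uuid": "5 min 23.33 sec",
    "t_binary": "4 min 48.92 sec",
    "t_id": "3 min 40.38 sec",
    "t_uuid_short": "3 min 9.94 sec",
}


def parse_duration(text: str) -> float:
    """Convert a mysql-client duration such as '5 min 23.33 sec' to seconds."""
    match = re.fullmatch(r"(?:(\d+) min )?([\d.]+) sec", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


if __name__ == "__main__":
    baseline = parse_duration(TIMINGS["t_uuid"])
    # Fastest table first.
    for table, text in sorted(TIMINGS.items(), key=lambda kv: parse_duration(kv[1])):
        seconds = parse_duration(text)
        print(f"{table:>12}: {ROWS / seconds:7.0f} rows/s, "
              f"{baseline / seconds:.2f}x vs t_uuid")
```

By this measure uuid_short() inserted about 1.7 times as many rows per second as the UUID string key in this test.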

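7. Avoiding Business Fields as Primary Keys

Using business-related fields (such as a user number) as primary keys can lead to complications: business values may change, and in InnoDB changing a primary key value means moving the row within the clustered index and rewriting its entry in every secondary index. Instead, create a separate auto-incrementing primary key and put a unique index on the business-related field.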

mysql> create table n5(
 id int unsigned auto_increment primary key,
 userno int unsigned ,
 unique key udx_userno(userno)
 );
Query OK, 0 rows affected (0.08 sec)

8. Conclusion

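By following these best practices for primary key design in MySQL, you can improve both performance and maintainability: prefer a small, monotonically increasing integer primary key (INT UNSIGNED, or BIGINT UNSIGNED when sharding is likely); use auto_increment_increment/auto_increment_offset or uuid_short() when keys must be unique across servers; if you must use UUIDs, store them in binary form with UUID_TO_BIN() rather than as CHAR(36); and keep business fields in unique secondary indexes rather than in the primary key.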

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.


Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!