Lesson 21 of the SQL Optimization Course: Composite Indexes 2 | SQLFlash

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.


When Should You Avoid Composite Indexes in MySQL?

The previous article delved into the concept of MySQL composite indexes and their applicable scenarios. This piece focuses on situations where composite indexes may not be suitable and offers corresponding optimization strategies.

Reviewing the Syntax and Prerequisites of Composite Indexes

Syntax of Composite Indexes (default ascending order):

1
alter table t1 add idx_multi(f1, f2, f3);

Prerequisites: The column f1 must be present in the SQL statement’s filtering conditions!

In other words, the first column of a composite index (the leftmost column) must exist in the filtering conditions. Ideally, it should be used for equality filtering. Consider the following SQL statements, none of which are suitable for composite indexes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
# SQL 1
select * from t1 where f2 = 1;

# SQL 2
select * from t1 where f3 = 1;

# SQL 3
select * from t1 where f2 = 1 and f3 = 1;

# SQL 4
select * from t1 where f2 = 1 or f3 = 1;

# SQL 5
select * from t1 where f1 = 1 or f2 = 1;

# SQL 6
select * from t1 where f1 = 1 or f3 = 1;

# SQL 7
select * from t1 where f1 = 1 or f2 = 1 or f3 = 1;

Let’s analyze each SQL statement one by one:

SQL 1

The filtering condition only includes column f2, which is the second column in the composite index idx_multi. Since the index entry cannot be found through column f2, the index cannot be directly utilized. For such statements, a single-column index should be created on column f2.

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f2(f2);
Query OK, 0 rows affected (0.32 sec)
Records: 0  Duplicates: 0  Warnings: 0

Comparison of execution plans before and after adding the index:

Before adding idx_f2, a full table scan occurred with no available indexes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 32194
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

After adding idx_f2, the index is directly used to filter records:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_f2
          key: idx_f2
      key_len: 5
          ref: const
         rows: 351
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

SQL 2

Similar to SQL 1, a single-column index can be created on column f3.

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f3(f3);
Query OK, 0 rows affected (0.35 sec)
Records: 0  Duplicates: 0  Warnings: 0

SQL 3

The filtering condition for SQL 3 involves columns (f2, f3) with equality filtering. However, it does not meet the essential characteristics of the composite index idx_multi as column f1 is not included in the filtering conditions. Previously, single-column indexes were created for columns f2 and f3. MySQL’s Index Merge feature can be leveraged to combine these two indexes and take their intersection. The execution plan is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_f2,idx_f3
          key: idx_f3,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using intersect(idx_f3,idx_f2); Using where
1 row in set, 1 warning (0.00 sec)

It can be observed that MySQL utilized Index Merge to filter data. However, if such statements occur frequently, it is advisable to add a composite index that only includes columns f2 and f3.

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_multi_sub(f2,f3);
Query OK, 0 rows affected (0.50 sec)
Records: 0  Duplicates: 0  Warnings: 0

The execution plan is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_f2,idx_f3,idx_multi_sub
          key: idx_multi_sub
      key_len: 10
          ref: const,const
         rows: 2
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

It can be seen that MySQL chose the index idx_multi_sub to filter data, and the type is ref, which is more optimized compared to index_merge.

SQL 4

The filtering condition for SQL 4 involves columns f2 and f3, but the relationship between them is OR instead of AND, and column f1 is not included. This does not align with the characteristics of a composite index.

For such statements, if the filtering performance of the given conditions for columns f2 and f3 is acceptable, single-column indexes can be created for both columns. MySQL can use index_merge to combine the two single-column indexes for filtering. The execution plan is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_f2,idx_f3,idx_multi_sub
          key: idx_f2,idx_f3
      key_len: 5,5
          ref: NULL
         rows: 684
     filtered: 100.00
        Extra: Using union(idx_f2,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

Some suggest that replacing OR with UNION ALL can avoid such operations. However, this depends on the context. For the above SQL statement, index_merge simply performs a union of indexes idx_f2 and idx_f3 without sorting. In this case, the performance difference between OR and UNION ALL is minimal. Further optimization of OR and UNION ALL will be covered in a separate article and will not be elaborated here.

SQL 5 and SQL 6

These two SQL statements can utilize the composite index idx_multi because column f1 is included in the filtering conditions. However, since the other conditions are OR instead of AND, single-column indexes should also be created for the columns after OR. This way, MySQL can perform an index_merge between idx_multi and other single-column indexes to optimize the query. Let’s examine the execution plans for these two SQL statements:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_multi_sub
          key: idx_multi,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 663
     filtered: 100.00
        Extra: Using sort_union(idx_multi,idx_f2); Using where
1 row in set, 1 warning (0.01 sec)

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f3
          key: idx_multi,idx_f3
      key_len: 5,5
          ref: NULL
         rows: 645
     filtered: 100.00
        Extra: Using sort_union(idx_multi,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

It can be observed that MySQL utilized both idx_multi and idx_f2, as well as idx_multi and idx_f3. However, the index_merge in both execution plans used sort_union. Can sort_union be replaced with a regular non-sorting union? To achieve this, a single-column index should be added to column f1.

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f1(f1);
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

Let’s randomly check the execution plan for SQL 5:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_multi_sub,idx_f1
          key: idx_f1,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 663
     filtered: 100.00
        Extra: Using union(idx_f1,idx_f2); Using where
1 row in set, 1 warning (0.00 sec)

After adding a single-column index to column f1, index_merge avoids sort_union.

SQL 7

Similar to the previous two SQL statements, SQL 7 requires the use of a single-column index on column f1. The difference is that index_merge involves three columns instead of two. The execution plan is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_f3,idx_multi_sub,idx_f1
          key: idx_f1,idx_f2,idx_f3
      key_len: 5,5,5
          ref: NULL
         rows: 996
     filtered: 100.00
        Extra: Using union(idx_f1,idx_f2,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

It can be clearly seen that MySQL utilized index_merge with three single-column indexes.

At this point, table t1 has numerous indexes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
(127.0.0.1:3400)|(ytt)>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `f1` int DEFAULT NULL,
  `f2` int DEFAULT NULL,
  `f3` int DEFAULT NULL,
  `f4` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_multi` (`f1`,`f2`,`f3`),
  KEY `idx_f2` (`f2`),
  KEY `idx_f3` (`f3`),
  KEY `idx_multi_sub` (`f2`,`f3`),
  KEY `idx_f1` (`f1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

From the above table structure, it can be seen that, in addition to the primary key, five secondary indexes have been created for these seven SQL statements. In practical scenarios, it is not necessary to create all these indexes. The decision should be based on the frequency of the SQL statements. If a particular SQL statement has a low execution frequency and its allowed execution time does not conflict with business peak hours, the corresponding index can be removed to reduce the latency caused by indexes on table writes.

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.