Lesson 20 of the SQL Optimization Course: Composite Indexes | SQLFlash

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.


When to Use Composite Indexes in SQL Server

Composite indexes (also known as compound indexes) are indexes created on multiple columns. They are particularly useful in business scenarios where multiple columns are used together in queries or where a contiguous subset of columns from left to right is utilized.

Similar to single-column indexes, each key value in a composite index is sorted according to a specific order. For example, a composite index on three fields can have various combinations:

1
2
3
4
5
6
(f1, f2, f3)
(f1, f2, f3 desc)
(f1, f2 desc, f3)
(f1 desc, f2, f3)
(f1 desc, f2 desc, f3 desc)

In this discussion, we focus on composite indexes based on the default sorting method, which is (f1, f2, f3), equivalent to (f1 asc, f2 asc, f3 asc).

Syntax for Composite Indexes

1
alter table t1 add key idx_multi(f1 [asc/desc], f2 [asc/desc], f3 [asc/desc]) [using btree/using hash]

In MySQL, composite indexes can support up to 16 columns. They can be based on B+ trees or hash structures. This article primarily discusses B-tree indexes with the default ascending order. Hash indexes have a specific use case where all columns must be used with equality filters [as shown in SQL 3 below].

Essential Conditions for Using Composite Indexes

The first column (leftmost column) of the composite index must be present in the SQL statement’s filter conditions. Ideally, this column should be used with equality filters.

Consider the following 15 SQL statements, which apply different combinations of filters on the f1, f2, and f3 fields of table t1. All statements include column f1, thus meeting the essential condition for using a composite index.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# SQL 1
select * from t1 where f1 = 1;

# SQL 2
select * from t1 where f1 = 1 and f2 = 1;

# SQL 3
select * from t1 where f1 = 1 and f2 = 1 and f3 = 1;

# SQL 4
select f1, f2 from t1 where 1 order by f1, f2;

# SQL 5
select f1, f2, f3 from t1 where 1 order by f1, f2, f3;

# SQL 6
select f1, f2, count(*) from t1 group by f1, f2;

# SQL 7
select f1, f2, f3, count(*) from t1 group by f1, f2, f3;

# SQL 8
select * from t1 where f1 = 10 and f2 = 5 and f3 > 10;

# SQL 9
select * from t1 where f1 = 10 and f2 > 5;

# SQL 10
select * from t1 where f1 < 10;

# SQL 11
select * from t1 where f1 < 10 and f2 > 5;

# SQL 12
select * from t1 where f1 < 10 and f2 > 5 and f3 < 10;

# SQL 13
select * from t1 where f1 < 10 and f2 = 5 and f3 < 10;

# SQL 14
select * from t1 where f1 < 10 and f2 = 5 and f3 = 10;

# SQL 15
select * from t1 where f1 = 1 and f3 = 1;

SQL 1, SQL 2, and SQL 3 utilize the composite index idx_multi for filtering and then access the table. SQL 3 represents the most optimal query where all fields in the composite index can be used for filtering. Let’s examine the execution plan for SQL 3:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain  select * from t1 where f1 = 1 and f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_multi
          key: idx_multi
      key_len: 15
          ref: const,const,const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

The execution plan for SQL 3 is nearly perfect. The most optimal scenario would be if idx_multi were a primary key or a unique index. For example, when querying table t3, which has a unique index udx_multi:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain  select * from t3 where f1 = 9 and f2 = 52 and f3 = 35\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t3
   partitions: NULL
         type: const
possible_keys: udx_multi
          key: udx_multi
      key_len: 15
          ref: const,const,const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

SQL 4, SQL 5, SQL 6, and SQL 7 utilize covering index scans, avoiding table access and leveraging the idx_multi index to return results in ascending order. Let’s review the execution plan for SQL 7:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
127.0.0.1:3400)|(ytt)>explain
    -> select f1, f2, f3, count(*) from t1 group by f1, f2, f3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index
possible_keys: idx_multi
          key: idx_multi
      key_len: 15
          ref: NULL
         rows: 32194
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

The type column shows “index,” indicating that the index directly returns the results.

A common question arises: For SQL 1 (filtering only on f1) and SQL 2 (filtering on f1 and f2), should we create separate single-column indexes to improve query performance?

1
2
3
alter table t1 add key idx_f1(f1);

alter table t1 add key idx_multi_sub(f1, f2);

Creating a separate index on f1 is unnecessary because f1 is the first field in the idx_multi index. When a query includes only f1, MySQL will only use the index data for f1 and will not utilize all columns of idx_multi.

Similarly, creating an additional composite index on (f1, f2) is not required, as (f1, f2) is essentially a subset of the composite index ((f1, f2), f3).

Let’s verify this by examining the execution plans for SQL 2 using idx_multi and idx_multi_sub:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 and f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_multi,idx_f1,idx_multi_sub
          key: idx_multi
      key_len: 10
          ref: const,const
         rows: 3
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

(127.0.0.1:3400)|(ytt)>explain select * from t1 force index (idx_multi_sub) where f1 = 1 and f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_multi_sub
          key: idx_multi_sub
      key_len: 10
          ref: const,const
         rows: 3
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

The execution plans show no differences, confirming that when query filters match the first column or a contiguous prefix of a composite index, there’s no need to create additional partial indexes. The original composite index suffices.

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 drop key idx_multi_sub, drop key idx_f1;
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now let’s examine SQL 8. The first two columns are contiguous, but the filter condition on f3 is a range. Let’s review the execution plan for SQL 8:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 10 and f2 = 5 and f3 > 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: range
possible_keys: idx_multi
          key: idx_multi
      key_len: 15
          ref: NULL
         rows: 154
     filtered: 100.00
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

The execution plan indicates that even though f3 has a range filter, the composite index is still utilized because the first two columns (f1 and f2) are contiguous.

SQL 9 is similar to SQL 8, with f1 using equality filtering followed by a range filter on f2.

SQL 10 only utilizes column f1, similar to a single-column index on f1.

SQL 11 and SQL 12 are similar to SQL 10. Although the filter order matches the index order, the first column is a range filter, so only the first column of the composite index is used.

SQL 13 and SQL 14 are different. Although f1 uses a range filter, SQL 13 has equality filtering on f2, and SQL 14 has equality filtering on both f2 and f3.

If SQL 13 and SQL 14 are executed frequently, consider adding a composite index on (f2, f3):

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_multi2(f2, f3);
Query OK, 0 rows affected (0.38 sec)
Records: 0  Duplicates: 0  Warnings: 0

Let’s review the execution plans for SQL 13 and SQL 14:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
(127.0.0.1:3400)|(ytt)>explain  select * from t1 where f1 < 10 and f2 = 5 and f3 < 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: range
possible_keys: idx_multi,idx_multi2
          key: idx_multi2
      key_len: 10
          ref: NULL
         rows: 174
     filtered: 7.99
        Extra: Using index condition; Using where
1 row in set, 1 warning (0.00 sec)

(127.0.0.1:3400)|(ytt)>explain  select * from t1 where f1 < 10 and f2 = 5 and f3 = 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_multi,idx_multi2
          key: idx_multi2
      key_len: 10
          ref: const,const
         rows: 14
     filtered: 7.99
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

These SQL statements now return to the classic usage scenario of composite indexes.

Finally, let’s examine SQL 15. The filter condition (f1=1 and f3=1) does not match the contiguous filtering characteristic of the composite index. However, since f1 uses equality filtering, the composite index idx_multi can still be utilized. Let’s review the execution plan:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain  select * from t1 where f1 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_multi
          key: idx_multi
      key_len: 5
          ref: const
         rows: 312
     filtered: 10.00
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

The execution plan shows that only column f1 is used because f2 is not present. If this SQL statement is executed frequently, consider creating a new composite index on (f1, f3):

1
2
3
(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_multi3(f1, f3);
Query OK, 0 rows affected (0.36 sec)
Records: 0  Duplicates: 0  Warnings: 0

Reviewing the execution plan again:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
(127.0.0.1:3400)|(ytt)>explain  select * from t1 where f1 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_multi,idx_multi3
          key: idx_multi3
      key_len: 10
          ref: const,const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

This returns to the contiguous filtering characteristic scenario of composite indexes.

The table structures used in the examples are as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
(127.0.0.1:3400)|(ytt)>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `f1` int DEFAULT NULL,
  `f2` int DEFAULT NULL,
  `f3` int DEFAULT NULL,
  `f4` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_multi` (`f1`,`f2`,`f3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

(127.0.0.1:3400)|(ytt)>show create table t3\G
*************************** 1. row ***************************
       Table: t3
Create Table: CREATE TABLE `t3` (
  `id` int NOT NULL,
  `f1` int DEFAULT NULL,
  `f2` int DEFAULT NULL,
  `f3` int DEFAULT NULL,
  `f4` int DEFAULT NULL,
  UNIQUE KEY `udx_multi` (`f1`,`f2`,`f3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

This discussion covers various usage scenarios for composite indexes. In daily business operations, if a column is already the first column in a composite index, it is generally unnecessary to create additional single-column indexes.

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.