If you’ve noticed that ORDER BY DESC queries in MySQL are slightly slower than ORDER BY ASC, don’t worry—this is a known and expected behavior. This occurs because InnoDB is designed and optimized for forward scanning, using a unidirectional linked list to organize records on a page. As a result, moving forward (ASC) has a time complexity of O(1), while moving backward (DESC) has a time complexity of O(n). This blog will demonstrate both algorithms from a storage-level perspective.
1. InnoDB Page Structure
1.1 Unidirectional Linked List
InnoDB uses a unidirectional linked list to organize records. Each page has two virtual records: infimum and supremum, which serve as the head and tail of the linked list, respectively. Once a data page contains user records, the linked list reflects the logical order:
1
| infimum -> rec1 -> rec2 -> rec3 -> rec4 -> ... -> supremum
|
1.2 REC_NEXT
Each record reserves 2 bytes in the record header to store the offset to the next record:
1
2
| constexpr uint32_t REC_NEXT = 2;
constexpr uint32_t REC_NEXT_MASK = 0xFFFFUL;
|
For example, the REC_NEXT value of the infimum record is 0x00, 0x0d:
1
2
3
4
5
6
7
8
9
10
11
| /** The page infimum and supremum of an empty page in ROW_FORMAT=COMPACT */
static const byte infimum_supremum_compact[] = {
/* the infimum record */
0x01 /*n_owned=1*/, 0x00, 0x02 /* heap_no=0, REC_STATUS_INFIMUM */, 0x00,
0x0d /* pointer to supremum */,
'i', 'n', 'f', 'i', 'm', 'u', 'm', 0,
/* the supremum record */
0x01 /*n_owned=1*/, 0x00, 0x0b /* heap_no=1, REC_STATUS_SUPREMUM */, 0x00,
0x00 /* end of record list */,
's', 'u', 'p', 'r', 'e', 'm', 'u', 'm'
};
|
Using the offset 0x000d from the infimum record, we can locate the supremum record:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| In [1]: infimum_supremum_compact = [
...: 0x01, 0x00, 0x02, 0x00,
...: 0x0d,
...: 'i', 'n', 'f', 'i', 'm', 'u', 'm', 0,
...: 0x01, 0x00, 0x0b, 0x00,
...: 0x00,
...: 's', 'u', 'p', 'r', 'e', 'm', 'u', 'm'
...: ]
In [2]: infimum_supremum_compact[5]
Out[2]: 'i'
In [3]: infimum_supremum_compact[5 + 0x000d]
Out[3]: 's'
|
1.3 Page Directory
Due to the unidirectional linked list structure, scanning the entire list to find a record is inefficient. InnoDB maintains a dynamic array (page directory) at the end of each data page. Each element (slot) in the array stores the position of a record:
1
2
| /* We define a slot in the page directory as two bytes */
constexpr uint32_t PAGE_DIR_SLOT_SIZE = 2;
|
Instead of storing every record’s address, each slot points to the last record in its managed group. A slot typically manages 4 to 8 records:
1
2
3
| /* The maximum and minimum number of records owned by a directory slot. */
constexpr uint32_t PAGE_DIR_SLOT_MAX_N_OWNED = 8;
constexpr uint32_t PAGE_DIR_SLOT_MIN_N_OWNED = 4;
|
The first slot always points to infimum, and the last slot always points to supremum.
1.4 N_OWNED
Each record reserves 4 bits in the record header to store N_OWNED:
1
2
| constexpr uint32_t REC_NEW_N_OWNED = 5; /* This is a single byte bit-field */
constexpr uint32_t REC_N_OWNED_MASK = 0xFUL;
|
If a record is the last in its slot, its value indicates the number of records owned by the slot. Otherwise, the value is 0.
2. Example
The following diagram illustrates the layout of a data page:
.webp)
- Orange arrows connect 24 user records from
rec0 to rec23. - Gray arrows point to the last record managed by each slot:
- Slot
0 points to infimum (owns 1 record). - Slot
n points to supremum (owns 5 records). - Slot
1 points to rec3 (owns 4 records).
3. Algorithms
We’ll use the following logical InnoDB page layout to understand both scan algorithms:
.webp)
3.1 Forward Scan
Finding the next record from rec10 is straightforward:
- Read the
REC_NEXT offset:1
2
| field_value = mach_read_from_2(rec - REC_NEXT);
|
- Get the next record’s position:
1
2
| return (ut_align_offset(rec + field_value, UNIV_PAGE_SIZE));
|
3.2 Reverse Scan
Finding the previous record from rec10 is more complex:
3.2.1 Find Which Slot Manages the Current Record (page_dir_find_owner_slot)
Scan records starting from the current record until n_owned is not 0:
1
2
3
4
5
| while (rec_get_n_owned_new(r) == 0) {
r = rec_get_next_ptr_const(r, true);
// ...
}
|
- Check
rec10 → rec11 (since rec11 has n_owned = 4).
Check all slots to find the one pointing to record r:
1
2
3
4
5
6
| rec_offs_bytes = mach_encode_2(r - page);
while (UNIV_LIKELY(*(uint16 *)slot != rec_offs_bytes)) {
slot += PAGE_DIR_SLOT_SIZE;
}
return (((ulint)(first_slot - slot)) / PAGE_DIR_SLOT_SIZE);
|
- Scan from the last slot (
slot n) to slot 0:slot n points to supremum (not rec11) → check slot 4.slot 4 points to rec15 (not rec11) → check slot 3.slot 3 points to rec11 → return 3.
3.2.2 Scan the Current Slot Group to Find the Previous Record
Jump to the previous slot (since slot 3 only holds the last record, use the previous slot’s group):
1
2
3
| slot = page_dir_get_nth_slot(page, slot_no - 1);
rec2 = page_dir_slot_get_rec(slt);
|
- Check
slot 2 to find rec7.
Scan all records in the slot group to match the current record:
1
2
3
4
5
6
| while (rec != rec2) {
prev_rec = rec2;
rec2 = page_rec_get_next_low(rec2, true);
}
return (prev_rec);
|
- Check
rec7 → rec8 → rec9 → rec10 → return rec9.
4. Time Complexity
- Forward scan: O(1)
- Reverse scan: O(n), where
n is the number of slots in the page directory.
5. Benchmark Tests
5.1 Forward Scan
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
| mysql> SELECT k FROM sbtest1 ORDER BY k ASC LIMIT 9999999, 1;
+---------+
| k |
+---------+
| 8670945 |
+---------+
1 row in set (1.41 sec)
mysql> DESC SELECT k FROM sbtest1 ORDER BY k ASC LIMIT 9999999, 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: index
possible_keys: NULL
key: k_1
key_len: 4
ref: NULL
rows: 9864216
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)
|
5.2 Reverse Scan
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
| mysql> SELECT k FROM sbtest1 ORDER BY k DESC LIMIT 9999999, 1;
+---------+
| k |
+---------+
| 1184614 |
+---------+
1 row in set (2.01 sec)
mysql> DESC SELECT k FROM sbtest1 ORDER BY k DESC LIMIT 9999999, 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: index
possible_keys: NULL
key: k_1
key_len: 4
ref: NULL
rows: 9864216
filtered: 100.00
Extra: Backward index scan; Using index
1 row in set, 1 warning (0.00 sec)
|
Recommended reading
What is SQLFlash?
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
How to use SQLFlash in a database?