Why Are Reverse Index Scans Slower in InnoDB? | SQLFlash

If you’ve noticed that ORDER BY DESC queries in MySQL are slightly slower than ORDER BY ASC, don’t worry—this is a known and expected behavior. This occurs because InnoDB is designed and optimized for forward scanning, using a unidirectional linked list to organize records on a page. As a result, moving forward (ASC) has a time complexity of O(1), while moving backward (DESC) has a time complexity of O(n). This blog will demonstrate both algorithms from a storage-level perspective.

1. InnoDB Page Structure

1.1 Unidirectional Linked List

InnoDB uses a unidirectional linked list to organize records. Each page has two virtual records: infimum and supremum, which serve as the head and tail of the linked list, respectively. Once a data page contains user records, the linked list reflects the logical order:

1
infimum -> rec1 -> rec2 -> rec3 -> rec4 -> ... -> supremum

1.2 REC_NEXT

Each record reserves 2 bytes in the record header to store the offset to the next record:

1
2
constexpr uint32_t REC_NEXT = 2;
constexpr uint32_t REC_NEXT_MASK = 0xFFFFUL;

For example, the REC_NEXT value of the infimum record is 0x00, 0x0d:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
/** The page infimum and supremum of an empty page in ROW_FORMAT=COMPACT */
static const byte infimum_supremum_compact[] = {
  /* the infimum record */
  0x01 /*n_owned=1*/, 0x00, 0x02 /* heap_no=0, REC_STATUS_INFIMUM */, 0x00,
  0x0d /* pointer to supremum */,
  'i', 'n', 'f', 'i', 'm', 'u', 'm', 0,
  /* the supremum record */
  0x01 /*n_owned=1*/, 0x00, 0x0b /* heap_no=1, REC_STATUS_SUPREMUM */, 0x00,
  0x00 /* end of record list */,
  's', 'u', 'p', 'r', 'e', 'm', 'u', 'm'
};

Using the offset 0x000d from the infimum record, we can locate the supremum record:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
In [1]: infimum_supremum_compact = [
   ...:     0x01, 0x00, 0x02, 0x00,
   ...:     0x0d,
   ...:     'i', 'n', 'f', 'i', 'm', 'u', 'm', 0,
   ...:     0x01, 0x00, 0x0b, 0x00,
   ...:     0x00,
   ...:     's', 'u', 'p', 'r', 'e', 'm', 'u', 'm'
   ...: ]

In [2]: infimum_supremum_compact[5]
Out[2]: 'i'

In [3]: infimum_supremum_compact[5 + 0x000d]
Out[3]: 's'

1.3 Page Directory

Due to the unidirectional linked list structure, scanning the entire list to find a record is inefficient. InnoDB maintains a dynamic array (page directory) at the end of each data page. Each element (slot) in the array stores the position of a record:

1
2
/* We define a slot in the page directory as two bytes */
constexpr uint32_t PAGE_DIR_SLOT_SIZE = 2;

Instead of storing every record’s address, each slot points to the last record in its managed group. A slot typically manages 4 to 8 records:

1
2
3
/* The maximum and minimum number of records owned by a directory slot. */
constexpr uint32_t PAGE_DIR_SLOT_MAX_N_OWNED = 8;
constexpr uint32_t PAGE_DIR_SLOT_MIN_N_OWNED = 4;

The first slot always points to infimum, and the last slot always points to supremum.

1.4 N_OWNED

Each record reserves 4 bits in the record header to store N_OWNED:

1
2
constexpr uint32_t REC_NEW_N_OWNED = 5; /* This is a single byte bit-field */
constexpr uint32_t REC_N_OWNED_MASK = 0xFUL;

If a record is the last in its slot, its value indicates the number of records owned by the slot. Otherwise, the value is 0.


2. Example

The following diagram illustrates the layout of a data page:

  • Orange arrows connect 24 user records from rec0 to rec23.
  • Gray arrows point to the last record managed by each slot:
    • Slot 0 points to infimum (owns 1 record).
    • Slot n points to supremum (owns 5 records).
    • Slot 1 points to rec3 (owns 4 records).

3. Algorithms

We’ll use the following logical InnoDB page layout to understand both scan algorithms:

3.1 Forward Scan

Finding the next record from rec10 is straightforward:

  1. Read the REC_NEXT offset:
    1
    2
    
    field_value = mach_read_from_2(rec - REC_NEXT);
       
  2. Get the next record’s position:
    1
    2
    
    return (ut_align_offset(rec + field_value, UNIV_PAGE_SIZE));
       

3.2 Reverse Scan

Finding the previous record from rec10 is more complex:

3.2.1 Find Which Slot Manages the Current Record (page_dir_find_owner_slot)

  1. Scan records starting from the current record until n_owned is not 0:

    1
    2
    3
    4
    5
    
    while (rec_get_n_owned_new(r) == 0) {
         r = rec_get_next_ptr_const(r, true);
         // ...
       }
       

    • Check rec10 → rec11 (since rec11 has n_owned = 4).
  2. Check all slots to find the one pointing to record r:

    1
    2
    3
    4
    5
    6
    
    rec_offs_bytes = mach_encode_2(r - page);
       while (UNIV_LIKELY(*(uint16 *)slot != rec_offs_bytes)) {
         slot += PAGE_DIR_SLOT_SIZE;
       }
       return (((ulint)(first_slot - slot)) / PAGE_DIR_SLOT_SIZE);
       

    • Scan from the last slot (slot n) to slot 0:
      • slot n points to supremum (not rec11) → check slot 4.
      • slot 4 points to rec15 (not rec11) → check slot 3.
      • slot 3 points to rec11 → return 3.

3.2.2 Scan the Current Slot Group to Find the Previous Record

  1. Jump to the previous slot (since slot 3 only holds the last record, use the previous slot’s group):

    1
    2
    3
    
    slot = page_dir_get_nth_slot(page, slot_no - 1);
       rec2 = page_dir_slot_get_rec(slt);
       

    • Check slot 2 to find rec7.
  2. Scan all records in the slot group to match the current record:

    1
    2
    3
    4
    5
    6
    
    while (rec != rec2) {
         prev_rec = rec2;
         rec2 = page_rec_get_next_low(rec2, true);
       }
       return (prev_rec);
       

    • Check rec7 → rec8 → rec9 → rec10 → return rec9.

4. Time Complexity

  • Forward scan: O(1)
  • Reverse scan: O(n), where n is the number of slots in the page directory.

5. Benchmark Tests

5.1 Forward Scan

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> SELECT k FROM sbtest1 ORDER BY k ASC LIMIT 9999999, 1;
+---------+
| k       |
+---------+
| 8670945 |
+---------+
1 row in set (1.41 sec)

mysql> DESC SELECT k FROM sbtest1 ORDER BY k ASC LIMIT 9999999, 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
   partitions: NULL
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 9864216
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

5.2 Reverse Scan

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> SELECT k FROM sbtest1 ORDER BY k DESC LIMIT 9999999, 1;
+---------+
| k       |
+---------+
| 1184614 |
+---------+
1 row in set (2.01 sec)

mysql> DESC SELECT k FROM sbtest1 ORDER BY k DESC LIMIT 9999999, 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
   partitions: NULL
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 9864216
     filtered: 100.00
        Extra: Backward index scan; Using index
1 row in set, 1 warning (0.00 sec)

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.