5 Steps to Optimize MySQL Deep Pagination | SQLFlash

Introduction

MySQL deep pagination is a common performance bottleneck in real-world development, most visible when paginating through large datasets. Many developers habitually use the LIMIT offset, size syntax to implement pagination, but when querying large page numbers or accessing data located far back in the result set, this straightforward approach causes severe performance problems: an excessively large offset forces the database to scan and discard vast amounts of data, drastically increasing response times and consuming excessive system resources. This article delves into the root causes of the deep pagination problem, validates an efficient solution (Deferred Join) through controlled experiments, and compares performance before and after optimization, giving developers a practical optimization path.

1. Experimental Environment and Data Preparation

1.1 Hardware Configuration

| Component     | Specification |
|---------------|---------------|
| CPU           | 4-core        |
| Memory        | 8GB           |
| Storage       | 1TB SSD       |
| MySQL Version | 5.7.24        |

1.2 Table Structure Design

CREATE TABLE `access_logs` (
`log_id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) DEFAULT NULL,
`request_url` varchar(500) DEFAULT NULL,
`http_method` varchar(10) DEFAULT NULL,
`response_code` int(11) DEFAULT NULL,
`response_time_ms` int(11) DEFAULT NULL,
`ip_address` varchar(50) DEFAULT NULL,
`user_agent` varchar(200) DEFAULT NULL,
`referrer` varchar(200) DEFAULT NULL,
`created_at` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`log_id`),
KEY `idx_time` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4;

2. Data Generation Method

We generated 2 million test records, sufficient for deep pagination testing.

DELIMITER $$

CREATE PROCEDURE populate_test_data(IN log_count INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE user_count INT DEFAULT 100000; -- number of distinct simulated users
    WHILE i <= log_count DO
        INSERT INTO access_logs(user_id, request_url, http_method, response_code, response_time_ms, ip_address, user_agent, referrer)
        VALUES (
            FLOOR(1 + RAND()*user_count),
            CONCAT('/api/v1/resource/', FLOOR(RAND()*1000)),
            ELT(FLOOR(1 + RAND()*4), 'GET','POST','PUT','DELETE'),
            ELT(FLOOR(1 + RAND()*5), 200, 200, 200, 404, 500), 
            FLOOR(RAND()*3000),
            CONCAT('192.168.', FLOOR(RAND()*255), '.', FLOOR(RAND()*255)),
            ELT(FLOOR(1 + RAND()*4), 'Chrome','Safari','Firefox','Edge'),
            ELT(FLOOR(1 + RAND()*4), 'google.com','bing.com','baidu.com','direct')
        );
        SET i = i + 1;
    END WHILE;

END$$
DELIMITER ;

CALL populate_test_data(2000000);
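
Calling the procedure as written executes 2 million single-row INSERTs, and with autocommit enabled each one commits its own transaction, which makes population very slow. A minimal sketch (assuming default autocommit settings) is to wrap the call in one explicit transaction:

```sql
-- Batch the 2 million single-row INSERTs into a single transaction
-- so InnoDB does not have to flush the log on every row.
SET autocommit = 0;
CALL populate_test_data(2000000);
COMMIT;
SET autocommit = 1;
```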

3. Performance Comparison Experiment

3.1 Original SQL

SELECT
  log_id,
  user_id,
  request_url,
  ip_address,
  created_at
FROM
  access_logs
ORDER BY
  created_at
LIMIT
  1800000, 10;

3.2 Optimized SQL

We rewrote the query using SQLFlash.

Rewritten SQL Query:

SELECT t1.log_id, t1.user_id, t1.request_url, t1.ip_address, t1.created_at
FROM access_logs t1
INNER JOIN (
  SELECT log_id
  FROM access_logs
  ORDER BY created_at
  LIMIT 1800000, 10
) t2 ON t1.log_id = t2.log_id
ORDER BY t1.created_at;
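
One caveat worth noting (our observation, not part of the SQLFlash output): created_at is not unique, so rows sharing a timestamp have no guaranteed order, and the same offset can return slightly different rows across executions. A common hardening is to add the primary key as a tiebreaker in both ORDER BY clauses:

```sql
SELECT t1.log_id, t1.user_id, t1.request_url, t1.ip_address, t1.created_at
FROM access_logs t1
INNER JOIN (
  SELECT log_id
  FROM access_logs
  ORDER BY created_at, log_id   -- log_id breaks ties between equal timestamps
  LIMIT 1800000, 10
) t2 ON t1.log_id = t2.log_id
ORDER BY t1.created_at, t1.log_id;
```

The subquery stays covering because InnoDB secondary indexes implicitly append the primary key, so idx_time (created_at) already contains log_id.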


3.3 Performance Metrics Comparison

| Metric                   | Original SQL | Optimized SQL | Improvement       |
|--------------------------|--------------|---------------|-------------------|
| Execution Time           | 2.34s        | 0.46s         | 80.3% faster      |
| Rows Scanned             | 1,800,010    | 20            | 99.998% reduction |
| Back-to-Table Operations | 1,800,010    | 10            | 99.999% reduction |

3.4 SQLFlash Analysis

Compared to the original SQL, the rewritten query optimizes through a covering index and query splitting:

  1. Fewer back-to-table operations: the subquery scans only the index and needs no back-to-table lookups, cutting them from 1,800,010 to 10
  2. Smaller physical scan range: index entries are stored far more densely than full rows, so far fewer pages are read
  3. Better execution order: primary keys are located first, then complete rows are fetched, avoiding redundant data transfer

4. Execution Plan Deep Analysis

4.1 Original SQL Execution Plan

+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | access_logs | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1987219 |   100.00 | Using filesort |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+

Analysis: The original SQL performs a full table scan with a filesort, processing nearly 2 million rows only to discard all but the final 10.

4.2 Optimized SQL Execution Plan

+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys | key      | key_len | ref       | rows    | filtered | Extra       |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2>  | NULL       | ALL    | NULL          | NULL     | NULL    | NULL      | 1800010 |   100.00 | NULL        |
|  1 | PRIMARY     | t1          | NULL       | eq_ref | PRIMARY       | PRIMARY  | 8       | t2.log_id |       1 |   100.00 | NULL        |
|  2 | DERIVED     | access_logs | NULL       | index  | NULL          | idx_time | 6       | NULL      | 1800010 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+

Analysis: The derived table performs a covering scan on idx_time (note "Using index") to collect the 10 target primary keys, and each key is then matched against the base table via an efficient eq_ref primary key lookup.

5. Conclusion

Through comparative analysis, we confirm that Deferred Join is an efficient solution for optimizing MySQL deep pagination queries. The approach splits the query into two phases: first a covering index scan quickly locates the required primary key IDs, then the complete rows are fetched through a primary key join.

Core Advantages:

  1. Drastically reduced back-to-table operations: From 1.8 million to just 10 operations (99.99% reduction)
  2. Lower physical I/O overhead: Significantly reduces disk scan range using high-density index data
  3. Eliminated full table scans: Utilizes index ordering to avoid expensive filesort operations
  4. Strong versatility: Applicable to any deep pagination scenario with indexed sort fields

The Deferred Join method offers low modification cost and wide applicability, making it the preferred solution for MySQL deep pagination performance issues. For practical applications, we recommend implementing this optimization for pagination queries exceeding 100 pages.

Additionally, consider combining with other optimization strategies such as business-level maximum pagination depth limits and cursor-based pagination to build more comprehensive pagination solutions.
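
Cursor-based (keyset) pagination can be sketched against the same table: instead of an offset, the client remembers the sort-key values of the last row it received and asks for rows strictly after that position. The literal values below are illustrative placeholders, and the (created_at, log_id) pair assumes the primary key is used as a tiebreaker:

```sql
-- Page N+1: fetch rows after the last row of page N.
-- The two literals stand in for the created_at and log_id
-- of the final row the client saw on the previous page.
SELECT log_id, user_id, request_url, ip_address, created_at
FROM access_logs
WHERE created_at > '2024-01-01 00:00:00'
   OR (created_at = '2024-01-01 00:00:00' AND log_id > 1800000)
ORDER BY created_at, log_id
LIMIT 10;
```

The cost per page stays roughly constant regardless of depth, though unlike offset pagination it cannot jump directly to an arbitrary page number.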

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!