Introduction
In high-volume data scenarios, performance optimization for UNION queries is particularly critical. When pagination is required for union query results, traditional approaches that apply LIMIT externally often process all intermediate results, leading to unnecessary resource consumption. This article explores pushing LIMIT down to each UNION branch, effectively reducing the scale of intermediate datasets and improving query performance.
I. Test Data Generation (TPC-H Standard Dataset Extension)
Creating Test Tables
1
2
3
4
5
6
7
8
9
10
11
| CREATE TABLE customer(
c_custkey INT PRIMARY KEY,
c_nationkey INT NOT NULL,
c_name VARCHAR(25) NOT NULL
) ENGINE=InnoDB;
CREATE TABLE supplier(
s_suppkey INT PRIMARY KEY,
s_nationkey INT NOT NULL,
s_name VARCHAR(25) NOT NULL
) ENGINE=InnoDB;
|
Batch Data Insertion
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| -- Insert 10 million customer records (country key randomly distributed from 1 to 50)
INSERT INTO customer
WITH RECURSIVE seq AS(
SELECT 1 AS n UNION ALL SELECT n+1 FROM seq WHERE n<10000000
)
SELECT
n,
FLOOR(RAND()*50)+1,
CONCAT('Customer#',LPAD(n,10,'0'))
FROM seq;
-- Insert 2 million supplier records (country key randomly distributed from 1 to 50)
INSERT INTO supplier
WITH RECURSIVE seq AS(
SELECT 1 AS n UNION ALL SELECT n+1 FROM seq WHERE n<2000000
)
SELECT
n,
FLOOR(RAND()*50)+1,
CONCAT('Supplier#',LPAD(n,10,'0'))
FROM seq;
|
Creating Index
1
2
| CREATE INDEX idx_customer_nation ON customer(c_nationkey);
CREATE INDEX idx_supplier_nation ON supplier(s_nationkey);
|
II. SQL Optimization
Original SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
| -- Original SQL (Retrieve Records 21-30)
SELECT *
FROM (
SELECT c_nationkey nation, 'C' type, COUNT(*) num
FROM customer
GROUP BY c_nationkey
UNION
SELECT s_nationkey nation, 'S' type, COUNT(*) num
FROM supplier
GROUP BY s_nationkey
) AS combined
ORDER BY nation
LIMIT 20,10;
|
Optimizing SQL: Rewriting with SQLFlash
Attempting to refactor query logic through SQLFlash’s real-time semantic translation capabilities.

The rewritten SQL is as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| SELECT *
FROM (
(SELECT c_nationkey nation, 'C' type, COUNT(*) num
FROM customer
GROUP BY c_nationkey
ORDER BY c_nationkey LIMIT 30)
UNION
(SELECT s_nationkey nation, 'S' type, COUNT(*) num
FROM supplier
GROUP BY s_nationkey
ORDER BY s_nationkey LIMIT 30)
) AS combined
ORDER BY nation
LIMIT 20,10;
|
View Detailed Report
Test Environment
- Database Version: MySQL 8.0.28
- Hardware Configuration: 4-core CPU / 8GB RAM / SSD
- Data Scale:
- customer table: 10 million rows (50 countries)
- supplier table: 2 million rows (50 countries)
Test Results
Metric | Traditional Writing (External LIMIT) | Optimized Writing (LIMIT Pushdown) | Performance Improvement |
---|
Execution Time | 1.25 seconds | 0.76 seconds | 40% |
Scanned Rows | More than 12 million rows | 60 rows | - |
Temporary Table Size | Very large | Small | - |
SQLFlash Analysis
Compared to the original SQL, the rewritten query performs sorting and limits the output rows (LIMIT 30) immediately after each subquery completes grouping. This significantly reduces the size of the result sets generated by each subquery. After the UNION operation, only a smaller number of rows need to be sorted and paginated to obtain the final results. This approach effectively alleviates the burden on the final sorting phase and minimizes unnecessary sorting work after merging results, thereby reducing overall execution time and memory usage. For query scenarios, early sorting and row-limiting on grouped results can significantly lower the processing cost of subsequent UNION and sorting operations, improving query performance and response speed.
Bottlenecks in the Traditional Approach
- Full data processing: Each UNION branch generates aggregated results for all 50 countries (100 records in total).
- Double sorting overhead: Performs a full sort on the UNION result set first, then applies LIMIT pagination.
- Memory pressure: Requires maintaining 12 million intermediate rows in memory.
Advantages of the Optimized Solution
- Early data filtering: Each branch retrieves only 30 records (meeting the final requirement of 30 results).
- High index utilization: Uses the (c_nationkey) index to directly obtain ordered results.
- Reduced Resource Consumption:
- The temporary table contains only 60 rows, resulting in lower performance overhead.
- By utilizing an index (type = index) and processing a small data volume, disk I/O and CPU consumption are significantly reduced.
Execution Plan Comparison
Original SQL Execution Plan:
1
2
3
4
5
6
7
8
9
10
| mysql> explain SELECT * FROM ( SELECT c_nationkey nation, 'C' type, COUNT(*) num FROM customer GROUP BY c_nationkey UNION SELECT s_nationkey nation, 'S' type, COUNT(*) num FROM supplier GROUP BY s_nationkey ) AS combined ORDER BY nation LIMIT 20,10;
+----+--------------+------------+------------+-------+---------------------+---------------------+---------+------+----------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+------------+------------+-------+---------------------+---------------------+---------+------+----------+----------+-----------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 11963874 | 100.00 | Using filesort |
| 2 | DERIVED | customer | NULL | index | idx_customer_nation | idx_customer_nation | 4 | NULL | 9969732 | 100.00 | Using index |
| 3 | UNION | supplier | NULL | index | idx_supplier_nation | idx_supplier_nation | 4 | NULL | 1994142 | 100.00 | Using index |
| NULL | UNION RESULT | <union2,3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+------------+-------+---------------------+---------------------+---------+------+----------+----------+-----------------+
4 rows in set, 1 warning (0.00 sec)
|
Optimized Execution Plan:
1
2
3
4
5
6
7
8
9
10
| mysql> explain SELECT * FROM ( (SELECT c_nationkey nation, 'C' type, COUNT(*) num FROM customer GROUP BY c_nationkey ORDER BY c_nationkey LIMIT 30) UNION (SELECT s_nationkey nation, 'S' type, COUNT(*) num FROM supplier GROUP BY s_nationkey ORDER BY s_nationkey LIMIT 30) ) AS combined ORDER BY nation LIMIT 20,10;
+----+--------------+------------+------------+-------+---------------------+---------------------+---------+------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+------------+------------+-------+---------------------+---------------------+---------+------+------+----------+-----------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 60 | 100.00 | Using filesort |
| 2 | DERIVED | customer | NULL | index | idx_customer_nation | idx_customer_nation | 4 | NULL | 30 | 100.00 | Using index |
| 3 | UNION | supplier | NULL | index | idx_supplier_nation | idx_supplier_nation | 4 | NULL | 30 | 100.00 | Using index |
| NULL | UNION RESULT | <union2,3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+------------+-------+---------------------+---------------------+---------+------+------+----------+-----------------+
4 rows in set, 1 warning (0.00 sec)
|
V. Conclusion
Through LIMIT pushdown optimization, the following can be achieved:
- Performance Leap: 40% improvement in query speed.
- Resource Savings: The temporary table contains only 60 rows, resulting in lower performance overhead.
- Enhanced Scalability: Slower performance degradation as pagination depth increases.
Final Recommendations: For paginated queries involving UNION/UNION ALL, prioritize pushing LIMIT down to each branch, while ensuring:
- Create appropriate indexes on sorting columns.
- Calculate branch LIMIT values (considering the sum of OFFSET and page size).
- Use UNION ALL instead of UNION to eliminate deduplication overhead (when business requirements allow).
Recommended reading
What is SQLFlash?
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
How to use SQLFlash in a database?