MySQL OR Condition to UNION ALL | SQLFlash

Preface

In MySQL database query optimization, the performance issue of OR conditions is a common yet easily overlooked performance bottleneck. When the WHERE clause contains multiple OR conditions, especially when these conditions involve indexes on different columns, the MySQL optimizer often struggles to choose the optimal execution plan, leading to a sharp decline in query performance. This performance issue is particularly evident on large-volume tables and may even trigger full table scans or inefficient index merge operations. This article will delve into the root cause of OR condition query performance issues, validate an efficient solution—converting OR conditions to UNION ALL queries—through a complete experiment, and provide a detailed comparison of performance differences before and after optimization. By analyzing execution plans and conducting actual performance tests, it offers developers a practical optimization solution to effectively address the performance problems caused by OR conditions in daily development.

1. Experimental Environment and Data Preparation

1.1 Hardware Configuration

CPU4 cores
Memory8GB
Storage1TB SSD
MySQL Version5.7.24

1.2 Table Structure Design

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
CREATE TABLE `access_logs` (
`log_id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) DEFAULT NULL,
`request_url` varchar(500) DEFAULT NULL,
`http_method` varchar(10) DEFAULT NULL,
`response_code` int(11) DEFAULT NULL,
`response_time_ms` int(11) DEFAULT NULL,
`ip_address` varchar(50) DEFAULT NULL,
`user_agent` varchar(200) DEFAULT NULL,
`referrer` varchar(200) DEFAULT NULL,
`created_at` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`log_id`),
KEY `idx_type` (`http_method`),
KEY `idx_user_id` (`user_id`),
KEY `idx_time` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=UTF8MB4

2. Data Generation Method

Generate 2 million test records.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
DELIMITER $$

CREATE PROCEDURE populate_test_data(IN log_count INT)
BEGIN
    DECLARE i INT DEFAULT 1;

    SET i = 1;
    WHILE i <= log_count DO
        INSERT INTO access_logs(user_id, request_url, http_method, response_code, response_time_ms, ip_address, user_agent, referrer)
        VALUES (
            FLOOR(1 + RAND()*user_count),
            CONCAT('/api/v1/resource/', FLOOR(RAND()*1000)),
            ELT(FLOOR(1 + RAND()*4), 'GET','POST','PUT','DELETE'),
            ELT(FLOOR(1 + RAND()*5), 200, 200, 200, 404, 500), 
            FLOOR(RAND()*3000),
            CONCAT('192.168.', FLOOR(RAND()*255), '.', FLOOR(RAND()*255)),
            ELT(FLOOR(1 + RAND()*4), 'Chrome','Safari','Firefox','Edge'),
            ELT(FLOOR(1 + RAND()*4), 'google.com','bing.com','baidu.com','direct')
        );
        SET i = i + 1;
    END WHILE;

END$$
DELIMITER ;

CALL populate_test_data(2000000);

3. Performance Comparison Experiment

3.1 Original SQL

1
2
3
4
5
6
7
SELECT
    count(*)
FROM
    access_logs
WHERE
    user_id = 75259
    OR http_method = "POST"

3.2 Optimized SQL

We attempted to rewrite it using SQLFlash.

SQLFlash Demo

The rewritten SQL obtained is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT
    COUNT(DISTINCT t.log_id)
FROM
(
    SELECT log_id
    FROM access_logs
    WHERE user_id = 75259
    UNION
    SELECT log_id
    FROM access_logs
    WHERE http_method = "POST"
) t

3.3 Performance Metrics Comparison

MetricOriginal SQLOptimized SQL
Execution Time13.36 sec0.62 sec

3.4 SQLFlash Analysis

Compared to the original query using OR, this rewrite splits the conditions into two separate subqueries and combines them with UNION, which may reduce the overall overhead of OR logical evaluation in certain database execution strategies. Each subquery only processes data for its respective condition and uses DISTINCT for deduplication, avoiding mixing multiple column conditions in the same expression. Some optimizers may adopt more flexible strategies (such as early filtering or skipping irrelevant rows) when processing UNION, potentially bringing performance benefits in specific scenarios. This rewrite also makes the logic more intuitive, facilitating subsequent maintenance and testing.

4. In-Depth Execution Plan Analysis

4.1 Original SQL Execution Plan

​

1
2
3
4
5
+----+-------------+-------------+------------+-------------+------------------------------+----------------------+---------+------+--------+----------+------------------------------------------------+
| id | select_type | table       | partitions | type        | possible_keys                | key                  | key_len | ref  | rows   | filtered | Extra                                          |
+----+-------------+-------------+------------+-------------+------------------------------+----------------------+---------+------+--------+----------+------------------------------------------------+
|  1 | SIMPLE      | access_logs | NULL       | index_merge | idx_type,idx_user_id,idx_111 | idx_user_id,idx_type | 9,13    | NULL | 968965 |   100.00 | Using union(idx_user_id,idx_type); Using where |
+----+-------------+-------------+------------+-------------+------------------------------+----------------------+---------+------+--------+----------+------------------------------------------------+

Analysis:​​ The original SQL execution plan shows that MySQL uses the index_merge optimization strategy, specifically manifested as “Using union(idx_user_id,idx_type)”. This strategy simultaneously uses both the idx_user_id and idx_type indexes, then merges and deduplicates the results from both index scans. Although it avoids a full table scan, it needs to process over 960,000 rows of intermediate results, generating significant random I/O operations. The index_merge operation itself requires additional memory and CPU resources to merge the results from two indexes, which is the main reason for the poor query performance.

4.2 Optimized SQL Execution Plan

1
2
3
4
5
6
7
8
+----+--------------+-------------+------------+------+---------------------+-------------+---------+-------+--------+----------+-----------------+
| id | select_type  | table       | partitions | type | possible_keys       | key         | key_len | ref   | rows   | filtered | Extra           |
+----+--------------+-------------+------------+------+---------------------+-------------+---------+-------+--------+----------+-----------------+
|  1 | PRIMARY      | <derived2>  | NULL       | ALL  | NULL                | NULL        | NULL    | NULL  | 968965 |   100.00 | NULL            |
|  2 | DERIVED      | access_logs | NULL       | ref  | idx_user_id         | idx_user_id | 9       | const |     23 |   100.00 | Using index     |
|  3 | UNION        | access_logs | NULL       | ref  | idx_type            | idx_type    | 13      | const | 968942 |   100.00 | Using index     |
|NULL| UNION RESULT | <union2,3>  | NULL       | ALL  | NULL                | NULL        | NULL    | NULL  |   NULL |     NULL | Using temporary |
+----+--------------+-------------+------------+------+---------------------+-------------+---------+-------+--------+----------+-----------------+
​ Analysis:​​ Although it still needs to process over 960,000 rows of data, the optimized solution decomposes the originally complex index_merge operation into two efficient index lookups, each utilizing “Using index” for index scanning to avoid back-to-table queries, which is the key to performance improvement.

5. Applicable Scenarios

  • If the execution plan shows that SQL will use “Using union” optimization, and if MySQL finds that other filtering conditions are more efficient, then it may not be suitable to change OR conditions to UNION.

  • When either side of the OR condition involves scanning a large result set, UNION will be more efficient than index merge.

6. Conclusion

Through comparative analysis in this experiment, we have confirmed that converting OR conditions to UNION ALL is an effective solution for optimizing multi-condition queries in MySQL. This solution decomposes complex OR conditions into multiple independent subqueries, fully utilizes the independent indexes of each field, and avoids the performance loss caused by the MySQL optimizer’s index_merge operations.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.