Optimize MySQL OR Conditions: A UNION Rewrite Guide | SQLFlash

I. Preface

In complex join query scenarios, OR logic within JOIN conditions often becomes a performance bottleneck. When the ON clause of a LEFT JOIN contains multi-column OR conditions, the MySQL optimizer often cannot effectively utilize indexes, leading to full table scans or the “Range checked for each record” strategy, causing a sharp performance decline. This solution is based on an actual business scenario, using 20,000-level test data, comparing execution plans, and analyzing cost metrics to validate the significant effectiveness of the UNION rewrite technique for optimizing OR conditions. Test results show that by splitting the OR condition into independent queries and merging them with UNION, performance improved by up to 96.02%, execution time dropped from 19.62 seconds to 0.78 seconds, and execution cost was reduced by 99.88%. This article will deeply analyze the optimization principles, applicable boundaries, and implementation key points, providing developers with actionable performance optimization guidance.

II. Test Environment Setup

2.1 Table Structure Design

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
-- Vehicle base information table (10,000 records)

CREATE TABLE rs_prd_vcl_base (
    code VARCHAR(100) PRIMARY KEY,
    product_model_no VARCHAR(50),
    product_model_name VARCHAR(100)
) ENGINE=InnoDB;

-- Vehicle acceptance record table (10,000 records)

CREATE TABLE rs_vehicle_check (
    id INT AUTO_INCREMENT PRIMARY KEY,
    approval_status VARCHAR(20),
    approval_node VARCHAR(50),
    acceptance_status VARCHAR(20),
    shipment_no VARCHAR(50),
    shipment_date DATE,
    product VARCHAR(50),
    rs_product_code VARCHAR(100),
    config_combination_no VARCHAR(100),
    factory_no VARCHAR(50),
    engine_no VARCHAR(50),
    delivery_party VARCHAR(100),
    order_number VARCHAR(50),
    order_line_number VARCHAR(50),
    price DECIMAL(10,2),
    waybill_no VARCHAR(50),
    shipment_diff VARCHAR(20),
    transport_mode VARCHAR(20),
    carrier_unit VARCHAR(100),
    carrier VARCHAR(50),
    carrier_phone_no VARCHAR(20),
    carrier_vehicle_no VARCHAR(20),
    arrival_date DATE,
    acceptance_date DATE,
    accepted_by VARCHAR(50),
    delivery_no VARCHAR(50),
    erp_status VARCHAR(20),
    acceptance_type VARCHAR(20),
    abnormal_cause VARCHAR(200),
    process_type VARCHAR(20),
    dealer_name VARCHAR(100),
    dealer_code VARCHAR(50),
    channel VARCHAR(50),
    sale_org_code VARCHAR(50),
    area_code VARCHAR(50),
    marketing_code VARCHAR(50),
    INDEX idx_acceptance_date (acceptance_date),
    INDEX idx_rs_product_code (rs_product_code)
) ENGINE=InnoDB;

-- Create index for rs_prd_vcl_base

CREATE INDEX idx_code ON rs_prd_vcl_base(code);

2.2 Test Data Generation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Generate 10,000 vehicle base records

INSERT INTO rs_prd_vcl_base (code, product_model_no, product_model_name)

WITH RECURSIVE seq AS (
    SELECT 1 AS n UNION ALL 
    SELECT n+1 FROM seq WHERE n < 10000
)

SELECT
    CONCAT('CODE', LPAD(n, 6, '0')) AS code,
    CONCAT('MODEL', FLOOR(n/100)) AS product_model_no,
    CONCAT('Model ', FLOOR(n/100), ' Number') AS product_model_name
FROM seq;

-- Generate 10,000 vehicle acceptance records

INSERT INTO rs_vehicle_check (
    acceptance_date, rs_product_code, config_combination_no, 
    acceptance_status, product, factory_no
)

WITH RECURSIVE seq AS (
    SELECT 1 AS n UNION ALL 
    SELECT n+1 FROM seq WHERE n < 10000
)

SELECT
    DATE_ADD('2025-05-01', INTERVAL FLOOR(RAND(n)*60) DAY) AS acceptance_date,
    IF(n%2=0, CONCAT('CODE', LPAD(FLOOR(RAND(n2)10000)+1, 6, '0')), NULL) AS rs_product_code,
    IF(n%2=1, CONCAT('CODE', LPAD(FLOOR(RAND(n3)10000)+1, 6, '0')), NULL) AS config_combination_no,
    'Accepted' AS acceptance_status,
    CONCAT('Product', FLOOR(n/100)) AS product,
    CONCAT('FN', n) AS factory_no
FROM seq;

III. SQL Optimization

3.1 Original SQL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
SELECT
    rvc.approval_status, rvc.approval_node,
    rvc.acceptance_status,
    rvc.shipment_no,
    rvc.shipment_date,
    rvc.product as product,
    rpvb.product_model_no as machine_model,
    rpvb.product_model_name,
    rvc.rs_product_code,
    rvc.config_combination_no,
    rvc.factory_no,
    rvc.engine_no,
    rvc.delivery_party,
    rvc.order_number,
    rvc.order_line_number,
    rvc.price,
    rvc.waybill_no,
    rvc.shipment_diff,
    rvc.transport_mode,
    rvc.carrier_unit,
    rvc.carrier,
    rvc.carrier_phone_no,
    rvc.carrier_vehicle_no,
    rvc.arrival_date,
    rvc.acceptance_date,
    rvc.accepted_by,
    rvc.delivery_no,
    rvc.erp_status,
    rvc.acceptance_type,
    rvc.abnormal_cause,
    rvc.process_type,
    rvc.dealer_name,
    rvc.dealer_code,
    rvc.channel,
    rvc.sale_org_code,
    rvc.area_code as region,
    rvc.marketing_code as market_dept,
    rvc.*
FROM rs_vehicle_check rvc
LEFT JOIN rs_prd_vcl_base rpvb
    ON (rvc.rs_product_code = rpvb.code OR rvc.config_combination_no = rpvb.code)
WHERE rvc.acceptance_date >= '2025-05-01'
  AND '2025-06-30' >= rvc.acceptance_date;

3.2 Optimized SQL

We attempted to rewrite using SQLFlash.

SQLFlash Demo

The rewritten SQL is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
SELECT
    rvc.approval_status, rvc.approval_node,
    rvc.acceptance_status,
    rvc.shipment_no,
    rvc.shipment_date,
    rvc.product as product,
    rpvb.product_model_no as machine_model,
    rpvb.product_model_name,
    rvc.rs_product_code,
    rvc.config_combination_no,
    rvc.factory_no,
    rvc.engine_no,
    rvc.delivery_party,
    rvc.order_number,
    rvc.order_line_number,
    rvc.price,
    rvc.waybill_no,
    rvc.shipment_diff,
    rvc.transport_mode,
    rvc.carrier_unit,
    rvc.carrier,
    rvc.carrier_phone_no,
    rvc.carrier_vehicle_no,
    rvc.arrival_date,
    rvc.acceptance_date,
    rvc.accepted_by,
    rvc.delivery_no,
    rvc.erp_status,
    rvc.acceptance_type,
    rvc.abnormal_cause,
    rvc.process_type,
    rvc.dealer_name,
    rvc.dealer_code,
    rvc.channel,
    rvc.sale_org_code,
    rvc.area_code as region,
    rvc.marketing_code as market_dept,
    rvc.*
FROM rs_vehicle_check rvc
LEFT JOIN rs_prd_vcl_base rpvb
    ON rvc.rs_product_code = rpvb.code
WHERE rvc.acceptance_date >= '2025-05-01'
  AND '2025-06-30' >= rvc.acceptance_date

UNION

SELECT
    rvc.approval_status, rvc.approval_node,
    rvc.acceptance_status,
    rvc.shipment_no,
    rvc.shipment_date,
    rvc.product as product,
    rpvb.product_model_no as machine_model,
    rpvb.product_model_name,
    rvc.rs_product_code,
    rvc.config_combination_no,
    rvc.factory_no,
    rvc.engine_no,
    rvc.delivery_party,
    rvc.order_number,
    rvc.order_line_number,
    rvc.price,
    rvc.waybill_no,
    rvc.shipment_diff,
    rvc.transport_mode,
    rvc.carrier_unit,
    rvc.carrier,
    rvc.carrier_phone_no,
    rvc.carrier_vehicle_no,
    rvc.arrival_date,
    rvc.acceptance_date,
    rvc.accepted_by,
    rvc.delivery_no,
    rvc.erp_status,
    rvc.acceptance_type,
    rvc.abnormal_cause,
    rvc.process_type,
    rvc.dealer_name,
    rvc.dealer_code,
    rvc.channel,
    rvc.sale_org_code,
    rvc.area_code as region,
    rvc.marketing_code as market_dept,
    rvc.*
FROM rs_vehicle_check rvc
LEFT JOIN rs_prd_vcl_base rpvb
    ON rvc.config_combination_no = rpvb.code
WHERE rvc.acceptance_date >= '2025-05-01'
  AND '2025-06-30' >= rvc.acceptance_date
  AND (rvc.rs_product_code IS NULL
       OR NOT EXISTS (
           SELECT 1
           FROM rs_prd_vcl_base rpvb2
           WHERE rvc.rs_product_code = rpvb2.code
       ));

View detailed report:https://sqlflash.ai/app/sqlResult?shareid=1707274c-377c-4c44-afd1-49a6062dee8b

IV. Performance Analysis

SQLFlash Analysis

According to the analysis by SQLFlash, the OR condition (rvc.rs_product_code=rpvb.code OR rvc.config_combination_no=rpvb.code) in the original SQL caused the MySQL optimizer to be unable to effectively use indexes, adopting the “Range checked for each record” strategy. Each record had to be evaluated for index usage, resulting in extensive full table scans during actual execution. After rewriting, the OR condition is split into two independent queries using UNION, allowing each query to stably use the idx_code index for ref access. The first query handles cases where rs_product_code matches, and the second query handles cases where config_combination_no matches, with NOT EXISTS ensuring no duplicate records are produced. The query cost dropped from 4.85 million to just over 5,000, a reduction of 99.88%, demonstrating a significant performance improvement.

Original Query Plan

1
2
3
4
5
6
7
8
mysql> EXPLAIN SELECT ... FROM rs_vehicle_check rvc LEFT JOIN rs_prd_vcl_base rpvb 
       ON (rvc.rs_product_code = rpvb.code OR rvc.config_combination_no = rpvb.code) ...;
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                  |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------+
|  1 | SIMPLE      | rvc   | ALL  | idx_accept... | NULL | NULL    | NULL | 9790  | 50.00    | Using where                            |
|  1 | SIMPLE      | rpvb  | ALL  | idx_code      | NULL | NULL    | NULL | 9910  | 100.00   | Range checked for each record (index...|
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------+
  • Query Cost: 4,857,670.32
  • Estimated Rows Processed: 48,509,450 (Cartesian product)
  • Execution Time: Average 19.6223 seconds

Optimized Query Plan

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
mysql> EXPLAIN SELECT ... UNION SELECT ...;
+----+-------------+----------+------+---------------+----------+---------+---------------------------+------+----------+------------------+
| id | select_type | table    | type | possible_keys | key      | key_len | ref                       | rows | filtered | Extra            |
+----+-------------+----------+------+---------------+----------+---------+---------------------------+------+----------+------------------+
|  1 | PRIMARY     | rvc      | ALL  | idx_accept... | NULL     | NULL    | NULL                      | 9803 | 49.99    | Using where      |
|  1 | PRIMARY     | rpvb     | ref  | idx_code      | idx_code | 403     | rvc.rs_product_code       | 1    | 100.00   |                  |
|  2 | UNION       | rvc      | ALL  | idx_accept... | NULL     | NULL    | NULL                      | 9803 | 49.99    | Using where      |
|  2 | UNION       | rpvb     | ref  | idx_code      | idx_code | 403     | rvc.config_combination_no | 1    | 100.00   |                  |
|  3 | DEPENDENT   | rpvb2    | ref  | idx_code      | idx_code | 403     | rvc.rs_product_code       | 1    | 100.00   | Using index      |
|  4 | UNION RES.. | <union>  | ALL  |               |          |         |                           |      |          | Using temporary  |
+----+-------------+----------+------+---------------+----------+---------+---------------------------+------+----------+------------------+
  • Query Cost: 5,592.74 (2,796.37 × 2)
  • Estimated Rows Processed: Approximately 4,982 rows per query
  • Execution Time: Average 0.7804 seconds

Performance Metrics Comparison

ItemOriginal QueryOptimized Query
SQL StructureLEFT JOIN + OR conditionUNION + Independent JOIN
Execution Time19.6223s0.7804s
Performance Improvement-96.02%
Time Saved-18.84s
Query Cost4,857,670.325,592.74
Cost Reduction-99.88%
rpvb Access TypeALL (Full Table Scan)ref (Index Seek)
Index UsageRange checkedStable use of idx_code
Estimated Rows Processed48,509,450~10,000
Extra InformationRange checked for each recordUsing index (subquery)
Temporary TablePossibleUNION temp table (Controllable)

Optimization Principle Analysis

The root cause of the performance issue is: When OR appears in the ON clause of a JOIN, MySQL cannot use multiple indexes simultaneously, ultimately degenerating into:

  • Dynamic index evaluation for each row (Range checked for each record)
  • Massive intermediate result inflation
  • Sharp increase in CPU and I/O costs
  • Exponential growth in execution time

By splitting the OR into two subqueries and merging them with UNION:

  • MySQL can choose the optimal index for each subquery (stable ref access)
  • The scale of intermediate results drops from tens of millions of matches to the thousands level
  • Temporary tables and JOIN Buffers are significantly reduced
  • Performance is typically improved by an order of magnitude or more

V. Summary

In join queries, if the ON clause of a LEFT JOIN contains multiple OR conditions, and each condition corresponds to a different index, priority should be given to splitting the query into multiple independent statements and merging them with UNION. This approach ensures that each subquery stably uses indexes, significantly reducing scan volume and execution cost. When developing, just remember: If OR appears in a JOIN, it’s highly likely that UNION is needed to rescue performance.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.