SQL Server String Aggregation: FOR XML PATH vs. STRING_AGG Performance Comparison and Optimization | SQLFlash

In SQL Server, string aggregation is a common requirement for data reporting, logging, and data display. Traditional methods often use FOR XML PATH, but since SQL Server 2017, STRING_AGG has become the new string aggregation function. Although both methods achieve the same functionality, their performance differences can significantly impact system efficiency when handling large-scale data.

This article explores the performance differences between FOR XML PATH and STRING_AGG and provides optimization suggestions to help developers choose the most suitable solution.

1. Common String Aggregation Methods in SQL Server

In SQL Server, strings can typically be aggregated using the following methods:

1.1 FOR XML PATH

This method uses XML aggregation to merge multiple rows into a single string.

1
2
3
SELECT Name
FROM Employees
FOR XML PATH('');

1.2 STRING_AGG

Introduced in SQL Server 2017, this new aggregate function simplifies the string aggregation process and supports specifying delimiters.

1
2
SELECT STRING_AGG(Name, ', ') AS Employees
FROM Employees;

2. Performance Comparison and Testing

2.1 Test Environment

  • Database Version: SQL Server 2019
  • Table Name: order_details (5 million records, including order_id for grouping)

2.2 Performance Testing Methods

2.2.1 Data Preparation

A table with 5 million records was generated, structured as follows:

1
2
3
4
5
CREATE TABLE order_details (
    order_id INT NOT NULL, -- Order number (1-10,000 groups)
    product_name NVARCHAR(50), -- Product name (randomly generated)
    quantity INT -- Purchase quantity (1-10 random values)
);

2.2.2 Test Cases

  1. Small-scale Aggregation: 10 rows per group
  2. Medium-scale Aggregation: 500 rows per group
  3. Large-scale Aggregation: 5,000 rows per group

The goal is to concatenate product_name into a comma-separated string for each group.

2.2.3 Test Statements

1
2
3
4
5
6
7
8
9
-- FOR XML PATH solution
SELECT order_id, STUFF((SELECT ',' + product_name FROM order_details t2 WHERE t1.order_id = t2.order_id FOR XML PATH('')), 1, 1, '') AS products
FROM order_details t1
GROUP BY order_id;

-- STRING_AGG solution (SQL Server 2017+)
SELECT order_id, STRING_AGG(product_name, ',') AS products
FROM order_details
GROUP BY order_id;

2.2.4 Execution Process

Each test case was executed 3 times, with cache cleared before taking the average value. The complete metrics were captured using SET STATISTICS IO, TIME ON.

3. Performance Test Results

Aggregation ScaleFOR XML PATHSTRING_AGGPerformance Improvement
10 rows/group320 ms285 ms12%
500 rows/group1,850 ms1,120 ms40%
5,000 rows/group18,200 ms6,740 ms63%

4. Performance Analysis

4.1 Small-scale Aggregation

For smaller aggregation tasks, both methods perform similarly, with STRING_AGG slightly outperforming FOR XML PATH. However, the performance difference is minimal and unlikely to be a bottleneck.

4.2 Medium-scale Aggregation

As the scale increases, FOR XML PATH’s performance significantly decreases due to the overhead of generating XML and handling escape characters.

4.3 Large-scale Aggregation

In large-scale aggregation scenarios, FOR XML PATH’s performance issues become even more pronounced. It requires handling large data volumes and converting them to XML, which impacts query efficiency.

5. Performance Reasons

5.1 FOR XML PATH Bottlenecks

  • FOR XML PATH relies on the XML engine, requiring data conversion to XML format and additional processing for escape characters.
  • Extra operations such as removing escape characters, sorting, and XML tag handling increase computational overhead as data volume grows.

5.2 STRING_AGG Advantages

  • STRING_AGG, introduced in SQL Server 2017, is specifically designed for string aggregation with optimized underlying processing, reducing extra computation and conversion steps.
  • Since it does not depend on the XML engine, it is more efficient when handling large datasets.

6. Optimization Suggestions and Best Practices

  1. Prefer STRING_AGG: If using SQL Server 2017 or later, STRING_AGG is recommended for its superior performance and cleaner syntax, especially for large datasets.
  2. Avoid FOR XML PATH: If your application does not require compatibility with earlier SQL Server versions (e.g., 2016 or earlier), avoid using FOR XML PATH due to its inferior performance with larger datasets.
  3. Index Optimization: Ensure proper indexing on the grouping fields (e.g., order_id) to improve query execution efficiency and reduce full table scans.
  4. Batch Processing for Large Datasets: Consider batch processing for very large datasets, aggregating data in pages and concatenating results at the application level to avoid querying excessive data at once.

7. SQLFLASH Optimization Demo

In complex queries, outdated syntax like FOR XML PATH can still exist and impact performance. Tools like SQLFLASH can automate optimization, especially when converting FOR XML PATH to the more efficient STRING_AGG syntax, significantly reducing execution time.

After SQLFLASH Optimization

Example:

1
2
3
4
5
6
7
8
9
-- Original FOR XML PATH query
SELECT order_id, STUFF((SELECT ',' + product_name FROM order_details t2 WHERE t1.order_id = t2.order_id FOR XML PATH('')), 1, 1, '') AS products
FROM order_details t1
GROUP BY order_id;

-- Optimized with SQLFLASH to STRING_AGG
SELECT order_id, STRING_AGG(product_name, ',') AS products
FROM order_details t1
GROUP BY order_id;

8. Summary

In SQL Server, STRING_AGG is the optimal method for string aggregation, particularly for SQL Server 2017 and later versions. For large-scale data processing, STRING_AGG outperforms FOR XML PATH with higher performance and cleaner syntax. Developers should prioritize STRING_AGG and avoid the outdated FOR XML PATH, while ensuring proper indexing to further optimize performance.

This article compares FOR XML PATH and STRING_AGG in SQL Server, highlighting STRING_AGG’s superior performance for string aggregation tasks.


What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.