AI-Driven SQL Dataset Optimization 202507: BIRD-Critic

BIRD-CRITIC (a.k.a. SWE-SQL), the first SQL diagnostic benchmark, was released to answer one question: can large language models (LLMs) fix user issues in real-world database applications? The benchmark comprises 600 development tasks and 200 held-out out-of-distribution (OOD) tests. BIRD-CRITIC 1.0 is built on realistic user issues across four prominent SQL dialects: MySQL, PostgreSQL, SQL Server, and Oracle. It goes beyond simple SELECT queries to cover a wider range of SQL operations, reflecting actual application scenarios. Finally, it includes an optimized execution-based evaluation environment for rigorous and efficient validation.
Paper: https://arxiv.org/html/2506.18951v2
Leaderboard: https://bird-critic.github.io/
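To make the idea of execution-based evaluation more concrete, here is a minimal Python sketch. It is not the benchmark's actual harness: the `DebugTask` record, its field names, and the `is_fixed` helper are hypothetical, and it uses an in-memory SQLite database for portability rather than any of the four dialects the benchmark targets. The sketch assumes that a candidate fix counts as correct when it runs without error and returns the same rows as a known-good reference query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DebugTask:
    """Hypothetical task record; field names are illustrative only."""
    schema_sql: str         # DDL plus seed rows that set up the database
    issue_description: str  # the user's natural-language problem report
    issue_sql: str          # the problematic statement the user wrote
    reference_sql: str      # a known-good fix used as ground truth


def run_query(schema_sql: str, query: str):
    """Build a fresh in-memory database, run one query, return its rows.

    Rows are sorted so the comparison ignores row order (an assumption;
    a task that tests ORDER BY would need an order-sensitive comparison).
    Returns None if the schema or the query raises a database error.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        return sorted(conn.execute(query).fetchall(), key=repr)
    except sqlite3.Error:
        return None
    finally:
        conn.close()


def is_fixed(task: DebugTask, predicted_sql: str) -> bool:
    """A candidate passes if it executes and matches the reference rows."""
    expected = run_query(task.schema_sql, task.reference_sql)
    actual = run_query(task.schema_sql, predicted_sql)
    return expected is not None and actual == expected


task = DebugTask(
    schema_sql="""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        INSERT INTO orders (customer, amount) VALUES
            ('alice', 10), ('alice', 20), ('bob', 5);
    """,
    issue_description="I want customers whose total exceeds 15, but my query errors out.",
    # Aggregates are not allowed in WHERE, so this statement fails.
    issue_sql="SELECT customer, SUM(amount) FROM orders "
              "WHERE SUM(amount) > 15 GROUP BY customer",
    reference_sql="SELECT customer, SUM(amount) FROM orders "
                  "GROUP BY customer HAVING SUM(amount) > 15",
)

# A differently written but equivalent fix, as a model might produce.
candidate = ("SELECT customer, total FROM "
             "(SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer) AS t "
             "WHERE total > 15")

print(is_fixed(task, task.issue_sql))  # the original buggy query
print(is_fixed(task, candidate))       # the candidate fix
```

Comparing executed results rather than SQL text lets differently written but equivalent fixes pass, which is the main appeal of execution-based checking.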
As a new benchmark for SQL debugging, BIRD-CRITIC’s core data consists of three key elements: the problematic SQL statement, the natural language problem description, and the database schema. It primarily evaluates the ability of LLMs to fix erroneous SQL based on user descriptions. The dataset is divided into three versions:
So, what are the characteristics of this dataset?
All tasks are derived from real user questions on StackOverflow and undergo rigorous screening based on the following criteria:
Example of a real-world case:
Based on the database structures from the BIRD-SQL development set, the team migrated the original SQLite schemas to four widely used production-grade dialects—PostgreSQL, MySQL, SQL Server, and Oracle—using Navicat. After migration, manual verification ensured:
A dual-annotation mechanism was adopted:
A three-stage cross-validation process was implemented:
Key findings from the experiments conducted in the BIRD-CRITIC paper:
The introduction of this benchmark establishes a new realistic standard for evaluating model capabilities in SQL diagnosis.