2026's First AI SQL Dataset: Ending the 'SELECT Only' Era

CORGI pushes NL2SQL past the “SELECT-only” trap: when a consultant asks “why did Southeast Asia sales stall?”, the model has to navigate an average of 7.48 JOINs per query, not just emit valid SQL. DBASQL covers the SQL that DBAs actually write: say “add a column” and the model infers a type such as VARCHAR. No more memorizing; just say it. Seriously, SQL’s dead. (Ask me how I know.)

CORGI

CORGI is a specialized, high-difficulty Text-to-SQL benchmark built for the consulting domain. Unlike generic SQL evaluations, it recreates the scenarios that consultants at firms like McKinsey and Bain face in client meetings and report drafting, across 10 verticals ranging from consumer platforms and retail to digital services. It challenges models not just to generate syntactically correct SQL, but to work through complex database schemas with nested relationships. The key capability being tested? The model’s ability to untangle deep business logic (for example, identifying why a brand’s market penetration stalled in Southeast Asia) from convoluted datasets that mirror enterprise-level data warehouses.

Paper URL: https://arxiv.org/html/2510.07309
Dataset URL: https://github.com/corgibenchmark/CORGI

Paper Intent

This paper tackles three core pain points in existing Text-to-SQL benchmarks (e.g., Spider, BIRD):

Missing Business Logic: Current datasets obsess over basic SQL syntax while ignoring models’ ability to handle real-world business decisions like CAGR calculations and margin analysis. It’s like testing a chef’s knife skills but ignoring whether they can actually cook a meal.
Long SQL Evaluation Flaws: Traditional execution matching fails to distinguish “correct result by luck” from “logically sound query” when dealing with complex, multi-layered SQL (often 1,000+ characters). For example, a model might return the right answer without understanding the underlying business logic.
Unclear Reasoning Paradigms: It is not known whether different expert approaches (like McKinsey’s MECE structured decomposition vs. Bain’s hypothesis-driven validation) actually boost model performance on messy business problems, so the paper tests this.
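To make the “correct result by luck” problem concrete, here is a minimal sketch using Python's built-in sqlite3. The sales table, its rows, and both queries are invented for illustration and are not taken from CORGI; the point is only that a query with a missing filter can pass execution matching on one database and fail on another.

```python
import sqlite3

def build_db(rows):
    """Create an in-memory database with a toy (hypothetical) sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

# Gold query: 2024 revenue for Southeast Asia only.
GOLD = "SELECT SUM(revenue) FROM sales WHERE region = 'SEA' AND year = 2024"
# Flawed prediction: the region filter is missing.
PRED = "SELECT SUM(revenue) FROM sales WHERE year = 2024"

def execution_match(conn, gold, pred):
    """Plain execution matching: compare the result sets of both queries."""
    return conn.execute(gold).fetchall() == conn.execute(pred).fetchall()

# Only SEA has 2024 rows here, so the flawed query passes by luck.
lucky_db = build_db([("SEA", 2024, 100.0), ("SEA", 2023, 80.0), ("EU", 2023, 50.0)])
# Adding a 2024 row for another region exposes the missing filter.
strict_db = build_db([("SEA", 2024, 100.0), ("EU", 2024, 70.0)])

lucky = execution_match(lucky_db, GOLD, PRED)
strict = execution_match(strict_db, GOLD, PRED)
print(lucky, strict)
```

The same pair of queries agrees on the first database and disagrees on the second, which is why result comparison on a single database says little about whether a long query is logically sound.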

Dataset Analysis

CORGI was built with serious industry rigor:

Construction: Started with 80 industry-simulated relational databases (e.g., retail inventory, digital service conversion rates), designed using deep domain knowledge. Domain experts then reverse-engineered business questions requiring multi-hop reasoning (e.g., “Why is Brand X underperforming in Southeast Asia?”) using frameworks like Porter’s Five Forces and BCG Matrix. Finally, 1,174 high-quality SQL pairs were manually written and cross-validated.
Data Characteristics: Engineering-heavy by design. CORGI’s structure is brutal: queries average 7.48 JOINs each (BIRD: 0.93), and the average SQL length exceeds 1,100 characters. This forces models to navigate precisely and synthesize logic across highly tangled table relationships.
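For a feel of what a “JOINs per query” statistic measures, here is a rough sketch that counts JOIN keywords with a regular expression. This is an assumption about how such a number could be computed, not the paper's own tooling, and the two sample queries are made up. It strips string literals and line comments only crudely, so a real SQL parser would be more reliable.

```python
import re

JOIN_RE = re.compile(r"\bJOIN\b", re.IGNORECASE)

def count_joins(sql: str) -> int:
    """Count JOIN keywords, ignoring quoted strings and -- line comments."""
    no_strings = re.sub(r"'(?:[^']|'')*'", "''", sql)
    no_comments = re.sub(r"--[^\n]*", "", no_strings)
    return len(JOIN_RE.findall(no_comments))

def average_joins(queries) -> float:
    """Mean number of JOINs over a non-empty list of SQL strings."""
    return sum(count_joins(q) for q in queries) / len(queries)

# Hypothetical sample queries, for illustration only.
sample = [
    "SELECT o.id FROM orders o "
    "JOIN customers c ON o.cid = c.id "
    "LEFT JOIN regions r ON c.rid = r.id",
    "SELECT name FROM customers WHERE note = 'JOIN us'",
]
print([count_joins(q) for q in sample], average_joins(sample))
```

The second query contains the word JOIN only inside a string literal, so it contributes zero to the average.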

Summary

CORGI elevates NL2SQL evaluation from “semantic translation” to “business analysis.” Experiments show that even top models like GPT-4o reach only ~50% accuracy on advanced BI tasks involving 7+ JOINs, with performance dropping further as complexity grows. This suggests that generic model capabilities hit a wall in vertical domains, and that the future hinges on deep integration of domain expertise and multi-agent collaboration frameworks.

DBASQL

Honestly, current NL2SQL tools are a joke: they handle SELECT queries like pros but completely ignore DBA work like creating tables or managing permissions. The authors address this by fine-tuning T5-Large to generate SQL from natural-language requests such as “create an employee table” or “grant manager insert access.” They built DBASQL themselves: 2,500+ NL-SQL pairs covering DDL (create/alter tables), DML (data changes), and DCL (permissions). It even includes underspecified requests like “add a new column”, where the model has to infer a data type on its own (say, VARCHAR rather than INT). Result? DBAs and non-tech folks can manage databases without memorizing SQL syntax. Seriously, game-changer.

Paper URL: https://www.scitepress.org/Papers/2025/136087/136087.pdf
Dataset URL: https://www.kaggle.com/datasets/pradnyasawant/dbasql

What the paper really says

Current NL2SQL tools are useless for DBAs: they handle SELECT queries like pros but ignore actual work like table schema changes or permission management. The authors fixed this by fine-tuning T5-Large to generate SQL from natural language: “create employee table” or “grant manager insert access.” They built DBASQL themselves (2,500+ NL-SQL pairs) to cover DBA tasks across DDL, DML, and DCL. It even handles messy requests like “add a new column” (the model guesses VARCHAR instead of INT; ask me how I know).

DBASQL in real life

Creating tables: “Employee table needs id, name, age” → CREATE TABLE employee (id INT, name VARCHAR, age INT)
Modifying columns: “Change custid from integer to number” → ALTER TABLE users ALTER COLUMN custid TYPE NUMERIC
Permissions: “Let manager insert into user table” → GRANT INSERT ON user_table TO manager
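The three examples above map onto the usual SQL sublanguages. As a small sketch, the helper below tags a statement as DDL, DML, or DCL by its leading keyword. The keyword table follows the common textbook grouping and is an assumption, not DBASQL's own labeling scheme; SELECT is deliberately left out because it is sometimes classed separately.

```python
# Common grouping of leading SQL keywords (assumed, not taken from the dataset).
CATEGORY_BY_KEYWORD = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL", "TRUNCATE": "DDL",
    "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",
    "GRANT": "DCL", "REVOKE": "DCL",
}

def sql_category(statement: str) -> str:
    """Return DDL, DML, DCL, or UNKNOWN based on the first keyword."""
    words = statement.strip().split(None, 1)
    if not words:
        return "UNKNOWN"
    return CATEGORY_BY_KEYWORD.get(words[0].upper(), "UNKNOWN")

# The three statements shown in the examples above.
examples = [
    "CREATE TABLE employee (id INT, name VARCHAR, age INT)",
    "ALTER TABLE users ALTER COLUMN custid TYPE NUMERIC",
    "GRANT INSERT ON user_table TO manager",
]
categories = [sql_category(s) for s in examples]
print(categories)
```

A check like this is handy for confirming that a set of NL-SQL pairs really covers all three categories rather than leaning on one.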

Key trick: The dataset includes ambiguous prompts like “add a column” to force the model to infer types, because clients always say that.

Why it matters

DBASQL is presented as the first dataset covering all DBA SQL types (DDL/DML/DCL). Fine-tuned T5-Large:
✅ Solid exact-match accuracy
✅ Handles vague requests (no more “what data type?”)
Result? DBAs stop memorizing SQL, non-tech folks can manage databases. SQL syntax is dead. (Seriously, it’s time.)
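For readers wondering what “exact match” means in practice, here is a minimal sketch of the metric. The normalization (trimming, dropping a trailing semicolon, collapsing whitespace, lowercasing) is an assumption made for illustration; the paper may normalize differently, and lowercasing would also alter string literals, which a stricter scorer would avoid. The prediction list is made up.

```python
import re

def normalize(sql: str) -> str:
    """Assumed normalization: trim, drop trailing ';', collapse spaces, lowercase."""
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()

def exact_match_accuracy(preds, golds) -> float:
    """Fraction of predictions whose normalized text equals the gold SQL."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

golds = [
    "CREATE TABLE employee (id INT, name VARCHAR, age INT)",
    "GRANT INSERT ON user_table TO manager",
]
# Hypothetical model outputs: one formatting variant, one wrong privilege.
preds = [
    "create table employee (id int,  name varchar, age int);",
    "GRANT SELECT ON user_table TO manager",
]
accuracy = exact_match_accuracy(preds, golds)
print(accuracy)
```

Exact match is strict by design: the second prediction is valid SQL but grants the wrong privilege, so it counts as a miss.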
