GeoSQL-Eval: Finally, a PostGIS Benchmark That Doesn’t Make Me Scream


Finally, a PostGIS test that doesn’t make me want to throw my laptop. GeoSQL-Eval checks whether LLMs actually understand spatial queries instead of spitting out SQL that is syntactically valid but useless. It ships alongside GeoSQL-Bench: 14,178 real tasks, 340 PostGIS functions covered, and 82 legit spatial DBs (land use, transport networks, you name it).

Let’s be real: older NL2SQL benchmarks skip the messy spatial stuff like geometry types, coordinate reference systems (CRS), and PostGIS quirks. So models hallucinate ST_Buffer when they need ST_Distance. GeoSQL-Bench and GeoSQL-Eval fix that. They were built with spatial DB folks, not just theorists, and they test whether models can handle real client queries rather than textbook examples.
The authors tested 24 models. GPT-5 and o4-mini crushed the geometry-heavy queries. But 70% of errors? Still function misuse. Schema tasks (multi-table joins) were the hardest. This isn’t “another benchmark”; it’s the first real test for spatial SQL. Period.
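
To make "function misuse" concrete, here is a minimal Python sketch that compares the PostGIS functions called in a generated query against a reference query. This is not how GeoSQL-Eval scores anything; the helper names, the `parks` and `stations` tables, and both queries are made up for illustration.

```python
import re

# Matches PostGIS-style function names (ST_Buffer, ST_DWithin, ...) that are
# immediately followed by an opening parenthesis.
ST_FUNC = re.compile(r"\bST_[A-Za-z0-9_]+(?=\s*\()", re.IGNORECASE)


def postgis_functions(sql: str) -> set[str]:
    """Return the lower-cased set of ST_* functions called in a query."""
    return {name.lower() for name in ST_FUNC.findall(sql)}


def function_mismatch(generated: str, reference: str) -> dict[str, set[str]]:
    """Report functions the model left out and functions it added."""
    gen = postgis_functions(generated)
    ref = postgis_functions(reference)
    return {"missing": ref - gen, "unexpected": gen - ref}


# Hypothetical task: "How far is park 1 from station 7?"
reference_sql = """
SELECT ST_Distance(p.geom, s.geom)
FROM parks AS p, stations AS s
WHERE p.id = 1 AND s.id = 7
"""

# Hypothetical model output: valid SQL, wrong spatial function.
generated_sql = """
SELECT ST_Buffer(p.geom, 500)
FROM parks AS p, stations AS s
WHERE p.id = 1 AND s.id = 7
"""

report = function_mismatch(generated_sql, reference_sql)
print(report)
```

A name-level check like this is deliberately crude. It flags the ST_Buffer-for-ST_Distance swap, but it would also flag a query that reaches the right answer through different functions, so treat it as a triage signal rather than a correctness test.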

DeKeyNLU targets the quiet killer in NL2SQL: LLMs failing to break a request like “Show me Q3 sales in APAC” into actual DB steps. The authors built a dataset in which humans verified the task splits and keywords, then baked it into DeKeySQL’s pipeline.

RAG and CoT pipelines keep choking on task decomposition and keyword extraction, and existing datasets are either fragmented or missing domain keywords (“fiscal year,” “student cohort”). DeKeyNLU’s fix is clean: a new dataset plus DeKeySQL’s 3-module flow of user question understanding → entity retrieval → SQL generation. They fine-tuned only the understanding module, and accuracy jumped hard.
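
The three-module flow is easier to picture as code. Below is a toy Python sketch of the shape of such a pipeline, assuming a made-up `sales` table and a hard-coded keyword index. In DeKeySQL the stages are LLM-driven; every function, name, and rule here is a stand-in, not the authors' implementation.

```python
from dataclasses import dataclass

# Toy keyword index: keyword -> (table, column, filter value or None).
# Entirely hypothetical; a real system would search the schema and the data.
SCHEMA_INDEX = {
    "sales": ("sales", "amount", None),
    "q3": ("sales", "quarter", "Q3"),
    "apac": ("sales", "region", "APAC"),
}


@dataclass
class Understanding:
    sub_tasks: list[str]
    keywords: list[str]


def understand_question(question: str) -> Understanding:
    """Stage 1 stand-in: pull out keywords and list one sub-task per keyword."""
    tokens = [tok.strip(".,?!").lower() for tok in question.split()]
    keywords = [tok for tok in tokens if tok in SCHEMA_INDEX]
    sub_tasks = [f"resolve '{kw}' against the schema" for kw in keywords]
    return Understanding(sub_tasks, keywords)


def retrieve_entities(understanding: Understanding) -> list[tuple]:
    """Stage 2 stand-in: map each keyword to a table, column, and value."""
    return [SCHEMA_INDEX[kw] for kw in understanding.keywords]


def generate_sql(entities: list[tuple]) -> str:
    """Stage 3 stand-in: assemble one aggregate query (SUM is an assumption)."""
    table = entities[0][0]
    measures = [col for _, col, val in entities if val is None]
    filters = [f"{col} = '{val}'" for _, col, val in entities if val is not None]
    sql = f"SELECT SUM({measures[0]}) FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql


def answer(question: str) -> str:
    return generate_sql(retrieve_entities(understand_question(question)))


print(answer("Show me Q3 sales in APAC"))
# SELECT SUM(amount) FROM sales WHERE quarter = 'Q3' AND region = 'APAC'
```

Keeping the stages as separate functions mirrors the point of the design: you can swap out or fine-tune one module, such as the understanding step, without touching the other two.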
Fine-tuning “user question understanding” with DeKeyNLU pushed BIRD dev accuracy from 62.31% to 69.10% (+6.79 points) and Spider from 84.2% to 88.7% (+4.5 points). Plot twist? Entity retrieval is the make-or-break step (not understanding), followed by question parsing. The takeaway: targeted dataset design plus smart pipeline tweaks beat throwing more data at the problem. Finally, NL2SQL that gets what you mean.