AI & ML Efficiency Breakthrough

Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.

March 26, 2026

Original Paper

Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale

Chinmay Soni, Shivam Chourasia, Gaurav Kumar, Hitesh Kapoor

arXiv · 2603.24023

The Takeaway

By eliminating the need to include massive schema definitions in every prompt, this method enables 8B-parameter models to outperform proprietary models like Gemini Flash 2.0 while drastically reducing API costs and latency. It provides a viable path for deploying high-precision SQL agents in production environments with massive schemas.
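To make the token-savings claim concrete, here is a minimal illustrative sketch (not from the paper): it compares a hypothetical schema-heavy prompt against a schema-free prompt of the kind a fine-tuned model could accept. The schema, question, and whitespace-based token count are all assumptions for illustration; real savings depend on the tokenizer and the production schema's size.

```python
# Illustrative sketch: why dropping the schema from every prompt slashes
# input tokens. A naive whitespace split stands in for a real tokenizer.

# Hypothetical large schema, as it would be pasted into every API prompt.
SCHEMA = "\n".join(
    f"CREATE TABLE t{i} (id INT, player TEXT, runs INT, wickets INT, match_date DATE);"
    for i in range(200)  # a production database with hundreds of tables
)
question = "How many runs did Virat Kohli score in 2024?"

# Conventional approach: schema travels with every request.
prompt_with_schema = f"Schema:\n{SCHEMA}\n\nQuestion: {question}\nSQL:"
# Schema-internalized approach: the schema lives in the fine-tuned weights.
prompt_schema_free = f"Question: {question}\nSQL:"

tokens_with = len(prompt_with_schema.split())
tokens_without = len(prompt_schema_free.split())
saving = 1 - tokens_without / tokens_with
print(f"{tokens_with} -> {tokens_without} tokens ({saving:.1%} saved)")
```

Even with this toy 200-table schema, the schema-free prompt is two orders of magnitude smaller, which is the effect the paper scales up to its reported 99% reduction.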

From the abstract

Applying large, proprietary API-based language models to text-to-SQL tasks poses a significant industry challenge: reliance on massive, schema-heavy prompts results in prohibitive per-token API costs and high latency, hindering scalable production deployment. We present a specialized, self-hosted 8B-parameter model designed for a conversational bot in CriQ, a sister app to Dream11, India's largest fantasy sports platform with over 250 million users, that answers user queries about cricket statistics.