---
base_model: Qwen/Qwen3-14B
library_name: transformers
tags:
- generated_from_trainer
- open-r1
- Text2SQL
- Reasoning
license: apache-2.0
language:
- en
---

# Model Information

This model is the reasoning model for the Text-to-SQL task introduced in [Think2SQL: Blueprinting Reward Density and Advantage Scaling for Effective Text-to-SQL Reasoning]().

This model is a fine-tuned version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B), with thinking disabled, trained on the [BIRD](https://bird-bench.github.io/) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start

The model performs best with the system and user prompts shown below. It is intended to be used with three inputs: the question, the evidence, and the database schema.

Qwen3 support requires `transformers >= 4.51.0`. Make sure to update your transformers installation via `pip install --upgrade transformers`.
```python
import transformers
import torch

model_id = "anonymous-2321/Think2SQL-14B"

# Load the model in bfloat16, sharded automatically across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

system_message = """
You are a data science expert that provides well-reasoned and detailed responses. Your task is to understand the schema and generate a valid SQL query to answer the question.
You first think about the reasoning process as an internal monologue and then provide the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
""".strip()

user_message = """
Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema.
Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.

Database Engine:
SQLite

Question:
Return the product name, sorted alphabetically and by price in descending order.

Evidence:

Database Schema:
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price REAL NOT NULL
);

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);
"""

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]

outputs = pipeline(
    messages,
    max_new_tokens=4096,
    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(outputs[0]["generated_text"][-1])
```
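
The SQL script is returned between the `<answer>` tags. A minimal post-processing sketch, continuing the snippet above (`extract_sql` is a small helper written here for illustration, assuming the model follows the prompted format):

```python
import re

def extract_sql(text: str) -> str | None:
    """Return the content of the <answer>...</answer> block, or None if absent."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else None

# The pipeline returns the full chat; the assistant reply is the last message.
assistant_reply = outputs[0]["generated_text"][-1]["content"]
print(extract_sql(assistant_reply))
```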
## 📖 Overview

Think2SQL is a systematic study on injecting reasoning capabilities into Text-to-SQL through Reinforcement Learning with Verifiable Rewards (RLVR). We uncover the critical interplay between reward density, advantage scaling, and model capacity, proposing novel execution-guided dense rewards and optimal scaling strategies. Our 4B-parameter model achieves reasoning capabilities competitive with state-of-the-art models, and the study provides a comprehensive analysis for optimizing Text-to-SQL reasoning under computational constraints.
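
As a rough illustration of the reward-density idea (a minimal sketch under assumptions, not the paper's exact formulation), an execution-guided dense reward executes both the predicted and gold queries and grants partial credit based on result overlap, instead of the usual binary execution-match signal:

```python
import sqlite3

def execution_guided_reward(pred_sql: str, gold_sql: str, db_path: str, dense: bool = True) -> float:
    """Hypothetical dense reward: compare execution results of predicted vs. gold SQL.

    With dense=True, partial credit is the row-level F1 between the two result
    sets; with dense=False, it degrades to the binary exact-match signal.
    """
    conn = sqlite3.connect(db_path)
    try:
        pred = set(map(tuple, conn.execute(pred_sql).fetchall()))
        gold = set(map(tuple, conn.execute(gold_sql).fetchall()))
    except sqlite3.Error:
        return 0.0  # queries that fail to execute earn no reward
    finally:
        conn.close()
    if not dense or not pred or not gold:
        return float(pred == gold)
    overlap = len(pred & gold)
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
```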
**Key Contributions:**
- Execution-guided dense reward function that outperforms binary signals (illustrated in the sketch above)
- Analysis of advantage scaling mechanics for models of different sizes (see the sketch after this list)
- Evaluation of cold-start effects and supervised fine-tuning impact
- Pareto frontier mapping for training efficiency optimization
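
For context on the advantage-scaling analysis, here is a minimal sketch of GRPO-style group-relative advantages, assuming a TRL-like RLVR setup; the `scale_by_std` switch illustrates the scaling choice being studied and is not the paper's exact recipe:

```python
import numpy as np

def group_relative_advantages(rewards, scale_by_std=True, eps=1e-4):
    """Center each group's rewards; optionally scale by the group's std.

    `rewards`: rewards of completions sampled for the same prompt (one group).
    Dividing by the std rescales the advantages, which interacts with model size.
    """
    r = np.asarray(rewards, dtype=np.float64)
    adv = r - r.mean()
    if scale_by_std:
        adv = adv / (r.std() + eps)
    return adv

print(group_relative_advantages([0.0, 0.4, 0.8, 1.0]))
```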