AI & ML New Capability

ORACLE uses symbolic reasoning engines to verify intermediate reasoning steps in synthetic data generation, moving beyond simple answer-correctness filtering.

March 24, 2026

Original Paper

ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation

Zhuojie Yang, Wentao Wan, Keze Wang

arXiv · 2603.21140

The Takeaway

ORACLE enables the creation of high-quality reasoning datasets for natural language tasks where code execution is impossible. By validating each step of a syllogistic chain, it provides a more reliable training signal for fine-tuning LLM reasoning capabilities than final-answer filtering alone.
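The core idea, step-by-step validation rather than answer-only filtering, can be sketched in a few lines. This is a minimal illustration, not ORACLE's actual pipeline or API: reasoning steps are encoded as tuples, and a toy symbolic checker accepts a chain only if every intermediate step follows from what is already established.

```python
# Minimal sketch of step-level verification for a syllogistic chain.
# All names and the tuple encoding are illustrative assumptions,
# not ORACLE's actual implementation.

def verify_step(known, conclusion):
    """Check one syllogistic step of the form:
    ('all', A, B) + ('is', x, A)  =>  ('is', x, B)."""
    for fact in known:
        if fact[0] == 'all':
            _, a, b = fact
            if ('is', conclusion[1], a) in known and conclusion == ('is', conclusion[1], b):
                return True
    return False

def verify_chain(premises, steps):
    """Accept a reasoning chain only if EVERY step is derivable,
    unlike answer-only filtering, which inspects just the last step."""
    known = set(premises)
    for step in steps:
        if not verify_step(known, step):
            return False  # reject the sample: flawed intermediate step
        known.add(step)
    return True

premises = [('all', 'men', 'mortal'), ('is', 'socrates', 'men')]

# A valid one-step chain passes.
print(verify_chain(premises, [('is', 'socrates', 'mortal')]))   # True

# An unsupported step is rejected even before any "final answer".
print(verify_chain(premises, [('is', 'socrates', 'immortal')])) # False
```

A production verifier would use a real symbolic reasoning engine rather than this hand-rolled rule, but the filtering logic is the same: a single invalid intermediate step disqualifies the whole synthetic sample.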

From the abstract

Training large language models (LLMs) with synthetic reasoning data has become a popular approach to enhancing their reasoning capabilities, while a key factor influencing the effectiveness of this paradigm is the quality of the generated multi-step reasoning data. To generate high-quality reasoning data, many recent methods generate synthetic reasoning paths and filter them based on final answer correctness, often overlooking flaws in intermediate reasoning steps. To enhance the verification of …