Releases an offline search-and-browse pipeline and a dataset of 97K long-horizon trajectories for training 'Deep Research' agents.
March 24, 2026
Original Paper
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
arXiv · 2603.20278
The Takeaway
Democratizes the development of deep research agents, such as OpenAI's 'Deep Research', by providing a fully instrumented, reproducible environment and a dataset of 97K multi-turn reasoning and tool-use trajectories.
From the abstract
Training deep research agents requires long-horizon trajectories that interleave search, evidence aggregation, and multi-step reasoning. However, existing data collection pipelines typically rely on proprietary web APIs, making large-scale trajectory synthesis costly, unstable, and difficult to reproduce. We present OpenResearcher, a reproducible pipeline that decouples one-time corpus bootstrapping from multi-turn trajectory synthesis and executes the search-and-browse loop entirely offline using …
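The abstract's central idea is to bootstrap a corpus once, then run every search-and-browse loop against it with no live web calls. A minimal Python sketch can illustrate this idea. It is not the authors' implementation, and everything in it is an assumption for illustration. The names `OfflineCorpus`, `synthesize`, and `scripted_policy` are hypothetical. Retrieval is plain term-overlap scoring rather than whatever index OpenResearcher actually uses, and a fixed scripted policy stands in for the LLM agent.

```python
"""Hypothetical sketch of offline trajectory synthesis (not OpenResearcher's code).

Assumptions: an in-memory corpus, term-overlap search, and a scripted policy
in place of an LLM agent.
"""
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    action: str       # "search", "browse", or "answer"
    argument: str     # query, document id, or final answer text
    observation: str  # what the offline environment returned


@dataclass
class Trajectory:
    question: str
    steps: list[Step] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    # Lowercase word tokens with punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))


class OfflineCorpus:
    """Bootstrapped once, then reused for every trajectory; no network access."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs
        self.tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    def search(self, query: str, k: int = 2) -> list[str]:
        q = tokenize(query)
        # Rank by overlap size; break ties by doc id so results are deterministic.
        ranked = sorted(self.docs, key=lambda d: (-len(q & self.tokens[d]), d))
        return ranked[:k]

    def browse(self, doc_id: str) -> str:
        return self.docs.get(doc_id, "")


Policy = Callable[[Trajectory], tuple[str, str]]


def synthesize(question: str, corpus: OfflineCorpus, policy: Policy,
               max_steps: int = 8) -> Trajectory:
    """Run the search-and-browse loop until the policy answers or steps run out."""
    traj = Trajectory(question)
    for _ in range(max_steps):
        action, arg = policy(traj)
        if action == "answer":
            traj.steps.append(Step(action, arg, ""))
            break
        if action == "search":
            obs = ", ".join(corpus.search(arg))
        elif action == "browse":
            obs = corpus.browse(arg)
        else:
            raise ValueError(f"unknown action: {action}")
        traj.steps.append(Step(action, arg, obs))
    return traj


def scripted_policy(traj: Trajectory) -> tuple[str, str]:
    """Stand-in agent: search, open the top hit, answer with its first sentence."""
    n = len(traj.steps)
    if n == 0:
        return "search", traj.question
    if n == 1:
        return "browse", traj.steps[0].observation.split(", ")[0]
    return "answer", traj.steps[1].observation.split(".")[0]


corpus = OfflineCorpus({
    "doc_a": "Paris is the capital of France. It lies on the Seine.",
    "doc_b": "Berlin is the capital of Germany.",
    "doc_c": "The Seine flows through Paris.",
})
traj = synthesize("capital of France", corpus, scripted_policy)
for step in traj.steps:
    print(step.action, "->", step.argument)
```

The corpus is fixed after bootstrapping and ties are broken deterministically, so rerunning the same policy reproduces the same trajectory. This is the reproducibility property the abstract contrasts with live, proprietary web APIs.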