Together AI Open-Sources Open Deep Research v2 with Dataset, Code, and a Multi-Step Research Workflow
Original: Introducing v2 of our Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Fully free & open source. We're releasing everything: evaluation dataset, code, app, and blog 🔥
On March 13, 2026, Together AI said on X that v2 of its Open Deep Research app is now fully free and open source. The company said it is releasing the evaluation dataset, code, app, and blog together with the update. That matters because deep research has quickly become one of the most visible agent workflows in AI: instead of returning a short answer, the system plans a task, searches the web, evaluates evidence, and then produces a longer report with citations.
The companion Open Deep Research blog post explains the mechanics. Together describes a workflow built around planning and self-reflection. The system starts by generating search queries, collects web results, checks for knowledge gaps, and iterates until it has enough material to write a report. The company frames this as a response to multi-hop questions where a single search step is not enough and where users need synthesis rather than a list of links.
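The loop described above — generate queries, search, check for knowledge gaps, iterate — can be sketched in a few lines. Everything below is illustrative: the function names, the gap-checking heuristic, and the stop condition are stand-ins for the LLM and search calls the real app makes, not Together's actual implementation.

```python
def generate_queries(topic, gaps):
    """Stand-in for the LLM planning step: turn the topic and any
    known knowledge gaps into concrete search queries."""
    return [f"{topic} {gap}" for gap in gaps] or [topic]

def search_web(query):
    """Stand-in for a web search call; returns fake snippets."""
    return [f"snippet about {query}"]

def find_gaps(notes, required_points):
    """Stand-in for the LLM self-reflection step: which required
    points are not yet covered by the collected notes?"""
    covered = " ".join(notes)
    return [p for p in required_points if p not in covered]

def deep_research(topic, required_points, max_rounds=3):
    notes, gaps = [], list(required_points)
    for _ in range(max_rounds):
        for query in generate_queries(topic, gaps):
            notes.extend(search_web(query))
        gaps = find_gaps(notes, required_points)
        if not gaps:          # no knowledge gaps left -> write report
            break
    return "\n".join(notes)   # stand-in for the report-writing step

report = deep_research("open deep research", ["dataset", "benchmarks"])
```

The point of the structure is the inner/outer split: each outer round re-plans against the remaining gaps, which is what makes multi-hop questions tractable where a single search pass is not.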
What ships in v2
- The public app, announced on X.
- An evaluation dataset hosted on Hugging Face.
- The open-source codebase on GitHub.
- A technical write-up describing the architecture, benchmarks, and limitations.
Together also makes clear that this is not a single-model demo. In the blog, different models are assigned to planning, summarization, JSON extraction, and final report writing. The company says this role-based design is intended to balance quality, latency, and cost. It also describes caching to reduce repeated search expense during evaluation, and says a typical reply takes 2 to 5 minutes without podcast generation. That is a useful reminder that high-quality research agents are still materially slower than ordinary chat completions.
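The role-based design and the search caching mentioned above can be sketched as follows. The role names mirror the blog's description, but the model IDs, routing table, and cache are placeholders invented for illustration, not Together's configuration.

```python
from functools import lru_cache

# Hypothetical routing table: one model per pipeline stage, so each
# role can trade off quality, latency, and cost independently.
# Model IDs are placeholders, not the ones Together uses.
ROLE_MODELS = {
    "planning":        "large-reasoning-model",
    "summarization":   "small-fast-model",
    "json_extraction": "structured-output-model",
    "report_writing":  "large-writing-model",
}

def call_model(role, prompt):
    model = ROLE_MODELS[role]
    # Real code would call a chat-completions API here; this stub
    # just records which model would handle the request.
    return f"[{model}] handled: {prompt[:40]}"

@lru_cache(maxsize=None)
def cached_search(query):
    # Repeated identical queries during evaluation hit the cache
    # instead of re-issuing the (expensive) web search.
    return (f"result for {query}",)

plan = call_model("planning", "Outline search queries for the topic")
```

Splitting roles this way is what keeps the 2-to-5-minute end-to-end latency from ballooning further: only the stages that need a large model pay for one.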
For developers, the more durable signal is openness. Together is not only publishing a polished demo but also the pieces needed to benchmark, fork, and extend it. That gives teams a reference implementation for multi-step web research, source ranking, and long-form report generation with citations. The company also openly lists limitations around hallucinations, search bias, and freshness, which makes the release more credible than a simple launch post.
The result is less a model announcement than an attempt to establish an open baseline for research agents. If the community adopts the code and dataset, Open Deep Research v2 could become a practical benchmark for comparing planning loops, retrieval strategies, and report quality across open LLM stacks.
Related Articles
Andrej Karpathy has published autoresearch, a minimal repo that lets AI agents iterate on a stripped-down nanochat training loop overnight. The project turns agent evaluation into a closed-loop research workflow with fixed 5-minute runs, Git branches, and validation-loss-based selection.
A widely shared r/LocalLLaMA post from a former Manus backend lead argues that a single run(command="...") interface often beats a catalog of typed function calls for agents. The post ties Unix text streams to token-based model interfaces, then backs the claim with design patterns around piping, progressive help, stderr visibility, and overflow handling.
A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.