Together AI Open-Sources Open Deep Research v2 with Dataset, Code, and a Multi-Step Research Workflow

Original: Introducing v2 of our Open Deep Research app! Generate detailed reports on any topic with open source LLMs. Fully free & open source. We're releasing everything: evaluation dataset, code, app, and blog 🔥

LLM | Mar 14, 2026 | By Insights AI

On March 13, 2026, Together AI said on X that v2 of its Open Deep Research app is now fully free and open source. The company said it is releasing the evaluation dataset, code, app, and blog together with the update. That matters because deep research has quickly become one of the most visible agent workflows in AI: instead of returning a short answer, the system plans a task, searches the web, evaluates evidence, and then produces a longer report with citations.

The companion Open Deep Research blog post explains the mechanics. Together describes a workflow built around planning and self-reflection. The system starts by generating search queries, collects web results, checks for knowledge gaps, and iterates until it has enough material to write a report. The company frames this as a response to multi-hop questions where a single search step is not enough and where users need synthesis rather than a list of links.
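The loop described above can be sketched in a few lines. This is an illustrative reconstruction, not Together's actual code: the function names, the stubbed search step, and the gap-check heuristic are all assumptions made for the example.

```python
# Hypothetical sketch of the plan -> search -> reflect -> iterate loop.
# Every helper here is a stand-in; a real agent would call an LLM for
# planning/reflection and a search API for retrieval.

def generate_queries(topic, gaps):
    """Plan: turn the topic (and any known gaps) into search queries."""
    if not gaps:
        return [f"{topic} overview"]
    return [f"{topic} {gap}" for gap in gaps]

def search_web(query):
    """Stubbed retrieval step; stands in for a real search API call."""
    return [{"query": query, "snippet": f"result for {query}"}]

def find_knowledge_gaps(evidence, round_no):
    """Reflect: decide what is still missing. Stubbed to converge after
    two rounds instead of asking a model."""
    return [] if round_no >= 2 else ["benchmarks", "limitations"]

def deep_research(topic, max_rounds=5):
    evidence, gaps = [], []
    for round_no in range(1, max_rounds + 1):
        for query in generate_queries(topic, gaps):
            evidence.extend(search_web(query))
        gaps = find_knowledge_gaps(evidence, round_no)
        if not gaps:  # enough material collected: write the report
            break
    return f"Report on {topic} ({len(evidence)} sources)"

print(deep_research("open deep research"))
```

The key design point is that the stopping condition is driven by self-reflection (the gap check), not by a fixed number of searches, which is what lets the workflow handle multi-hop questions.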

What ships in v2

  • The public app announced on X.
  • An evaluation dataset on Hugging Face.
  • The open-source codebase on GitHub.
  • A technical write-up describing architecture, benchmarks, and limitations.

Together also makes clear that this is not a single-model demo. In the blog, different models are assigned to planning, summarization, JSON extraction, and final report writing. The company says this role-based design is intended to balance quality, latency, and cost. It also describes caching to reduce repeated search expense during evaluation, and says a typical reply takes 2 to 5 minutes without podcast generation. That is a useful reminder that high-quality research agents are still materially slower than ordinary chat completions.

For developers, the more durable signal is openness. Together is not only publishing a polished demo but also the pieces needed to benchmark, fork, and extend it. That gives teams a reference implementation for multi-step web research, source ranking, and long-form report generation with citations. The company also openly lists limitations around hallucinations, search bias, and freshness, which makes the release more credible than a simple launch post.

The result is less a model announcement than an attempt to establish an open baseline for research agents. If the community adopts the code and dataset, Open Deep Research v2 could become a practical benchmark for comparing planning loops, retrieval strategies, and report quality across open LLM stacks.



