Together AI expands fine-tuning with tool calling, reasoning, and VLM support plus faster MoE training
Original: Together Fine-tuning now supports tool calling, reasoning, and vision-language model fine-tuning. Train models up to 1T parameters with up to 6x higher throughput on MoE architectures.
What Together AI announced on X
On March 19, 2026, Together AI said its fine-tuning service now supports tool calling, reasoning, and vision-language model training. It also claimed up to 6x higher throughput on mixture-of-experts architectures and framed the service as capable of handling models up to the 1T-parameter class. That makes the update more than a simple feature release: it is a push to turn fine-tuning into a broader post-training stack for agent and multimodal workloads.
What the official blog adds
Together AI’s post lays out the failure modes it is trying to solve: tool calls that do not match schemas, reasoning quality that degrades across longer interactions, and models that miss domain-specific visual cues. The updated service addresses those with OpenAI-compatible schema support for tool-call training, direct fine-tuning on thinking traces for reasoning models, and native vision-language model fine-tuning using image-plus-text training data.
- The service now supports datasets up to 100GB.
- The X post cites models of up to 1T parameters, while the blog says the stack was upgraded to handle 100B+ parameter models more efficiently and discusses the challenges of trillion-parameter training.
- Together AI lists new fine-tuning support for models including Qwen 3.5 variants, Kimi K2.5, Kimi K2, GLM-4.7, and GLM-4.6.
- The company also added job cost estimates before launch and live ETA tracking while training runs are in progress.
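To make the schema-mismatch failure mode concrete, here is a minimal sketch of a pre-training check that flags tool calls in a dataset which do not match their declared schema. It assumes the OpenAI chat-completions convention, where a tool is declared as a `function` with JSON Schema `parameters` and a tool call carries its `arguments` as a JSON string. The `get_weather` tool, the `check_tool_call` helper, and the example records are hypothetical, and the exact dataset layout Together AI expects is not described in the source. The check covers only required keys, unexpected keys, basic types, and enums, not full JSON Schema validation.

```python
import json

# Hypothetical tool declaration in the OpenAI function-calling style.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

# Mapping from JSON Schema type names to Python types (subset only).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_tool_call(call, tools):
    """Return a list of problems with one tool call; an empty list means it passed."""
    specs = {t["function"]["name"]: t["function"] for t in tools}
    fn = call.get("function", {})
    name = fn.get("name")
    if name not in specs:
        return [f"unknown tool: {name!r}"]

    # Arguments are carried as a JSON-encoded string, so decode them first.
    try:
        args = json.loads(fn.get("arguments"))
    except (TypeError, json.JSONDecodeError):
        return ["arguments is not valid JSON"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]

    params = specs[name].get("parameters", {})
    props = params.get("properties", {})
    problems = []

    for key in params.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")

    for key, value in args.items():
        if key not in props:
            if params.get("additionalProperties") is False:
                problems.append(f"unexpected argument: {key}")
            continue
        expected = _JSON_TYPES.get(props[key].get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if expected and (not isinstance(value, expected) or is_stray_bool):
            problems.append(f"wrong type for argument: {key}")
        elif "enum" in props[key] and value not in props[key]["enum"]:
            problems.append(f"value not allowed for argument: {key}")

    return problems


good_call = {
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"city": "Oslo", "unit": "celsius"}',
    },
}
bad_call = {
    "id": "call_2",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"town": "Oslo", "unit": "kelvin"}',
    },
}

print(check_tool_call(good_call, [WEATHER_TOOL]))
print(check_tool_call(bad_call, [WEATHER_TOOL]))
```

Running a check like this over a dataset before submitting a job is one way to keep malformed examples from teaching the model the very mismatches the fine-tune is meant to remove.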
Why this matters
Enterprise fine-tuning demand is shifting away from “make the base model sound more like us” and toward more operational goals: get agents to call tools reliably, preserve structured reasoning across long workflows, and adapt models to domain-specific images. Together AI is trying to package those needs into one managed training service rather than leaving them spread across custom scripts, separate inference fixes, and infrastructure-heavy experimentation.
The throughput claim matters just as much as the feature list. Together AI says training throughput improved by at least 2x for every model in the updated stack, with larger models seeing gains above 6x. If that holds in real usage, the practical effect is not just shorter training jobs. It means faster iteration loops, more experiments per team, and a lower barrier to treating post-training as a continuous product workflow rather than an occasional specialized project. In other words, the competitive frontier is moving from model access alone toward how quickly platforms can help teams shape, validate, and deploy those models for production tasks.
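As a rough illustration of the iteration-loop argument, the sketch below counts how many whole training runs fit in a fixed time budget at different speedups. The 24-hour baseline job and the 120-hour budget are hypothetical numbers chosen for the arithmetic, not figures from Together AI, and the calculation assumes a throughput gain translates directly into shorter wall-clock time, ignoring queueing, data loading, and evaluation.

```python
def runs_in_budget(budget_hours: float, baseline_job_hours: float, speedup: float) -> int:
    """Whole training runs that fit in a fixed time budget at a given speedup."""
    # Assumption: job duration shrinks in direct proportion to the throughput gain.
    job_hours = baseline_job_hours / speedup
    return int(budget_hours // job_hours)


# Hypothetical scenario: a 24-hour baseline job and a 120-hour weekly budget.
for speedup in (1, 2, 6):
    print(f"{speedup}x: {runs_in_budget(120, 24, speedup)} runs")
```

Under these assumptions the same budget goes from 5 runs at baseline to 10 runs at 2x and 30 runs at 6x, which is the sense in which throughput turns into more experiments per team.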
Sources: Together AI X post · Together AI fine-tuning update