Together AI brings Wan 2.7 video generation and editing workflows onto one API surface
Original: Introducing Wan 2.7 from @alibaba_cloud on Together AI. AI natives can now build with Wan 2.7 on Together AI and get a clearer path from first-generation video to continuation, reference-driven control, and editing on one production platform.
What Together AI posted on X
On April 3, 2026, Together AI announced that Wan 2.7 from Alibaba Cloud is now available on its platform, presenting it as a unified video workflow stack rather than a single model release. The X post stresses that teams can move from an initial generated clip to continuation, reference-driven control, and editing on one production platform instead of stitching those tasks together across separate tools.
That positioning addresses a real problem in multimodal development. Video generation is easy to demo, but difficult to steer once a project needs continuity, reference matching, revision, or editorial control. Teams often end up bouncing between different model providers and post-processing systems. Together’s message is that Wan 2.7 can reduce that fragmentation by bringing more of the workflow into a single operational surface.
What the product post says
Together’s product note describes Wan 2.7 as a four-model suite covering generation, continuation, reference-driven workflows, and editing. Text-to-video is available now through the model Wan-AI/wan2.7-t2v, while image-to-video, reference-to-video, and video edit are planned to roll out next on the same platform.
The currently available text-to-video offering supports 720P and 1080P outputs, 2 to 15 second durations, optional audio input, and prompt-driven multi-shot direction. Together also says the service uses the same APIs, authentication, SDKs, and billing surface that customers already use for the rest of their multimodal stack, with pricing starting at $0.10 per second of generated video.
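To make those limits concrete, here is a minimal Python sketch that checks a planned clip against the published constraints and computes a price floor. The `ClipRequest` type, its field names, and both helper functions are hypothetical and are not part of Together's SDK or API. Only the model ID, the two resolutions, the 2 to 15 second range, and the $0.10 starting price come from the product note. Because pricing is described as "starting at" that rate, the figure is a lower bound and not a quote.

```python
from dataclasses import dataclass

# Values taken from Together's product note.
MODEL_ID = "Wan-AI/wan2.7-t2v"
ALLOWED_RESOLUTIONS = {"720P", "1080P"}
MIN_SECONDS, MAX_SECONDS = 2, 15
STARTING_PRICE_CENTS_PER_SECOND = 10  # "starting at $0.10 per second"


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical description of a clip; not an official request schema."""
    prompt: str
    resolution: str
    seconds: int


def validate(req: ClipRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if req.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {req.resolution}")
    if not MIN_SECONDS <= req.seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    return problems


def minimum_cost_cents(req: ClipRequest) -> int:
    """Lower-bound cost in cents, using the advertised starting rate."""
    return req.seconds * STARTING_PRICE_CENTS_PER_SECOND


if __name__ == "__main__":
    clip = ClipRequest(prompt="A slow pan across a harbor at dawn",
                       resolution="1080P", seconds=8)
    print(validate(clip))
    print(f"at least ${minimum_cost_cents(clip) / 100:.2f}")
```

Working in integer cents avoids floating-point rounding when costs are summed across many clips. The actual rate for a given resolution may be higher than the starting price.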
Why it matters
This is significant because the differentiator in video AI is shifting from “can the model generate a clip?” to “can the platform support production iteration?” Teams need continuation, reference control, editing, and predictable operational interfaces. If those pieces arrive behind one API contract instead of a collection of disconnected services, video becomes easier to integrate into real applications and internal pipelines.
For Together AI, Wan 2.7 is also a positioning move. The company is trying to show that multimodal infrastructure can be unified in the same way text inference was unified: one account, one billing model, one developer surface, multiple capabilities. If that approach works, the platform becomes more valuable not because it hosts one model, but because it reduces the operational cost of using many video workflows together.