Together AI brings Wan 2.7 video generation and editing workflows onto one API surface
Original post: "Introducing Wan 2.7 from @alibaba_cloud on Together AI. AI natives can now build with Wan 2.7 on Together AI and get a clearer path from first-generation video to continuation, reference-driven control, and editing on one production platform."
What Together AI posted on X
On April 3, 2026, Together AI announced that Wan 2.7 from Alibaba Cloud is coming to its platform, positioned as a more unified video workflow stack. The framing in the X post is not just about model availability: Together is emphasizing that teams can move from an initial generated clip to continuation, reference-driven control, and editing on one production platform, rather than stitching those tasks together across separate tools.
That positioning addresses a real problem in multimodal development. Video generation is easy to demo, but difficult to steer once a project needs continuity, reference matching, revision, or editorial control. Teams often end up bouncing between different model providers and post-processing systems. Together’s message is that Wan 2.7 can reduce that fragmentation by bringing more of the workflow into a single operational surface.
What the product post says
Together’s product note describes Wan 2.7 as a four-model suite covering generation, continuation, reference-driven workflows, and editing. Text-to-video is available now through the model Wan-AI/wan2.7-t2v, while image-to-video, reference-to-video, and video editing are planned to roll out next on the same platform.
The currently available text-to-video offering supports 720p and 1080p outputs, durations of 2 to 15 seconds, optional audio input, and prompt-driven multi-shot direction. Together also says the service uses the same APIs, authentication, SDKs, and billing surface that customers already use for the rest of their multimodal stack, with pricing starting at $0.10 per second of generated video.
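For teams sizing up the integration effort, those specs map to a very small request surface. The sketch below is illustrative only: the endpoint path, request field names, and response handling are assumptions rather than documented behavior, while the model identifier Wan-AI/wan2.7-t2v, the 2-to-15-second duration range, and the $0.10-per-second rate come from Together's announcement.

```python
import os
import requests

TOGETHER_API_KEY = os.environ["TOGETHER_API_KEY"]

# Assumed endpoint and payload shape -- a sketch, not Together's documented video API.
# The model name, duration range, and per-second price are taken from the announcement.
ENDPOINT = "https://api.together.xyz/v1/videos/generations"  # assumption
MODEL = "Wan-AI/wan2.7-t2v"
PRICE_PER_SECOND_USD = 0.10

def generate_clip(prompt: str, duration_s: int = 5, resolution: str = "1080p") -> dict:
    """Request a text-to-video clip (2-15 s, 720p or 1080p per the announcement)."""
    if not 2 <= duration_s <= 15:
        raise ValueError("Wan 2.7 text-to-video durations are 2 to 15 seconds")
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {TOGETHER_API_KEY}"},
        json={
            "model": MODEL,
            "prompt": prompt,          # multi-shot direction is expressed in the prompt
            "duration": duration_s,    # assumed field name
            "resolution": resolution,  # assumed field name
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    duration = 8
    # Per-second billing makes cost predictable before the request is sent.
    print(f"Estimated cost: ${duration * PRICE_PER_SECOND_USD:.2f}")
    result = generate_clip(
        "A drone shot over a coastline at sunrise, then a close-up of breaking waves",
        duration_s=duration,
    )
    print(result)
```

Because billing is per generated second, a cost estimate can be computed before any request is made, which matters when budgeting batch generation or iterative revision loops.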
Why it matters
This is significant because the differentiator in video AI is shifting from “can the model generate a clip?” to “can the platform support production iteration?” Teams need continuation, reference control, editing, and predictable operational interfaces. If those pieces arrive behind one API contract instead of a collection of disconnected services, video becomes easier to integrate into real applications and internal pipelines.
For Together AI, Wan 2.7 is also a positioning move. The company is trying to show that multimodal infrastructure can be unified in the same way text inference was unified: one account, one billing model, one developer surface, multiple capabilities. If that approach works, the platform becomes more valuable not because it hosts one model, but because it reduces the operational cost of using many video workflows together.
Related Articles
Google expanded Search Live on March 26, 2026 to every language and location where AI Mode is available. The move pushes multimodal voice-and-camera search to more than 200 countries and territories and gives Gemini’s live audio stack a much larger real-world footprint.
Netflix’s VOID reached Reddit as an open research release aimed at removing objects from video and repairing the interactions those objects caused in the scene. The notable details are the CogVideoX base, a two-pass pipeline, Gemini+SAM2 mask generation, and a 40GB+ VRAM requirement.
Google says Cinematic Video Overviews are rolling out to NotebookLM Ultra users in English. The company says the feature combines Gemini 3, Nano Banana Pro, and Veo 3 to generate more immersive videos than the earlier narrated-slide format.