FDM-1 Claims a General Computer Action Model Trained on 11M Hours of Video
Original: The First Fully General Computer Action Model
What Was Announced
A Hacker News post titled "The First Fully General Computer Action Model" points to Standard Intelligence's FDM-1 release. The team describes FDM-1 as a foundation model for computer action that operates directly on video and action tokens, rather than relying only on screenshot-centric pipelines. Their claim is broad: one model stack that can handle complex website exploration, multi-step CAD actions, and other long-horizon interaction workflows.
The core pitch is scale plus temporal continuity. Instead of narrow task finetuning on small annotated datasets, the model is presented as being pretrained on internet-scale behavioral video.
Training Recipe and Data Strategy
According to the write-up, training is organized into three stages. First, they train an inverse dynamics model (IDM) on roughly 40,000 hours of contractor-labeled screen recordings. Second, they use that IDM to label a much larger corpus, described as 11 million hours of video data. Third, they train the forward dynamics model (FDM) autoregressively on next-action prediction.
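The three-stage recipe can be summarized as: fit an inverse dynamics model on a small labeled corpus, use it to pseudo-label a much larger unlabeled corpus, then train the action model on the result. The sketch below is illustrative only, with toy lookup-table "models" standing in for the actual networks; none of the function names come from Standard Intelligence's release.

```python
# Illustrative sketch of the IDM -> pseudo-label -> FDM recipe described
# in the post. All names, shapes, and "training" logic are assumptions,
# not Standard Intelligence's actual code.

def train_idm(labeled_clips):
    """Stage 1: fit an inverse dynamics model on human-labeled
    (frame_t, frame_t1) -> action pairs. Here: a trivial lookup table."""
    table = {}
    for frame_t, frame_t1, action in labeled_clips:
        table[(frame_t, frame_t1)] = action
    return table

def pseudo_label(idm, unlabeled_clips):
    """Stage 2: use the IDM to attach action labels to a much larger
    unlabeled video corpus (unseen transitions default to 'noop')."""
    return [(f0, f1, idm.get((f0, f1), "noop")) for f0, f1 in unlabeled_clips]

def train_fdm(pseudo_labeled):
    """Stage 3: next-action prediction on the pseudo-labeled corpus.
    Here, action-frequency counting stands in for gradient training."""
    counts = {}
    for _, _, action in pseudo_labeled:
        counts[action] = counts.get(action, 0) + 1
    return counts

labeled = [("a", "b", "click"), ("b", "c", "scroll")]   # ~40k hours in the post
unlabeled = [("a", "b"), ("b", "c"), ("c", "d")]        # ~11M hours in the post
idm = train_idm(labeled)
corpus = pseudo_label(idm, unlabeled)
fdm_stats = train_fdm(corpus)
print(fdm_stats)  # {'click': 1, 'scroll': 1, 'noop': 1}
```

The key design point the announcement leans on is the data asymmetry: expensive human labels are spent only on stage one, and the IDM then amplifies them across a corpus roughly 275 times larger.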
The article also highlights a video encoder efficiency claim: nearly two hours of 30 FPS video represented in about one million tokens. The stated objective is to preserve long context while keeping inference practical for continuous control tasks where short-window screenshot methods can break down.
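A quick back-of-envelope check clarifies what the stated budget implies: two hours at 30 FPS is 216,000 frames, so one million tokens works out to only a few tokens per frame on average. The numbers below are taken from the article's claim; the arithmetic is ours.

```python
# Implied compression of the claimed encoder budget:
# ~2 hours of 30 FPS video in ~1M tokens.
hours = 2
fps = 30
frames = hours * 3600 * fps           # 216,000 frames
token_budget = 1_000_000
tokens_per_frame = token_budget / frames
print(frames, round(tokens_per_frame, 2))  # 216000 4.63
```

For comparison, screenshot-centric pipelines commonly spend hundreds or thousands of tokens on a single frame, which is why they are limited to short context windows over a handful of screenshots.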
Demos, Evaluation, and Caution
Published demos include CAD manipulation, GUI fuzzing that discovers stateful app bugs, and a self-driving interface scenario where the model reportedly navigates turns after finetuning on less than one hour of data. The team further claims evaluation infrastructure capable of over one million rollouts per hour across 80,000 forking VMs, with low-latency control loops.
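The throughput claim is easier to assess per machine. Dividing the claimed aggregate by the claimed VM count (our arithmetic, not theirs) gives a modest per-VM rate:

```python
# Per-VM rate implied by the claimed evaluation throughput:
# 1M+ rollouts/hour across 80,000 forking VMs.
rollouts_per_hour = 1_000_000
vms = 80_000
per_vm = rollouts_per_hour / vms      # 12.5 rollouts per VM per hour
minutes_per_rollout = 60 / per_vm     # 4.8 minutes per rollout per VM
print(per_vm, minutes_per_rollout)    # 12.5 4.8
```

At 12.5 rollouts per VM per hour, each rollout averages under five minutes, so the headline number depends less on raw per-VM speed than on orchestrating 80,000 environments concurrently, which is why the announcement reads as a systems-engineering claim as much as a modeling one.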
If these results hold up under independent replication, they suggest the computer-use field is moving from small supervised interaction datasets toward video-native pretraining and infrastructure-heavy evaluation. That shift would change where competitive advantage sits: data pipelines, labeling quality, and systems engineering, not just model architecture.
At the same time, most metrics in this announcement are self-reported. External benchmarking, reproducibility checks, and standardized safety evaluation will be necessary before enterprises treat this class of model as production-ready for high-impact automation workflows.
Sources: Standard Intelligence FDM-1 post, Hacker News discussion