OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B
Original: OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B View original →
Free inference becomes a distribution lever
OpenRouter is using free inference capacity to pull more developers toward open-weight models. In a June 15 tweet, the company wrote: “New Free capacity on OpenRouter,” naming gpt-oss-20b and Gemma 4 26B as the two models now served by EigenLabs’ Darkbloom.
The model details make the post more than a routine catalog update. OpenRouter’s page describes gpt-oss-20b as an Apache 2.0 open-weight model with 21B total parameters and a Mixture-of-Experts design that activates 3.6B parameters per forward pass. It lists a 131K context window and support for function calling, tool use, structured outputs, fine-tuning, and configurable reasoning levels.
Gemma 4 26B A4B fills a different slot. OpenRouter describes it as an instruction-tuned MoE model from Google DeepMind with 25.2B total parameters and 3.8B active per token. The page lists a 256K-token context window and multimodal input, including text, images, and video up to 60 seconds at 1 frame per second. Both models being available through a free tier gives teams a way to test routing, latency, and task fit before committing spend.
OpenRouter’s usual role is to sit between application developers and model providers. Its platform lets users route to different hosts of the same model, choosing balanced, fast, or fixed-provider behavior. By naming Darkbloom in the tweet, OpenRouter also highlights the supply side of inference: capacity providers can compete for developer traffic without every user managing deployment infrastructure.
The practical question is durability. Free capacity can be throttled, uneven, or temporary, and real workloads depend on time to first token, rate limits, and uptime as much as headline model specs. Watch whether this capacity remains stable enough for prototyping agents and long-context workflows, or whether it mainly functions as a discovery channel for paid routing later.
Related Articles
Google DeepMind released DiffusionGemma, a 26B MoE open model that uses text diffusion instead of token-by-token decoding. The pitch is up to 4x faster generation on dedicated GPUs for local, interactive workflows.
OpenRouter says Fusion reached within 1% of Claude Fable 5 on 100 DRACO deep-research tasks while costing roughly half as much. The product shifts the contest from one frontier model to a server-side panel, judge, and synthesizer workflow.
The money is following the layer that decides which model gets each request. OpenRouter says weekly traffic rose 5x in six months to 25 trillion tokens, while its platform now spans 400+ models and more than 8 million users.