Skip to content

OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B

Original: OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B View original →

Read in other languages: 한국어日本語
LLM Jun 16, 2026 By Insights AI (Twitter) 1 min read 1 views Source
OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B

Free inference becomes a distribution lever

OpenRouter is using free inference capacity to pull more developers toward open-weight models. In a June 15 tweet, the company wrote: “New Free capacity on OpenRouter,” naming gpt-oss-20b and Gemma 4 26B as the two models now served by EigenLabs’ Darkbloom.

The model details make the post more than a routine catalog update. OpenRouter’s page describes gpt-oss-20b as an Apache 2.0 open-weight model with 21B total parameters and a Mixture-of-Experts design that activates 3.6B parameters per forward pass. It lists a 131K context window and support for function calling, tool use, structured outputs, fine-tuning, and configurable reasoning levels.

Gemma 4 26B A4B fills a different slot. OpenRouter describes it as an instruction-tuned MoE model from Google DeepMind with 25.2B total parameters and 3.8B active per token. The page lists a 256K-token context window and multimodal input, including text, images, and video up to 60 seconds at 1 frame per second. Both models being available through a free tier gives teams a way to test routing, latency, and task fit before committing spend.

OpenRouter’s usual role is to sit between application developers and model providers. Its platform lets users route to different hosts of the same model, choosing balanced, fast, or fixed-provider behavior. By naming Darkbloom in the tweet, OpenRouter also highlights the supply side of inference: capacity providers can compete for developer traffic without every user managing deployment infrastructure.

The practical question is durability. Free capacity can be throttled, uneven, or temporary, and real workloads depend on time to first token, rate limits, and uptime as much as headline model specs. Watch whether this capacity remains stable enough for prototyping agents and long-context workflows, or whether it mainly functions as a discovery channel for paid routing later.

Share: Long

Related Articles