#gpt-oss

LLM X/Twitter Jun 16, 2026 1 min read

OpenRouter adds free capacity for gpt-oss-20b and Gemma 4 26B

OpenRouter added free capacity for gpt-oss-20b and Gemma 4 26B, served by Darkbloom. The move gives developers a no-cost test path for gpt-oss-20b, a 21B-parameter open-weight model, and for Gemma 4 26B, a 256K-context multimodal model.

#openrouter #gpt-oss #gemma

LLM Reddit Apr 8, 2026 2 min read

r/LocalLLaMA Shares a University-Hospital Stack Serving 1B+ Tokens Per Day Locally

A popular r/LocalLLaMA self-post lays out a concrete 2x H200 serving stack for gpt-oss-120b, including routing, monitoring, and queueing tradeoffs. The appeal is not just the headline throughput but the unusually detailed operational data behind it.

#localllama #vllm #litellm

LLM Reddit Mar 28, 2026 2 min read

LocalLLaMA Tracks NVIDIA's gpt-oss-puzzle-88B as Puzzle Shrinks gpt-oss-120b for Cheaper Serving

A March 26, 2026 r/LocalLLaMA post linking NVIDIA's `gpt-oss-puzzle-88B` model card reached 284 points and 105 comments at crawl time. NVIDIA says the 88B MoE model uses its Puzzle post-training NAS pipeline to cut parameters and KV-cache costs while keeping reasoning accuracy near or above the parent model.

#nvidia #gpt-oss #open-weights