
#inference

LLM X/Twitter Mar 22, 2026 2 min read

Cloudflare announced on March 20, 2026 that Kimi K2.5 is available on Workers AI, letting developers build end-to-end agents on Cloudflare's platform. According to the launch post, the model offers a 256k-token context window, multi-turn tool calling, vision inputs, and structured outputs. The post also reports that an internal security-review agent processing 7B tokens per day cut its costs by 77% after switching to Kimi K2.5.

LLM Reddit Mar 19, 2026 2 min read

A LocalLLaMA thread on March 18, 2026 drew fresh attention to Mamba-3, a new state space model from researchers at Carnegie Mellon University, Princeton, Cartesia AI, and Together AI. The project shifts its design goal from training speed to inference efficiency. At the 1.5B-parameter scale, it claims lower combined prefill and decode latency than Mamba-2, Gated DeltaNet, and Llama-3.2-1B.