OpenAI announced that GPT-5.4 Thinking and GPT-5.4 Pro are rolling out in ChatGPT, while GPT-5.4 is already available in the API and Codex. The launch positions GPT-5.4 as a unified frontier model for reasoning, coding, and agentic workflows.
LLM
A high-ranking Hacker News post highlighted Google Workspace CLI, an open-source tool that unifies Workspace APIs behind one command surface with structured JSON output, dynamic discovery-based commands, and agent-oriented workflows.
In a January 21, 2026 engineering post, Anthropic explained how it repeatedly redesigned a take-home performance test as Claude models improved. The company describes how Opus 4 and Opus 4.5 changed the evaluation baseline and forced process-level updates.
Anthropic posted that Opus 3, after retirement interviews, will continue sharing its reflections via a Substack blog for at least the next three months. The update points to an ongoing public publishing format rather than a one-off model announcement.
Google AI Developers announced that Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API and Google AI Studio. The post positions it as the fastest and most cost-efficient model in the Gemini 3 line, now adding dynamic thinking for task-adaptive reasoning.
A high-ranking Hacker News thread highlighted a two-sided Qwen story: rapid model quality gains and potential organizational instability. As Qwen 3.5 expands across model sizes, reported leadership departures raise questions about roadmap continuity in the open-weight LLM ecosystem.
A high-engagement LocalLLaMA post on March 4, 2026 discussed Microsoft’s open-weight Phi-4-Reasoning-Vision-15B and focused on practical deployment tradeoffs for local multimodal inference.
A March 4, 2026 Hacker News thread elevated Q Labs’ Slowrun benchmark, which fixes training data at 100M FineWeb tokens and optimizes for data efficiency under large compute budgets.
NVIDIA AI Developer says a collaboration with SGLang achieved up to 25x faster DeepSeek R1 inference on GB300 NVL72 versus H200, along with an 8x gain on GB200 NVL72 within months. The post attributes the gains to NVFP4 precision, disaggregation, and communication-compute overlap.
OpenAI Developers posted that the Codex app is now available on Windows with a native agent sandbox and PowerShell-oriented developer environment support. The update extends Codex usage beyond previous desktop workflows and signals deeper Windows integration for agentic coding tasks.
A high-scoring LocalLLaMA post benchmarked Qwen3.5-27B Q4 GGUF variants against BF16, separating “closest-to-baseline” choices from “best efficiency” picks for constrained VRAM setups.
A high-signal Hacker News thread surfaced Unsloth’s Qwen3.5 guide, which maps model sizes to bf16 LoRA VRAM budgets and clarifies MoE, vision, and export paths for production workflows.