A Reddit post in r/singularity highlighted CUDA Agent, a ByteDance Seed and Tsinghua AIR project that reports high pass rates and speedups over torch.compile on KernelBench.
A LocalLLaMA thread highlighted ongoing work to add NVFP4 quantization support to llama.cpp GGUF, pointing to potential memory savings and higher throughput for compatible GPU setups.
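As a back-of-envelope sketch of where those memory savings come from (assuming NVFP4's 4-bit E2M1 values with one FP8 scale per 16-element block; the block size, scale width, and the 7B model size are illustrative assumptions, and per-tensor metadata is ignored):

```python
def tensor_bytes(n_weights: int, bits_per_weight: float) -> float:
    """Approximate tensor size in bytes for a given effective bit width."""
    return n_weights * bits_per_weight / 8

# NVFP4 packs 4-bit (E2M1) values in blocks of 16, each block carrying
# one 8-bit FP8 scale, so the effective width is 4 + 8/16 = 4.5 bits
# per weight before any per-tensor metadata.
nvfp4_bits = 4 + 8 / 16

n = 7_000_000_000  # a hypothetical 7B-parameter model
fp16_gb = tensor_bytes(n, 16) / 1e9
nvfp4_gb = tensor_bytes(n, nvfp4_bits) / 1e9
print(f"FP16: {fp16_gb:.1f} GB, NVFP4: {nvfp4_gb:.2f} GB")
```

On that arithmetic, weights shrink to roughly 28% of their FP16 footprint, which is what makes larger models fit on a single consumer GPU.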
OpenAI announced GPT-5.4 on March 5, 2026, adding a new general-purpose model and GPT-5.4 Pro with stronger computer use, tool search efficiency, and benchmark improvements over GPT-5.2.
A popular r/MachineLearning discussion examines an unofficial, theorem-style claim that attention's core optimization geometry scales with d^2, not n^2. Community response is mixed: strong curiosity, but equally strong calls for peer review and reproducible evidence.
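As a rough illustration of where a d^2 term comes from (this is standard transformer FLOP accounting, not a reconstruction of the thread's actual argument), per-layer self-attention cost splits into projection work scaling as n·d^2 and score/value work scaling as n^2·d:

```python
def attention_flops(n: int, d: int) -> dict:
    """Rough per-layer FLOP counts for single-head self-attention.

    Illustrative only: constants and minor terms (softmax, bias) omitted.
    n = sequence length, d = model width.
    """
    proj = 4 * 2 * n * d * d   # Q, K, V, output projections: 4 matmuls, ~2*n*d^2 each
    scores = 2 * n * n * d     # Q @ K^T
    values = 2 * n * n * d     # softmax(scores) @ V
    return {"n*d^2 terms": proj, "n^2*d terms": scores + values}

# For sequences shorter than 2*d, the d^2 projection work dominates;
# the n^2 attention term only takes over once n grows past that point.
print(attention_flops(n=1024, d=1024))
```

The crossover at n = 2d is why, at typical widths and moderate context lengths, the d^2 terms can dominate total compute even though n^2 dominates asymptotically.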
Google expanded Canvas in AI Mode to everyone in the U.S. in English. The update adds stronger support for drafting documents and building interactive tools directly in Search. Google says Canvas can generate side-panel prototypes using fresh web information and the Knowledge Graph.
OpenAIDevs said Codex now supports a /fast mode where GPT-5.4 runs 1.5x faster while keeping the same intelligence and reasoning profile. The update targets faster coding iteration and debugging loops for developer workflows.
OpenAI said it published a new Chain-of-Thought controllability evaluation suite and research paper. The company reports that GPT-5.4 Thinking showed limited ability to obscure its reasoning, supporting chain-of-thought monitoring as a practical safety mechanism.
OpenAI announced that GPT-5.4 Thinking and GPT-5.4 Pro are rolling out in ChatGPT, while GPT-5.4 is already available in the API and Codex. The launch positions GPT-5.4 as a unified frontier model for reasoning, coding, and agentic workflows.
A high-ranking Hacker News post highlighted Google Workspace CLI, an open-source tool that unifies Workspace APIs behind one command surface with structured JSON output, dynamic discovery-based commands, and agent-oriented workflows.
In a January 21, 2026 engineering post, Anthropic explained how it repeatedly redesigned a take-home performance test as Claude models improved. The company describes how Opus 4 and Opus 4.5 changed the evaluation baseline and forced process-level updates.
Anthropic posted that Opus 3, following its retirement interviews, will continue sharing its reflections via a Substack blog for at least the next three months. The update points to an ongoing public publishing format rather than a one-off model announcement.
Google AI Developers announced that Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API and Google AI Studio. The post positions it as the fastest and most cost-efficient model in the Gemini 3 line, now adding dynamic thinking for task-adaptive reasoning.