Researchers have demonstrated that transformer models with fewer than 100 parameters can add two 10-digit numbers with 100% accuracy using digit tokenization, challenging assumptions about the minimum complexity needed for arithmetic reasoning.
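As a rough illustration of what digit tokenization means here, the sketch below encodes an addition problem with one token per digit. The summary does not describe the researchers' exact scheme, so the zero-padding, the `+` and `=` separator tokens, and the function names are all assumptions made for illustration.

```python
def tokenize_addition(a: int, b: int, width: int = 10) -> list[str]:
    """Encode 'a + b =' as one token per digit (hypothetical scheme).

    Both operands are zero-padded to a fixed width so every digit position
    lines up; the padding and separators are assumptions, not the paper's format.
    """
    left = f"{a:0{width}d}"
    right = f"{b:0{width}d}"
    return list(left) + ["+"] + list(right) + ["="]


def detokenize_digits(tokens: list[str]) -> int:
    """Join a sequence of digit tokens back into an integer."""
    return int("".join(tokens))


tokens = tokenize_addition(1234567890, 987654321)
```

With one token per digit, the vocabulary needs only the ten digits plus a few separators, which is part of why such a small model is plausible at all.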
#llm
Inception Labs has released Mercury 2, the first production-ready diffusion language model for reasoning. Running at over 1,000 tokens per second on Blackwell GPUs, it is dramatically faster and cheaper than leading autoregressive competitors.
llmfit is an open-source CLI tool that automatically detects your system's RAM, CPU, and GPU specs to recommend the optimal LLM and quantization level, dramatically lowering the barrier to running local AI.
A deep dive into why XML tags work better than other delimiters with Claude: the explanation is rooted in how Anthropic structured Claude's training data and in the model's extensive exposure to XML-structured prompts throughout fine-tuning.
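For readers unfamiliar with the pattern, here is a minimal sketch of delimiting prompt sections with XML tags. The tag names (`instructions`, `document`, `question`) and the helper functions are hypothetical choices for illustration, not names taken from the article or from any Anthropic API.

```python
def wrap(tag: str, content: str) -> str:
    """Wrap content in an opening and closing XML-style tag."""
    return f"<{tag}>\n{content.strip()}\n</{tag}>"


def build_prompt(instructions: str, document: str, question: str) -> str:
    """Assemble a prompt whose sections are clearly delimited by tags.

    The tag names are arbitrary; what matters is that each section has an
    unambiguous start and end.
    """
    sections = [
        wrap("instructions", instructions),
        wrap("document", document),
        wrap("question", question),
    ]
    return "\n\n".join(sections)


prompt = build_prompt("Answer briefly.", " Some text. ", "What is it?")
```

The resulting string can be sent as an ordinary user message; the tags simply mark where each part begins and ends.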
Developer Eric Holmes argues that MCP is already dying, claiming LLMs already excel at using CLI tools without a special protocol. He makes a strong case that CLIs compose better, are easier to debug, and work with existing auth systems.
The Financial Times reports that DeepSeek V4 is set to launch next week, featuring image and video generation capabilities that position it as a direct competitor to multimodal AI models from OpenAI and Google.
Andrej Karpathy highlights the fundamental memory+compute trade-off challenge in LLMs: fast but small on-chip SRAM versus large but slow off-chip DRAM. He calls optimizing this the most intellectually rewarding puzzle in AI infrastructure today, pointing to NVIDIA's $4.6T market cap as proof.
AI researcher Andrej Karpathy argues that programming has fundamentally changed over the last two months, particularly since December when coding agents started actually working. Developers are shifting from writing code to directing and managing AI agents in parallel.
A Hacker News thread analyzed a benchmark of 2,430 Claude Code runs, focusing on default stack choices, build-vs-buy behavior, and ecosystem lock-in risks.
Google DeepMind announced Gemini 3.1 Pro on February 19, 2026, as an upgraded core model for harder tasks. The company highlighted a verified 77.1% score on ARC-AGI-2 and broad rollout across developer, enterprise, and consumer surfaces.
On February 2, 2026, OpenAI and Snowflake announced an expanded partnership to bring OpenAI models directly into Snowflake Cortex AI. The move targets secure, governed, and lower-friction enterprise deployment of generative AI.
An r/LocalLLaMA post reports a from-scratch 144M-parameter Spiking Neural Network language model experiment named Nord. The author claims 97-98% inference sparsity, STDP-based online updates, and better prompt-level topic retention than GPT-2 Small on limited examples, while clearly noting current loss and benchmark limitations.