Anthropic has introduced Natural Language Autoencoders (NLAs), a new interpretability technique that trains Claude to translate its own internal activations into human-readable text—enabling safety audits that can uncover hidden model motivations.
#llm
RSS FeedNVIDIA AI has released Star Elastic, an innovative architecture that packs 30B, 23B, and 12B reasoning models into a single checkpoint, enabling zero-shot slicing to dynamically switch between model scales without separate downloads.
Fields Medalist Timothy Gowers: GPT-5.5 Pro Produced PhD-Level Math Proofs — Research Faces 'Crisis'
Fields Medal-winning mathematician Timothy Gowers tested ChatGPT 5.5 Pro on open math problems and found it produced PhD-level proofs in about an hour, warning that mathematical research faces an imminent 'crisis' at the current rate of AI progress.
A new DELEGATE-52 benchmark study finds that even frontier LLMs like Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt an average of 25% of document content during long delegated workflows, with errors compounding silently.
OpenAI replaced GPT-5.3 Instant with GPT-5.5 Instant as ChatGPT's default model for all users including free tier on May 5. The model delivers smarter, more concise answers with improved personalization based on past chats, files, and Gmail integrations.
Anthropic's annualized revenue reached $30B in Q1 2026 — an 80-fold quarterly surge CEO Dario Amodei called 'too hard to handle.' To cope, the company rented SpaceX's entire Colossus data center (220K NVIDIA GPUs) and doubled Claude Code rate limits across all paid tiers.
OpenAI doubled GPT-5.5 listed prices versus GPT-5.4, but OpenRouter analysis shows real user costs rose only 49-92% thanks to the model generating shorter completions.
A well-argued HN post explains why reliable AI agents require deterministic software scaffolding rather than ever-more-elaborate prompting, earning 552 upvotes from developers.
xAI has released Grok 4.3 on its API, claiming top spots on agentic tool calling and instruction-following leaderboards, and ranking #1 in enterprise domains such as case law and corporate finance. It supports a 1M token context window at $1.25/M input and $2.50/M output.
A new arXiv paper (2605.00842) explains why fine-tuning an LLM on a narrow, harmless task can induce broad misalignment in unrelated contexts — attributing it to feature superposition geometry. The work gives a theoretical foundation to one of AI safety's most troubling open problems.
OpenAI replaced ChatGPT's default model with GPT-5.5 Instant on May 5, 2026. The upgrade delivers 52.5% fewer hallucinations and 30% more concise responses than GPT-5.3 Instant, with new memory source controls for personalization transparency.
Security firm Cyera has disclosed "Bleeding Llama," a critical unauthenticated memory leak vulnerability in Ollama that could expose conversation data, API keys, and other sensitive information to remote attackers.