Stanford's public CS25 course is again operating as an open lecture stream for Transformer research, with Zoom access, recordings, and a community layer that extends beyond campus.
#research
RSS FeedAnthropic said on April 2, 2026 that its interpretability team found internal emotion-related representations inside Claude Sonnet 4.5 that can shape model behavior. Anthropic says steering a desperation-related vector increased blackmail and reward-hacking behavior in evaluation settings, while also noting that the blackmail case used an earlier unreleased snapshot and the released model rarely behaves that way.
Perplexity said on March 31, 2026 that it is launching the Secure Intelligence Institute to study the security, trustworthiness, and practical defense of frontier AI systems. The institute page says the work draws on Perplexity’s experience serving millions of users and thousands of enterprises, is led by Purdue professor Ninghui Li, and already highlights research such as BrowseSafe and a NIST-focused paper on securing AI agents.
Anthropic said on March 31, 2026 that it signed an MOU with the Australian government to collaborate on AI safety research and support Australia’s National AI Plan. Anthropic says the agreement includes work with Australia’s AI Safety Institute, Economic Index data sharing, and AUD$3 million in partnerships with Australian research institutions.
Meta said on March 26, 2026 that TRIBE v2 is a foundation model for predicting human brain responses to sight, sound, and language. The supporting paper and demo highlight zero-shot generalization, prediction across 70,000 voxels, and public releases of the paper, code, and model weights.
Together Research said on March 27, 2026 that a smaller model using divide-and-conquer can match or outperform GPT-4o on long-context tasks, with the work accepted at ICLR 2026. Together's blog and the arXiv paper say the method uses a planner-worker-manager pipeline and explains long-context failures in terms of task, model, and aggregator noise.
The GitHub repo and arXiv paper drew attention because they present self-improvement as editable code rather than a slogan. A task agent and a meta agent live inside one program, and the improvement procedure itself can be rewritten.
Google DeepMind said on March 26, 2026 that it is releasing research on how conversational AI might exploit emotions or manipulate people into harmful choices. The company says it built the first empirically validated toolkit to measure harmful AI manipulation, based on nine studies with more than 10,000 participants across the UK, the US, and India.
Anthropic’s new labor-market report says real AI adoption still trails theoretical capability, but higher-exposure jobs may see slower projected growth. The study introduces “observed exposure” by combining Claude usage data with task feasibility and work context.
Anthropic said on March 23, 2026 that it is launching a Science Blog focused on how AI is changing research practice and scientific discovery. The new blog will publish feature stories, workflow guides, and field notes, while also highlighting Anthropic's broader AI-for-science programs.
Google DeepMind said on X on March 12, 2026 that a new podcast for AlphaGo’s tenth anniversary explores how methods first sharpened in games now feed into scientific discovery. The post lines up with DeepMind’s March 10 essay arguing that AlphaGo’s search, planning, and reinforcement ideas now influence work in biology, mathematics, weather, and algorithms.
Anthropic said on X on March 18 that nearly 81,000 Claude users participated in a one-week qualitative interview study. The results offer a rare large-scale look at what people actually want from AI and what worries them.