Why it matters: open models rarely arrive with both headline context claims and parameter splits you can actually deploy. DeepSeek put hard numbers on the release: a 1M-token context design, a Pro model at 1.6T total / 49B active parameters, and a 284B/13B Flash variant.
HN did not upvote Browser Harness because it was just another browser wrapper. It took off because the repo lets an LLM patch its own browser helpers in the middle of a task, trading safety rails for raw flexibility.
HN did not latch onto DeepSeek V4 because of a polished launch page. The thread took off when commenters realized the front-page link was just updated docs while the weights and base models were already live for inspection.
HN latched onto a pain every heavy coding-tool user knows: the bug is tiny, but the diff balloons anyway. A new write-up turns that annoyance into a measurable benchmark and argues that better prompting and RL can make models edit with more restraint.
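The write-up's own metric isn't quoted here; as a rough illustration, a minimal sketch (names hypothetical) of one way to score edit restraint, comparing the model's patch against a reference patch by changed-line count:

```python
import difflib

def changed_lines(before: str, after: str) -> int:
    """Count lines added or removed between two versions of a file."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

def edit_bloat(original: str, model_fix: str, reference_fix: str) -> float:
    """Ratio of the model's edit size to the reference (minimal) edit size.

    1.0 means the model touched no more lines than the minimal fix;
    higher values quantify the ballooning diff the write-up complains about.
    """
    minimal = max(changed_lines(original, reference_fix), 1)
    return changed_lines(original, model_fix) / minimal
```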
Hacker News focused less on the Copilot plan mechanics and more on what the change reveals: long-running coding agents are turning flat AI subscriptions into a compute-cost problem.
HN read Kimi K2.6 as a test of whether open-weight coding agents can last through real engineering work. The 12- and 13-hour coding sessions drew attention, while commenters immediately pressed on speed, provider accuracy, and benchmark realism.
r/singularity reacted because the post turned LLM consciousness into a fight over computation itself. Alexander Lerchner’s “Abstraction Fallacy” paper argues that computation depends on a mapmaker, while commenters pushed back with questions about definitions, Chinese Room echoes, and philosophy versus neuroscience.
HN upvoted this because it turned vague limit anxiety into numbers. Tokenomics says 541 anonymous submissions averaged 466 request tokens on Opus 4.7 versus 349 on Opus 4.6, a roughly 33.5% increase, and the thread immediately argued over what that means for real Claude usage.
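The relative increase follows directly from the two reported averages:

```latex
% Mean request tokens: 349 (Opus 4.6) -> 466 (Opus 4.7)
\[
  \frac{466 - 349}{349} = \frac{117}{349} \approx 0.335
\]
```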
HN treated “AI cybersecurity is not proof of work” as a serious argument about search, model capability, and security asymmetry. The thread pushed past hype into a harder question: when an LLM flags a bug, did it understand the exploit path or just sample a suspicious pattern?
The r/singularity thread did not just react to Opus 4.7 scoring 41.0% where Opus 4.6 scored 94.7%. The interesting part was the community trying to separate real capability loss from refusal behavior, routing, and benchmark interpretation.
A new arXiv paper shows why low average violation rates can make LLM judges look more consistent than they are. On SummEval, 33-67% of documents showed at least one directed 3-cycle, and prediction-set width tracked absolute error closely.
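For intuition, a directed 3-cycle means the judge's pairwise preferences are intransitive: it ranks A over B, B over C, yet C over A. A minimal sketch of detecting one (the `beats` structure and summary names are illustrative, not the paper's actual interface):

```python
from itertools import combinations, permutations

# Hypothetical pairwise judgments for one document's candidate summaries:
# (a, b) in beats means the LLM judge ranked summary a above summary b.
beats = {("s1", "s2"), ("s2", "s3"), ("s3", "s1")}  # an intransitive loop

def has_directed_3cycle(items: list[str], beats: set[tuple[str, str]]) -> bool:
    """True if some triple forms a cycle a > b > c > a in the judgments."""
    for triple in combinations(items, 3):
        for a, b, c in permutations(triple):
            if (a, b) in beats and (b, c) in beats and (c, a) in beats:
                return True
    return False

print(has_directed_3cycle(["s1", "s2", "s3"], beats))  # True
```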
HN did not just ask whether Claude Opus 4.7 scores higher; it asked whether the product behavior is stable enough to build around. The thread quickly moved into adaptive thinking, tokenizer costs, safety filters, and bruised trust after recent Claude complaints.