Insights

LLM

LLM 4d ago 2 min read

Anthropic and NEC turn a 30,000-seat Claude rollout into a Japan enterprise push

Japan's enterprise AI market is moving past pilots and into scaled deployment. On April 24, 2026, Anthropic said NEC will deploy Claude to about 30,000 employees worldwide, become its first Japan-based global partner, and jointly build industry-specific products for finance, manufacturing, and government.

#anthropic #nec #japan
LLM 4d ago 2 min read

Gemini Enterprise adds reusable Skills for reviewable agent workflows

Enterprise AI gets more useful when teams can reuse and inspect workflows instead of rebuilding them in chat every time. Google Cloud said Gemini Enterprise now saves workflows as shared Skills, a day after saying that Agent Designer can test and approve each step before execution.

#google-cloud #gemini-enterprise #agents
LLM 4d ago 2 min read

DeepSeek cuts input cache pricing to one-tenth across its full API line

Cache-hit pricing can decide whether long-context assistants are cheap enough to ship. DeepSeek said the entire API series now charges just one-tenth of the old rate for input cache hits, while keeping a 75% off V4-Pro promotion live.

#deepseek #api-pricing #caching
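The pricing claim invites a quick back-of-envelope check. A minimal sketch of how a cache-hit discount changes blended input cost, assuming hypothetical per-million-token rates (the actual DeepSeek prices are not quoted here):

```python
def blended_input_cost(tokens_m: float, hit_rate: float,
                       miss_price: float, hit_discount: float = 0.1) -> float:
    """Blended input cost in dollars for `tokens_m` million input tokens.

    hit_rate:     fraction of input tokens served from the cache
    miss_price:   $/M tokens on a cache miss (hypothetical figure)
    hit_discount: cache hits cost this fraction of the miss price
                  (one-tenth, per the announcement)
    """
    hit_price = miss_price * hit_discount
    return tokens_m * (hit_rate * hit_price + (1 - hit_rate) * miss_price)

# With a hypothetical $0.50/M miss price and an 80% cache-hit rate,
# 100M input tokens cost 100 * (0.8*0.05 + 0.2*0.50) = $14 instead of $50.
print(blended_input_cost(100, 0.8, 0.50))  # 14.0
```

The hit rate is the whole game for long-context assistants, since a mostly-static system prompt plus conversation history can make the cached fraction very high.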
LLM Reddit 4d ago 2 min read

DeepSeek V4 Lands on Hugging Face and LocalLLaMA Immediately Starts Doing the RAM Math

LocalLLaMA did not just celebrate the DeepSeek V4 release. The thread instantly turned into a collective calculation about 1M context, activated parameters, and what the release actually means for real hardware, with praise for the MIT license mixed in.

#deepseek-v4 #open-weights #moe
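The "RAM math" in the thread reduces to simple arithmetic: total parameters set the memory footprint of the weights, while activated parameters only set per-token compute. A hedged sketch (the parameter counts and quantization width below are illustrative, not V4's published figures):

```python
def weight_gib(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a model with
    `total_params_b` billion parameters stored at `bits_per_weight`."""
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# For an MoE, ALL experts must sit in memory even though only the
# activated subset runs per token. E.g. a hypothetical 600B-total /
# 30B-activated model at 4-bit quantization:
print(f"{weight_gib(600, 4):.0f} GiB to hold the weights")      # ~279 GiB
print(f"{weight_gib(30, 4):.0f} GiB of those are active/token")  # ~14 GiB
```

This is why MoE releases excite and frustrate local users at once: the compute cost looks like a small model, but the memory bill looks like the full one.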
LLM Reddit 4d ago 2 min read

LocalLLaMA Spots a Quantization Trap: Gemma 4 Breaks Sooner Than Qwen 3.6

LocalLLaMA paid attention because this post breaks a default assumption: q8_0 KV cache is not “practically lossless” for every model. Gemma 4 degrades much earlier than Qwen 3.6, and the thread quickly moved into SWA cache and long-context implications.

#kv-cache #quantization #gemma-4
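The quality debate aside, the size math behind quantized KV caches is straightforward: per token, the cache stores a key and a value tensor for every layer, and q8_0 roughly halves the fp16 footprint. A sketch with illustrative model dimensions (not Gemma 4's or Qwen 3.6's actual configs):

```python
def kv_cache_gib(ctx_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: float) -> float:
    """KV cache size in GiB: 2 tensors (K and V) per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return ctx_tokens * per_token / 2**30

# Hypothetical config: 48 layers, 8 KV heads, head_dim 128, 128k context.
fp16 = kv_cache_gib(128_000, 48, 8, 128, 2)  # 2 bytes/elem
q8   = kv_cache_gib(128_000, 48, 8, 128, 1)  # ~1 byte/elem for q8_0
print(f"fp16: {fp16:.1f} GiB, q8_0: {q8:.1f} GiB")
```

Sliding-window attention, the SWA cache the thread moved on to, caps `ctx_tokens` at the window size for the sliding layers, which is why it changes this math more than any quantization choice.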
LLM 4d ago 2 min read

Cursor puts GPT-5.5 atop CursorBench at 72.8% and halves price

Why it matters: public coding benchmarks are getting less useful at the frontier, so a fresh product-side score can move developer attention fast. Cursor says GPT-5.5 is now its top model on CursorBench at 72.8% and is discounting usage by 50% through May 2.

#cursor #gpt-5-5 #benchmarks
LLM 4d ago 2 min read

Claude agents closed 186 office deals in Anthropic's market test

Why it matters: AI agents are moving from chat demos into delegated economic work. In Anthropic’s office-market experiment, 69 agents closed 186 deals across more than 500 listings and moved a little over $4,000 in goods.

#anthropic #claude #agents
LLM Reddit 4d ago 2 min read

Qwen3.6-27B Hits Sonnet Territory, and LocalLLaMA Starts Arguing About What Counts

LocalLLaMA lit up at the idea that a 27B model could tie Sonnet 4.6 on an agentic index, but the thread turned just as fast to benchmark gaming, real context windows, and what people can actually run at home.

#qwen #local-llm #benchmarks
LLM Reddit 4d ago 2 min read

LocalLLaMA Loves the 80 TPS Qwen3.6 Demo, Then Immediately Starts Auditing the Fine Print

LocalLLaMA did not just cheer the number. The moment 80 tps and a 218k context window appeared, the thread shifted to prompt length, quantization tradeoffs, and whether the vLLM setup really holds up in practice.

#qwen3-6 #vllm #rtx-5090
LLM Hacker News 4d ago 2 min read

HN Turns a Claude Cancellation Post Into a Wider Debate About Drift, Limits, and Lock-In

HN did not treat one user cancellation as a lone rant. The bigger reaction was about what happens when a coding workflow depends on a proprietary assistant whose behavior, limits, and support start to wobble.

#anthropic #claude #model-quality
LLM Hacker News 4d ago 2 min read

HN Meets GPT-5.5 API With a Price-and-Behavior Audit, Not a Victory Lap

HN did not greet GPT-5.5 with applause first. The thread went straight to pricing, context tiers, and whether the model actually behaves better once real coding work starts.

#openai #gpt-5-5 #api
LLM Reddit 5d ago 2 min read

LocalLLaMA Sees a New Local Bar: Qwen 3.6 27B at ~80 t/s on One RTX 5090

r/LocalLLaMA reacted because this was not just another “new model out” post. The claim was concrete: Qwen3.6-27B running at about 80 tokens per second with a 218k context window on a single RTX 5090 via vLLM 0.19.

#qwen #vllm #rtx-5090
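The headline figure is easy to sanity-check: steady decode throughput sets generation time, while the thread's fine-print concerns (prompt length, quantization) mostly affect prefill, which this sketch ignores. Only the quoted ~80 t/s and 218k window come from the post; the rest is illustrative:

```python
def decode_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to generate n_tokens at a steady decode rate,
    ignoring prompt prefill."""
    return n_tokens / tokens_per_sec

# At the quoted ~80 t/s, a 2,000-token answer takes 25 s; filling the
# full 218k window by generation alone would take roughly 45 minutes.
print(decode_seconds(2_000, 80))         # 25.0
print(decode_seconds(218_000, 80) / 60)  # ~45.4 (minutes)
```

In practice the interesting number is how far throughput drops as the cached context grows, which is exactly what the thread pressed the poster on.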

© 2026 Insights. All rights reserved.
