LLM

LLM Reddit Mar 2, 2026 1 min read

How to Run Qwen3.5 27B with 170k Context at 100+ t/s on 2x RTX 3090

A community developer achieved 100+ t/s decode speed and 585 t/s aggregate throughput across 8 simultaneous requests running Qwen3.5 27B on two NVLink-connected RTX 3090s, using vLLM with tensor parallelism and multi-token prediction (MTP).

#qwen #local-inference #vllm
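
The two throughput figures measure different things. As a rough sketch, assuming the 585 t/s aggregate is shared evenly across the 8 concurrent requests (the post does not say how it is split), each request averages about 73 t/s, so the aggregate number should not be read as per-user speed.

```python
# Figures from the post; the even split across requests is an assumption.
AGGREGATE_TPS = 585.0      # total tokens/s across all concurrent requests
CONCURRENT_REQUESTS = 8

def per_request_tps(aggregate_tps: float, requests: int) -> float:
    """Average decode speed per request if throughput is shared evenly."""
    return aggregate_tps / requests

avg = per_request_tps(AGGREGATE_TPS, CONCURRENT_REQUESTS)
print(f"{avg:.1f} t/s per request")  # 73.1 t/s per request
```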

LLM Hacker News Mar 2, 2026 1 min read

llmfit: Auto-Select the Right LLM for Your Hardware

llmfit is an open-source CLI tool that detects your system's RAM, CPU, and GPU specs and recommends the best-fitting model and quantization level, lowering the barrier to running AI locally.

#llm #open-source #hardware-optimization
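
The item does not describe how llmfit scores hardware, so the following is a hypothetical heuristic, not llmfit's actual logic. It estimates weight memory as parameters × bits ÷ 8, pads it with an assumed 20% overhead for the KV cache and runtime buffers, and picks the highest-precision quantization that fits. The quantization labels and the `pick_quant` and `weights_gb` helpers are illustrative names.

```python
# Hypothetical sizing heuristic; NOT llmfit's implementation.
QUANT_BITS = {"8-bit": 8, "6-bit": 6, "4-bit": 4, "3-bit": 3}

def weights_gb(params_billions: float, bits: int) -> float:
    """Approximate weight memory in GB: billions of parameters x bits / 8."""
    return params_billions * bits / 8

def pick_quant(params_billions: float, available_gb: float, overhead: float = 1.2):
    """Return the highest-precision quantization whose padded size fits, else None.

    `overhead` is an assumed multiplier covering KV cache and runtime buffers.
    """
    for name, bits in sorted(QUANT_BITS.items(), key=lambda kv: kv[1], reverse=True):
        if weights_gb(params_billions, bits) * overhead <= available_gb:
            return name
    return None

print(pick_quant(27, 24))  # 4-bit
print(pick_quant(27, 48))  # 8-bit
print(pick_quant(70, 8))   # None
```

A real tool would also account for how memory is split across multiple GPUs and for CPU offloading; this sketch treats available memory as one pool.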

LLM Mar 2, 2026 1 min read

Claude Hits #1 on US App Store After Trump's Anthropic Ban

Following President Trump's order barring federal agencies from using Anthropic products, Claude surged to the top of the US App Store's free apps chart, with daily signups hitting all-time records and free users growing over 60% since January.

#anthropic #claude #product-launch

LLM Reddit Mar 2, 2026 1 min read

13 Months After the DeepSeek Moment: How Far Has Local AI Come?

A 13-month comparison: in early 2025, running frontier-level DeepSeek R1 at ~5 tokens/second cost $6,000. Today, a $600 mini PC runs a significantly stronger model at the same speed, and reaches 17-20 t/s with models that still outperform R1.

#local-llm #deepseek #qwen

LLM Reddit Mar 2, 2026 1 min read

Reverse-Engineering Apple's Neural Engine to Train Microgpt

A developer with a Mac mini M4 used Claude to reverse-engineer Apple's private Neural Engine APIs, bypassed CoreML, and trained a 110M-parameter Microgpt model entirely on the ANE, opening new possibilities for NPU-based AI training.

#apple #neural-engine #npu

LLM Reddit Mar 2, 2026 1 min read

Qwen 3.5 Small Released: A New Benchmark for Local AI

Alibaba's Qwen team has released Qwen 3.5 Small, a new small dense model in their flagship open-source series. The announcement topped r/LocalLLaMA with over 1,000 upvotes, reflecting the local AI community's enthusiasm for capable small models.

#qwen #local-llm #open-source

LLM Hacker News Mar 2, 2026 1 min read

Why XML Tags Are So Fundamental to Claude

A deep dive into why XML tags work better than other delimiters with Claude, arguing that the answer lies in how Anthropic structured Claude's training data and in the model's extensive exposure to XML-structured prompts throughout fine-tuning.

#claude #anthropic #xml
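
Anthropic's prompting guidance recommends wrapping distinct parts of a prompt in XML tags. A minimal sketch of assembling such a prompt in Python follows; the tag names and the `xml_section` and `build_prompt` helpers are illustrative, since Claude does not require a fixed tag vocabulary.

```python
def xml_section(tag: str, body: str) -> str:
    """Wrap one prompt section in matching opening and closing tags."""
    return f"<{tag}>\n{body.strip()}\n</{tag}>"

def build_prompt(instructions: str, document: str, question: str) -> str:
    # Tag names are illustrative; any consistent, descriptive names work.
    return "\n\n".join([
        xml_section("instructions", instructions),
        xml_section("document", document),
        xml_section("question", question),
    ])

prompt = build_prompt(
    "Answer using only the document.",
    "vLLM supports tensor parallelism across GPUs.",
    "Which feature spreads a model across GPUs?",
)
print(prompt)
```

Keeping each section inside its own tag pair lets the instructions refer to content by name (for example, "the text in the document tags") instead of relying on position.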

LLM Hacker News Mar 2, 2026 1 min read

MicroGPT Explained Interactively

growingSWE has created an interactive walkthrough of Andrej Karpathy's 200-line pure Python GPT implementation, letting you tokenize names, watch softmax convert scores to probabilities, step through backpropagation, and explore attention heatmaps.

#gpt #transformer #neural-network
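
The softmax step the walkthrough visualizes fits in a few lines. This is a generic implementation, not code from Karpathy's microgpt: it exponentiates each score and divides by the sum, subtracting the maximum score first for numerical stability (which leaves the result unchanged).

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)                       # shift for stability; output is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```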

LLM Reddit Mar 1, 2026 1 min read

Qwen 3.5-35B-A3B Surpasses GPT-OSS-120B as Daily Driver at 1/3 the Size

The r/LocalLLaMA community is buzzing over Qwen 3.5-35B-A3B, which users report outperforms GPT-OSS-120B while being only one-third the size, making it an excellent local daily driver for development tasks.

#qwen #local-llm #open-source

LLM Reddit Mar 1, 2026 1 min read

DeepSeek V4 Launching Next Week with Image and Video Generation

The Financial Times reports that DeepSeek V4 is set to launch next week, featuring image and video generation capabilities that position it as a direct competitor to multimodal AI models from OpenAI and Google.

#deepseek #llm #image-generation

LLM X/Twitter Mar 1, 2026 1 min read

Karpathy on LLM Memory+Compute: SRAM vs DRAM Trade-offs and the Next Hardware Frontier

Andrej Karpathy highlights the fundamental memory+compute trade-off in LLMs: fast but small on-chip SRAM versus large but slow off-chip DRAM. He calls optimizing this the most intellectually rewarding puzzle in AI infrastructure today, pointing to NVIDIA's $4.6T market cap as evidence of how much value rides on it.

#llm #hardware #inference
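
One back-of-the-envelope way to see why DRAM matters: when a single stream decodes one token at a time, roughly every weight must be read from off-chip memory for each token, so memory bandwidth divided by weight size gives a rough ceiling on tokens per second. The sketch assumes a 27B-parameter model at 4 bits per weight (about 13.5 GB) and 936 GB/s of bandwidth, the RTX 3090's rated figure; it ignores KV-cache reads, batching, and speculative decoding, all of which shift the real number.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough single-stream decode limit when every token streams all weights
    from off-chip DRAM once (ignores KV-cache traffic and batching)."""
    return bandwidth_gb_s / weights_gb

weights = 27 * 4 / 8  # assumed: 27B params at 4 bits/weight -> 13.5 GB
print(f"{decode_ceiling_tps(936, weights):.0f} t/s")  # 69 t/s
```

Weights held in on-chip SRAM would avoid this round trip, but SRAM capacity is measured in megabytes rather than gigabytes, which is the trade-off described above.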

LLM Reddit Mar 1, 2026 2 min read

r/MachineLearning: Micro Diffusion shows discrete text diffusion in ~150 lines of Python

An r/MachineLearning project post (score 71, 12 comments) introduced Micro Diffusion, a minimal implementation inspired by Microgpt. The author released three versions (143-line NumPy, 292-line NumPy, and 413-line PyTorch) that share the same diffusion loop while swapping in different denoisers.

#diffusion-models #python #machinelearning
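
One common form of discrete text diffusion is masked (absorbing-state) diffusion: generation starts from a fully masked sequence, and at each step a denoiser predicts tokens and a share of the masked positions is filled in. The sketch below assumes that variant and is not taken from Micro Diffusion, which may use a different corruption process. `toy_denoiser` is a hypothetical stand-in for a trained model.

```python
import random

MASK = "_"

def toy_denoiser(tokens: list[str]) -> list[str]:
    """Hypothetical stand-in for a trained model: predicts a character for
    every position. A real denoiser would condition on the unmasked context."""
    target = "hello world"
    return [target[i % len(target)] for i in range(len(tokens))]

def sample(length: int, steps: int, seed: int = 0) -> str:
    """Masked-diffusion sampling: start fully masked, unmask over `steps` rounds."""
    rng = random.Random(seed)
    tokens = [MASK] * length  # fully masked start (the "pure noise" state)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Unmask an even share of the remaining masked positions; the final
        # step (steps - step == 1) unmasks everything that is left.
        k = max(1, len(masked) // (steps - step))
        prediction = toy_denoiser(tokens)
        for i in rng.sample(masked, k):
            tokens[i] = prediction[i]
    return "".join(tokens)

print(sample(11, 4))  # hello world
```

Because the loop only calls the denoiser, a different one (a lookup table, an MLP, a small transformer) can be swapped in without touching the sampling code, which mirrors the shared-loop design the post describes.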