A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.
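The guide's exact invocations live in the post itself; as a generic illustration, a llama.cpp server launch for a quantized local build might look like the following. The model filename and sampling values here are placeholders, not taken from the Unsloth guide.

```shell
# Sketch of serving a local GGUF quant with llama.cpp's llama-server.
# Filename and sampling settings are illustrative, not the guide's values.
llama-server \
  -m Qwen3.5-35B-A3B-Q4_K_M.gguf \
  -ngl 99 \
  --ctx-size 16384 \
  --temp 0.7 --top-p 0.8 \
  --port 8080
```

`-ngl 99` offloads as many layers as fit onto the GPU; lowering it trades speed for VRAM headroom on smaller cards.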
#local-llm
A well-received PSA on r/LocalLLaMA argues that convenience layers such as Ollama and LM Studio can change model behavior enough to distort evaluation. The more durable lesson from the thread is reproducibility: hold templates, stop tokens, sampling, runtime versions, and quantization constant before judging a model.
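One way to enforce that discipline is to fingerprint every setting that can silently change behavior, so two runs are only compared when their configs match. A minimal sketch (the specific config keys and values below are illustrative, not from the thread):

```python
import hashlib
import json

def eval_fingerprint(config: dict) -> str:
    """Hash every setting that can silently change model behavior,
    so two evaluation runs can be compared apples-to-apples."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Hypothetical settings; the point is to pin ALL of them before judging.
run_a = eval_fingerprint({
    "chat_template": "chatml",
    "stop_tokens": ["<|im_end|>"],
    "temperature": 0.7,
    "top_p": 0.8,
    "runtime": "llama.cpp b4521",
    "quantization": "Q4_K_M",
})
run_b = eval_fingerprint({
    "chat_template": "chatml",
    "stop_tokens": ["<|im_end|>"],
    "temperature": 0.7,
    "top_p": 0.8,
    "runtime": "ollama 0.5.7",   # different runtime => different fingerprint
    "quantization": "Q4_K_M",
})
assert run_a != run_b  # not directly comparable runs
```

If two runs disagree on quality, differing fingerprints tell you the comparison was never valid in the first place.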
A high-scoring r/LocalLLaMA post details a practical move from Ollama/LM Studio-centric flows to llama-swap for multi-model operations. The key value discussed is operational control: backend flexibility, policy filters, and low-friction service management.
A demo running Qwen 3.5 0.8B entirely in the browser using WebGPU and Transformers.js drew 440 upvotes on r/LocalLLaMA. No server, no API key, no installation required — just a modern browser with GPU access.
A remarkable 13-month comparison: in early 2025, running frontier-level DeepSeek R1 at ~5 tokens/second cost $6,000. Today, a significantly stronger model runs at the same speed on a $600 mini PC, and even more capable models reach 17-20 t/s on the same hardware.
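A back-of-envelope way to frame the drop is hardware dollars per unit of decode throughput, using the figures quoted in the post:

```python
# Hardware dollars per token/second of decode throughput,
# using the numbers quoted in the post.
def dollars_per_tps(price_usd: float, tps: float) -> float:
    return price_usd / tps

early_2025 = dollars_per_tps(6000, 5)    # DeepSeek R1 rig, early 2025
today_low  = dollars_per_tps(600, 5)     # $600 mini PC at the same speed
today_high = dollars_per_tps(600, 20)    # stronger models at 17-20 t/s

print(early_2025, today_low, today_high)  # 1200.0 120.0 30.0
```

That is a 10x improvement at equal speed, and up to 40x once the faster models are factored in — before even accounting for the quality gap.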
Alibaba's Qwen team has released Qwen 3.5 Small, a new small dense model in their flagship open-source series. The announcement topped r/LocalLLaMA with over 1,000 upvotes, reflecting the local AI community's enthusiasm for capable small models.
The r/LocalLLaMA community is buzzing over Qwen 3.5-35B-A3B, which users report outperforms GPT-OSS-120B at roughly one-third the size, making it an excellent local daily driver for development tasks.
A high-engagement r/LocalLLaMA thread reports strong early results for Qwen3.5-35B-A3B in local agentic coding workflows. The original poster cites 100+ tokens/sec on a single RTX 3090 setup, while comments show mixed reproducibility and emphasize tooling, quantization, and prompt pipeline differences.
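Given how much tooling and quantization affect the reported numbers, throughput claims are easiest to compare when measured the same way. A minimal, backend-agnostic sketch that times any streaming token iterator (the helper name and warmup logic are my own, not from the thread):

```python
import time

def tokens_per_second(stream, warmup: int = 0) -> float:
    """Measure decode throughput from any token iterator.
    `warmup` skips initial tokens so prompt processing doesn't skew the rate."""
    count = 0
    start = None
    for i, _tok in enumerate(stream):
        if i < warmup:
            continue
        if start is None:
            start = time.perf_counter()  # clock starts at first counted token
            continue
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed
```

Feeding it the streaming output of any backend (llama.cpp, vLLM, Ollama) yields numbers measured on the same footing, which is what the thread's mixed reproducibility reports were missing.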
Users on r/LocalLLaMA have spotted Qwen3.5 model names appearing in Alibaba's official Qwen chat interface, signaling an imminent release of the next generation of Alibaba's open-source LLM series.
Qwen3's TTS model encodes voices into 1024-dimensional vectors, enabling gender swapping, pitch adjustment, voice mixing, and semantic voice search through vector math — now available as a standalone lightweight encoder on HuggingFace.
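Because voices live in a shared embedding space, the operations described reduce to ordinary vector arithmetic. A toy sketch with short lists standing in for the 1024-dimensional embeddings (the function names and sample vectors are illustrative, not the encoder's API):

```python
import math

def mix(v1, v2, alpha=0.5):
    """Voice mixing: linear interpolation between two voice embeddings."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(v1, v2)]

def cosine(v1, v2):
    """Cosine similarity between two embeddings."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

def nearest(query, library):
    """Semantic voice search: return the stored voice closest to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))
```

Gender swapping and pitch adjustment work the same way: add or subtract a direction vector (e.g. the mean difference between embeddings of male and female voices) and decode the result.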
A high-scoring r/LocalLLaMA thread surfaced Qwen3.5-397B-A17B, an open-weight multimodal model card on Hugging Face that lists 397B total parameters with 17B activated per token and an extended context of roughly 1M tokens.
A high-signal r/LocalLLaMA thread tracked the merge of llama.cpp PR #19375 and highlighted practical throughput gains for Qwen3Next models. Both PR benchmarks and community tests suggest meaningful t/s improvements from graph-level copy reduction.