A Hacker News post surfaced Unsloth's Qwen3.5 local guide, which lays out memory targets, reasoning-mode controls, and llama.cpp commands for running 27B and 35B-A3B models on local hardware.
#gguf
LLM Hacker News 4d ago 2 min read
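For a sense of what those memory targets involve, here is a back-of-envelope sketch of quantized-weight and KV-cache sizing. The bits-per-weight figures and the model config below are rough community approximations, not numbers from the guide itself.

```python
# Rough GGUF memory arithmetic: quantized weights plus KV cache.
# All bits-per-weight and config values here are illustrative
# approximations, not figures taken from the Unsloth guide.

def weight_gib(n_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params_b * 1e9 * bits_per_weight / 8 / 2**30

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elt: int = 2) -> float:
    """FP16 K and V caches: 2 tensors per layer, each kv_heads*head_dim wide."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt / 2**30

for name, bpw in [("Q4_K_M", 4.8), ("Q8_0", 8.5), ("BF16", 16.0)]:
    print(f"27B {name}: ~{weight_gib(27, bpw):.0f} GiB weights")

# Hypothetical 27B-class config: 48 layers, 8 KV heads, head_dim 128, 8k context.
print(f"KV cache at 8k ctx: ~{kv_cache_gib(48, 8, 128, 8192):.1f} GiB")
```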
A LocalLLaMA thread highlighted ongoing work to add NVFP4 quantization support to llama.cpp's GGUF format, pointing to potential memory savings and higher throughput on compatible GPUs.
LLM Reddit 6d ago 2 min read
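For readers new to the format: NVFP4-style quantization is usually described as 4-bit E2M1 values sharing one scale per small block (16 elements with an FP8 scale, in NVIDIA's spec). The sketch below illustrates that idea in NumPy; it is not llama.cpp's actual kernel, layout, or API.

```python
# Illustration of block-scaled FP4: each block stores 4-bit signed E2M1
# codes plus one shared scale, giving ~4.5 bits/value at block size 16
# versus 16 bits for BF16. Not llama.cpp's implementation.
import numpy as np

E2M1_MAGS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def quantize_block_fp4(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block (e.g. 16 values) to signed E2M1 codes + a scale."""
    amax = float(np.abs(x).max())
    scale = amax / float(E2M1_MAGS[-1]) if amax > 0 else 1.0
    mags = np.abs(x) / scale
    idx = np.abs(mags[:, None] - E2M1_MAGS[None, :]).argmin(axis=1)  # nearest code
    return np.sign(x) * E2M1_MAGS[idx], scale

rng = np.random.default_rng(0)
block = rng.normal(size=16).astype(np.float32)
codes, scale = quantize_block_fp4(block)
print(f"mean abs error: {np.abs(block - codes * scale).mean():.4f}")
```

The throughput angle, presumably, is that Blackwell-class tensor cores consume this block-scaled format natively rather than dequantizing to a wider type first.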
A high-scoring LocalLLaMA post benchmarked Qwen3.5-27B Q4 GGUF variants against BF16, separating “closest-to-baseline” choices from “best efficiency” picks for constrained VRAM setups.
LLM Reddit Mar 4, 2026 1 min read
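To make the two selection criteria concrete, here is a toy version of that decision. Every number in it is a made-up placeholder, not a measurement from the thread.

```python
# Hypothetical {name: (size_gib, ppl)} table; placeholders, not the thread's data.
variants = {
    "BF16":   (54.0, 6.10),
    "Q4_K_M": (16.5, 6.15),
    "Q4_K_S": (15.6, 6.21),
    "IQ4_XS": (14.7, 6.29),
}

baseline_size, baseline_ppl = variants["BF16"]
quants = {k: v for k, v in variants.items() if k != "BF16"}

# "Closest to baseline": smallest perplexity gap to BF16.
closest = min(quants, key=lambda k: quants[k][1] - baseline_ppl)

# "Best efficiency": smallest footprint among variants whose PPL stays
# within a tolerance of baseline (the tolerance is an arbitrary choice here).
TOL = 0.02
ok = {k: v for k, v in quants.items() if v[1] <= baseline_ppl * (1 + TOL)}
efficient = min(ok, key=lambda k: ok[k][0])

print(f"closest-to-baseline: {closest}, best efficiency: {efficient}")
```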
A high-engagement r/LocalLLaMA thread reviewed Unsloth’s updated Qwen3.5-35B-A3B dynamic quantization release, including KL-divergence (KLD) and perplexity (PPL) data, tensor-level tradeoffs, and reproducibility artifacts.
LLM Reddit Feb 28, 2026 2 min read
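For anyone wanting to reproduce that kind of evaluation, both headline metrics are straightforward to compute from model outputs. A minimal sketch with toy arrays; in practice the logits would come from running the BF16 baseline and the quantized model over the same text (llama.cpp's perplexity tool can report both metrics).

```python
# PPL: exp of the mean negative log-likelihood of the target tokens.
# KLD: per-position KL(baseline || quantized), i.e. how far the quant
# model's next-token distribution drifts from the BF16 reference.
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def perplexity(logits: np.ndarray, targets: np.ndarray) -> float:
    """logits: (T, vocab), targets: (T,) token ids."""
    logp = log_softmax(logits)
    nll = -logp[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))

def mean_kld(baseline_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    p = log_softmax(baseline_logits)
    q = log_softmax(quant_logits)
    return float((np.exp(p) * (p - q)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32))                          # toy (positions, vocab)
quant = base + rng.normal(scale=0.05, size=base.shape)   # small quantization noise
toks = rng.integers(0, 32, size=8)
print(f"PPL(quant)={perplexity(quant, toks):.2f}  mean KLD={mean_kld(base, quant):.5f}")
```

PPL scores each model in isolation, while KLD directly measures drift from the baseline, which is one reason quantization threads often lean on it when comparing quants.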
A high-engagement LocalLLaMA post highlighted local deployment paths for MiniMax-M2.5, pointing to Unsloth GGUF packaging and renewed discussion on memory, cost, and agentic workloads.
LLM Reddit Feb 18, 2026 2 min read