
Unsloth publishes a practical Qwen3.5 fine-tuning guide with concrete VRAM targets

Original: Qwen3.5 Fine-Tuning Guide – Unsloth Documentation

LLM Mar 4, 2026 By Insights AI (HN) 1 min read

Community context

At crawl time (2026-03-04 12:04:31 UTC), the Hacker News post linking Unsloth’s Qwen3.5 Fine-tuning Guide had 114 points and 34 comments. The engagement is notable because the thread is not about a vague benchmark claim; it points to operational documentation that teams can apply immediately when running local or self-hosted LLM training.

The guide covers the Qwen3.5 lineup (0.8B, 2B, 4B, 9B, 27B, 35B-A3B, 122B-A10B) and includes both text and vision fine-tuning paths. Unsloth claims roughly 1.5x training speed and 50% lower VRAM usage versus FA2-based setups, and provides bf16 LoRA VRAM examples: 3GB (0.8B), 5GB (2B), 10GB (4B), 22GB (9B), and 56GB (27B). It also states that Qwen3.5-35B-A3B bf16 LoRA can run on 74GB VRAM.
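As a sanity check on those figures, a crude back-of-envelope model works surprisingly well: bf16 weights cost 2 bytes per parameter, plus a flat ~2 GB for LoRA adapters, optimizer state, and activations. This is my own rough fit to the guide's numbers, not Unsloth's methodology, but it lands within about 2 GB of every published figure:

```python
# Rough bf16 LoRA VRAM estimate: 2 bytes/param for the base weights,
# plus a flat ~2 GB for adapters, optimizer state, and activations.
# A crude curve fit to the guide's numbers, not Unsloth's own formula.
def est_vram_gb(params_billions: float) -> float:
    return params_billions * 2 + 2

# VRAM figures stated in the guide, keyed by model size in billions.
guide_numbers = {0.8: 3, 2: 5, 4: 10, 9: 22, 27: 56}

for size, claimed in guide_numbers.items():
    print(f"{size}B: estimated {est_vram_gb(size):.1f} GB, guide says {claimed} GB")
```

The residual is largest for the 9B model (estimate 20 GB vs. the guide's 22 GB), so treat the guide's numbers as the authority and the formula as a quick mental check.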

Technical points worth tracking

  • MoE guidance: For MoE variants such as 35B-A3B and 122B-A10B, the guide recommends bf16 LoRA or full fine-tuning, while discouraging 4-bit QLoRA because of quantization limitations.
  • Dependency requirement: It explicitly calls for transformers v5 for Qwen3.5 support.
  • Reasoning retention: To preserve reasoning behavior, it recommends mixing reasoning-style examples into the training data at a ratio of at least 75%.
  • Deployment handoff: It includes paths to export outputs into GGUF, vLLM, Ollama, llama.cpp, and related runtimes.
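The 75% reasoning-retention ratio translates into a simple cap: for every three reasoning examples, a mix can absorb at most one non-reasoning example. A minimal sketch of that bookkeeping (the helper and dataset shapes are my own illustration, not code from the guide):

```python
import random

def mix_datasets(reasoning, other, min_reasoning_ratio=0.75, seed=0):
    """Cap non-reasoning examples so reasoning stays >= the target ratio.

    ratio = r / (r + o) >= 0.75  implies  o <= r / 3 (for the default ratio).
    """
    max_other = int(len(reasoning) * (1 - min_reasoning_ratio) / min_reasoning_ratio)
    rng = random.Random(seed)
    mixed = list(reasoning) + rng.sample(list(other), min(max_other, len(other)))
    rng.shuffle(mixed)
    return mixed

reasoning = [f"cot_{i}" for i in range(90)]   # reasoning-style examples
other = [f"plain_{i}" for i in range(60)]     # plain instruction examples
mixed = mix_datasets(reasoning, other)
print(len(mixed))  # 90 reasoning + 30 non-reasoning = 120
```

With 90 reasoning examples, only 30 of the 60 plain examples make the cut, which keeps the mix at exactly the 75% floor.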

Why this matters for builders

Practically, this documentation helps teams establish a reproducible baseline quickly: start with bf16 LoRA, validate quality on your own domain data, then decide whether full fine-tuning is worth the extra compute cost. The OOM troubleshooting notes (batch-size and sequence-length reductions, Unsloth gradient checkpointing) are also directly actionable for constrained GPU environments.
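That OOM recipe (shrink batch size first, then sequence length) can be mechanized as a simple backoff loop. Everything below is an illustrative stand-in — the step function and the fake memory budget are mine, not Unsloth's code; real training code would catch torch.cuda.OutOfMemoryError rather than MemoryError:

```python
def find_fitting_config(train_step, batch_size=32, seq_len=4096,
                        min_batch=1, min_seq=512):
    """Halve batch size first, then sequence length, until a step fits."""
    while True:
        try:
            train_step(batch_size, seq_len)
            return batch_size, seq_len
        except MemoryError:
            if batch_size > min_batch:
                batch_size //= 2
            elif seq_len > min_seq:
                seq_len //= 2
            else:
                raise  # nothing left to shrink; needs a smaller model or more VRAM

# Simulated step: "OOMs" whenever batch * seq exceeds a fake token budget.
def fake_step(bs, sl, budget=16 * 4096):
    if bs * sl > budget:
        raise MemoryError

print(find_fitting_config(fake_step))  # → (16, 4096)
```

Enabling Unsloth's gradient checkpointing is the other lever the guide mentions; it trades recomputation time for activation memory and is orthogonal to the batch/sequence backoff above.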

The speed and VRAM gains should still be treated as environment-dependent until independently reproduced. But as an engineering playbook, the guide is useful because it narrows ambiguity around initial settings, hardware expectations, and deployment format choices.

Sources: Unsloth Qwen3.5 Fine-tuning Guide, Hacker News discussion.




© 2026 Insights. All rights reserved.