Unsloth publishes a practical Qwen3.5 fine-tuning guide with concrete VRAM targets

Original: Qwen3.5 Fine-Tuning Guide – Unsloth Documentation

LLM · Mar 4, 2026 · By Insights AI (HN) · 1 min read

Community context

At crawl time (2026-03-04 12:04:31 UTC), the Hacker News post linking Unsloth’s Qwen3.5 Fine-tuning Guide had 114 points and 34 comments. The engagement is notable because the thread is not about a vague benchmark claim; it points to operational documentation that teams can apply immediately when running local or self-hosted LLM training.

The guide covers the Qwen3.5 lineup (0.8B, 2B, 4B, 9B, 27B, 35B-A3B, 122B-A10B) and includes both text and vision fine-tuning paths. Unsloth claims roughly 1.5x training speed and 50% lower VRAM usage versus FA2-based setups, and provides bf16 LoRA VRAM examples: 3GB (0.8B), 5GB (2B), 10GB (4B), 22GB (9B), and 56GB (27B). It also states that Qwen3.5-35B-A3B bf16 LoRA can run on 74GB VRAM.

Technical points worth tracking

  • MoE guidance: For MoE variants such as 35B-A3B and 122B-A10B, the guide recommends bf16 LoRA or full fine-tuning, while discouraging 4-bit QLoRA because of quantization limitations.
  • Dependency requirement: It explicitly calls for transformers v5 for Qwen3.5 support.
  • Reasoning retention: To preserve the model's reasoning behavior, it recommends mixing reasoning-style examples into the training data at a ratio of at least 75%.
  • Deployment handoff: It includes paths to export outputs into GGUF, vLLM, Ollama, llama.cpp, and related runtimes.
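The 75% reasoning-retention point can be enforced mechanically when assembling a mixed dataset. This is a minimal sketch of one way to do it; the helper and its interface are ours, not from the guide:

```python
import random

def mix_datasets(reasoning, plain, reasoning_ratio=0.75, seed=0):
    """Build a shuffled training mix that keeps the reasoning share at or
    above reasoning_ratio by capping the number of plain examples."""
    # Max plain examples allowed so reasoning stays >= reasoning_ratio.
    max_plain = int(len(reasoning) * (1 - reasoning_ratio) / reasoning_ratio)
    rng = random.Random(seed)
    mixed = list(reasoning) + rng.sample(list(plain), min(max_plain, len(plain)))
    rng.shuffle(mixed)
    return mixed

# 75 reasoning examples admit at most 25 plain ones -> share stays >= 75%.
mix = mix_datasets(reasoning=[{"id": i} for i in range(75)],
                   plain=[{"id": i} for i in range(100)])
```

Capping the plain split (rather than upsampling reasoning data) keeps the reasoning examples unduplicated, at the cost of discarding some non-reasoning data.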

Why this matters for builders

Practically, this documentation helps teams establish a reproducible baseline quickly: start with bf16 LoRA, validate quality on your own domain data, then decide whether full fine-tuning is worth the extra compute cost. The OOM troubleshooting notes (batch-size and sequence-length reductions, Unsloth gradient checkpointing) are also directly actionable for constrained GPU environments.
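The OOM sequence described above (shrink batch size, then sequence length, then fall back to gradient checkpointing) can be expressed as a simple decision helper. The memory model here is a deliberately crude placeholder and the function is our own sketch, not Unsloth's API:

```python
def plan_oom_mitigation(batch_size, seq_len, vram_gb,
                        gb_per_sample_per_1k_tokens=0.5, base_gb=10.0):
    """Apply the OOM mitigations in order until a (very rough)
    activation-memory estimate fits: halve batch size down to 1, halve
    sequence length down to 1024, then enable Unsloth gradient
    checkpointing (modeled here as a flat cut in activation cost)."""
    def estimate(bs, sl, ckpt):
        act = bs * (sl / 1024) * gb_per_sample_per_1k_tokens
        return base_gb + (act * 0.3 if ckpt else act)

    ckpt = False
    steps = []
    while estimate(batch_size, seq_len, ckpt) > vram_gb:
        if batch_size > 1:
            batch_size //= 2
            steps.append(f"batch_size -> {batch_size}")
        elif seq_len > 1024:
            seq_len //= 2
            steps.append(f"seq_len -> {seq_len}")
        elif not ckpt:
            ckpt = True
            steps.append("use_gradient_checkpointing -> 'unsloth'")
        else:
            steps.append("still OOM: consider a smaller model")
            break
    return steps

# e.g. 8 x 8192-token samples on a 12 GB budget forces several reductions.
print(plan_oom_mitigation(batch_size=8, seq_len=8192, vram_gb=12))
```

The ordering mirrors the article's troubleshooting notes: cheap knobs (batch size, sequence length) before features that trade compute for memory.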

The speed and VRAM gains should still be treated as environment-dependent until independently reproduced. But as an engineering playbook, the guide is useful because it narrows ambiguity around initial settings, hardware expectations, and deployment format choices.

Sources: Unsloth Qwen3.5 Fine-tuning Guide, Hacker News discussion.
