Qwen3.5-122B-A10B Uncensored (Aggressive) ships in GGUF with new K_P quants


LLM · Mar 23, 2026 · By Insights AI (Reddit)

A Reddit post in r/LocalLLaMA is drawing attention to a new GGUF release of Qwen3.5-122B-A10B Uncensored (Aggressive). At crawl time, the thread had 263 points and 94 comments and was posted at 2026-03-22T10:42:56 UTC. The discussion points readers to a linked Hugging Face page from HauhauCS, while the original claims remain in the Reddit thread.

According to the post author, this release is the original Qwen model made uncensored with no personality changes. The same post says the author’s tests recorded 0/465 refusals with zero capability loss. Those numbers are self-reported by the post author and should be read as claims from the thread, not as independent benchmark validation. For local LLM users, that distinction matters because refusal behavior can vary with prompts, sampling settings, and runtime implementations.

The main technical hook is the introduction of new K_P quants. The author says these variants can deliver 1-2 quant levels better quality for ~5-15% larger file size, with Q4_K_P described as closer to Q6_K. If broader testing supports that description, it could make the release interesting for people balancing quality against storage and VRAM constraints. The Reddit post, however, does not include third-party benchmark tables, so the current evidence is limited to the author’s packaging notes and test impressions.
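As a rough sanity check on those packaging claims, file sizes can be estimated from average bits-per-weight (bpw). This is a minimal sketch: the Q4_K_M and Q6_K bpw figures are approximate llama.cpp averages, and the Q4_K_P figure is a hypothetical assumption derived from the post's "~5-15% larger" claim, not a published number.

```python
# Back-of-the-envelope GGUF size estimates for a 122B-parameter model.
# Q4_K_M and Q6_K bpw values are approximate llama.cpp averages;
# the Q4_K_P value is a hypothetical assumption (+10%, per the post's
# "~5-15% larger" claim), not a number from the release.

PARAMS = 122e9  # total parameters (a MoE stores all experts on disk)

def gguf_size_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate file size in GB for a given average bits-per-weight."""
    return params * bpw / 8 / 1e9

quants = {
    "Q4_K_M (approx. 4.85 bpw)":          4.85,
    "Q4_K_P (assumed +10%, 5.34 bpw)":    4.85 * 1.10,
    "Q6_K   (approx. 6.56 bpw)":          6.56,
    "BF16   (16 bpw)":                    16.0,
}

for name, bpw in quants.items():
    print(f"{name}: ~{gguf_size_gb(bpw):.0f} GB")
```

At 16 bpw the estimate lands near 244 GB, which is consistent with the post's ~250GB explanation for why no BF16 package ships.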

The compatibility notes are practical. The thread says the files are compatible with llama.cpp and LM Studio, while Ollama may need extra work. It also says the release includes mmproj for vision support and uses imatrix-generated quants. There is no BF16 package because BF16 would be ~250GB, according to the post author. That tradeoff helps explain why the release emphasizes compressed GGUF distributions rather than the largest possible precision format.

The model specs listed in the post are unusually detailed for a community release:

  • 122B total parameters
  • ~10B active per token
  • 256 experts, 8+1 active per token
  • 262K context window
  • multimodal: text/image/video
  • hybrid attention: Gated DeltaNet + softmax (3:1)
  • 48 layers

Taken together, those details position the model as a large Mixture-of-Experts system aimed at serious local deployment rather than lightweight experimentation. The combination of 262K context, multimodal text/image/video support, and the hybrid attention design gives the release a broader profile than a standard text-only checkpoint, even though the post itself is mainly focused on packaging and quantization.
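The listed totals can be restated as a small arithmetic sketch to show the compute/memory asymmetry behind that positioning. The numbers below come straight from the post, not from the model card; the per-expert parameter split is not given in the thread.

```python
# Rough MoE arithmetic for the specs listed in the post. These values
# only restate the thread's own totals; no per-layer breakdown is given.

TOTAL_PARAMS = 122e9    # stored in memory (all experts)
ACTIVE_PARAMS = 10e9    # claimed compute per token
EXPERTS_TOTAL = 256
EXPERTS_ACTIVE = 8 + 1  # "8+1 active per token", per the post

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL

print(f"Experts used per token: {EXPERTS_ACTIVE}/{EXPERTS_TOTAL} "
      f"({expert_fraction:.1%})")
print(f"Parameters used per token: ~{active_fraction:.1%} of the model")
```

The active-parameter share (~8.2%) exceeds the active-expert share (~3.5%), which is expected when attention and any shared components are dense. The practical upshot for local users: the full 122B must fit in memory, but per-token compute tracks the ~10B active figure.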

For readers evaluating the story, the most important point is provenance. The release notes and strongest performance claims come from the Reddit thread author, not from an independent lab report or a neutral benchmark run. Anyone interested in the files can review the Reddit thread and the linked Hugging Face page, then compare the release against their own local workloads and evaluation methods.


© 2026 Insights. All rights reserved.