LLM Reddit Feb 26, 2026 1 min read
A trending r/LocalLLaMA thread highlighted the DualPath paper on KV-cache bottlenecks in disaggregated inference systems. The arXiv abstract reports gains of up to 1.87x in offline throughput and 1.96x in average online throughput while meeting SLOs.
#kv-cache
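For context on why the KV cache is the bottleneck in disaggregated serving, a rough back-of-envelope sketch follows. It is not taken from the DualPath paper; the model shape (a Llama-3-70B-like configuration) and the interconnect bandwidth are assumptions chosen only to show the order of magnitude of the cache a prefill worker must hand off to a decode worker.

```python
# Back-of-envelope: KV-cache volume a prefill node must ship to a decode node
# in a disaggregated setup. All numbers are illustrative assumptions, not
# figures from the DualPath paper.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for keys and values, one entry per layer, per KV head, per token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Assumed Llama-3-70B-like shape: 80 layers, 8 KV heads (GQA), head_dim 128.
prompt_tokens = 8192
cache = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                       seq_len=prompt_tokens)
print(f"KV cache for {prompt_tokens} prompt tokens: {cache / 2**30:.2f} GiB")

# Time to move that cache over an assumed 50 GB/s interconnect.
link_bytes_per_s = 50e9
print(f"Transfer time: {cache / link_bytes_per_s * 1e3:.1f} ms")
```

At these assumed numbers the handoff is roughly 2.5 GiB and ~54 ms per request, which is why scheduling and overlapping this transfer is the kind of problem the thread focused on.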
LLM Hacker News Feb 21, 2026 2 min read
A high-scoring Hacker News discussion surfaced Together AI's CDLM post, which claims latency improvements of up to 14.5x for diffusion language models by combining trajectory-consistent step reduction with exact block-wise KV caching.
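To see where block-wise KV caching saves work, here is a toy cost model in Python. The block size, step counts, and the accounting itself are assumptions for illustration, not CDLM's implementation, which also depends on the step-reduction component described in the post.

```python
# Toy cost model: token positions pushed through the transformer with and
# without caching the KV entries of already-finalized blocks. Illustrative
# assumptions only; not CDLM's actual accounting.

def cost_no_cache(num_blocks, block_len, steps_per_block):
    # Every denoising step re-encodes the full prefix plus the current block.
    return sum(steps_per_block * ((b + 1) * block_len) for b in range(num_blocks))

def cost_with_cache(num_blocks, block_len, steps_per_block):
    # A finalized block is encoded once to fill the KV cache; later steps only
    # process the current block's positions against the cached keys/values.
    return sum(block_len + steps_per_block * block_len for _ in range(num_blocks))

for steps in (4, 16):
    base = cost_no_cache(num_blocks=16, block_len=64, steps_per_block=steps)
    cached = cost_with_cache(num_blocks=16, block_len=64, steps_per_block=steps)
    print(f"{steps} denoising steps/block: "
          f"{base / cached:.1f}x fewer positions processed")
```

Under these assumptions the saving grows with sequence length and steps per block, which is consistent with the post framing the cache as "exact": finalized blocks never need re-encoding.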
LLM Reddit Feb 14, 2026 1 min read
A February 13, 2026 post in r/LocalLLaMA highlighted NVIDIA Dynamic Memory Sparsification (DMS), claiming up to 8x KV cache memory savings without accuracy loss. Community discussion centered on inference cost, throughput, and which claims still need verification against primary technical sources.
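As a mechanical illustration of what an 8x sparser KV cache means, here is a generic top-k eviction toy in NumPy. It is not DMS itself: the importance scores below are random placeholders, and how DMS actually decides what to keep is exactly the detail the thread wanted verified from primary sources. The sketch only shows the memory arithmetic of retaining one eighth of the cached tokens.

```python
import numpy as np

# Generic top-k KV eviction toy (NOT NVIDIA DMS): keep the 1/8 of past tokens
# with the highest "importance" and measure the per-head memory saving.
rng = np.random.default_rng(0)
seq_len, head_dim, keep_ratio = 4096, 128, 1 / 8

keys = rng.standard_normal((seq_len, head_dim), dtype=np.float32)
values = rng.standard_normal((seq_len, head_dim), dtype=np.float32)

# Placeholder importance score per past token (random here; a real method
# would use something like accumulated attention mass or learned decisions).
importance = rng.random(seq_len)

keep = int(seq_len * keep_ratio)
kept_idx = np.sort(np.argsort(importance)[-keep:])  # top tokens, in order

sparse_keys, sparse_values = keys[kept_idx], values[kept_idx]
full_bytes = keys.nbytes + values.nbytes
sparse_bytes = sparse_keys.nbytes + sparse_values.nbytes
print(f"{full_bytes / sparse_bytes:.0f}x smaller per head/layer "
      f"({seq_len} -> {keep} cached tokens)")
```

The interesting question raised in the thread is not this arithmetic but whether an 8x reduction can be reached without accuracy loss, which is what the primary DMS material would need to substantiate.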