QVAC SDK 0.12.0 adds TurboQuant as an opt-in KV-cache compression feature for local LLMs. The company says it can cut runtime context memory by up to 5x and bring 262K-token contexts on 4B-parameter models within reach of 8GB consumer GPUs.
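
For a rough sense of scale, the sketch below applies the standard KV-cache size formula (keys plus values, per layer, per KV head, per token) to a 262K-token context. The layer count, KV-head count, and head dimension are illustrative assumptions for a generic 4B-class model, not figures published by QVAC, and the 5x divisor simply applies the company's best-case claim. The function name `kv_cache_bytes` is hypothetical and not part of the QVAC SDK.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to hold keys and values for `tokens` positions."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


GIB = 1024 ** 3

# Assumed architecture for illustration only (not from the source).
baseline = kv_cache_bytes(tokens=262_144, layers=36, kv_heads=8, head_dim=128)

# Apply the "up to 5x" reduction quoted above as a best case.
compressed = baseline / 5

print(f"16-bit KV cache: {baseline / GIB:.1f} GiB")
print(f"with 5x compression: {compressed / GIB:.1f} GiB")
```

Whether a given setup actually fits in 8GB also depends on the model's real architecture, the memory taken by its weights, and how the compression ratio is measured, so substitute the values from the model's own configuration before drawing conclusions.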