Meta's legal team sent a notice to the Heretic Free Software Project for distributing Llama model derivatives. Heretic responded with sardonic compliance — invoking Galileo — while immediately setting up a Codeberg mirror in Germany and announcing preservation measures.
#llama
The popular text-generation-webui project, rebranded as TextGen, has relaunched as a no-install native desktop app for Windows, Linux, and macOS. Built on a minimal Electron integration, it positions itself as a fully open-source alternative to LM Studio.
Five major publishers and author Scott Turow filed suit against Meta and Mark Zuckerberg personally, alleging that Zuckerberg directly authorized downloading millions of copyrighted works from piracy sites to train Meta's Llama AI systems.
A fresh r/LocalLLaMA post argues that the main bottleneck in Graph-RAG multi-hop QA is often reasoning rather than retrieval. The linked paper suggests structured prompting and graph-based context compression can let an open Llama 8B model match or beat a plain 70B baseline at a much lower cost.
A new open-source project called ntransformer enables running the 140GB Llama 3.1 70B model on a single consumer RTX 3090 by streaming weights directly from NVMe storage to GPU, completely bypassing CPU RAM.
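For scale, the 140GB figure is consistent with roughly 70 billion parameters stored at 2 bytes each (FP16 or BF16). The sketch below is only back-of-the-envelope arithmetic, not ntransformer code: the 2-byte precision is an assumption inferred from the 140GB number, the 24 GB figure is the RTX 3090's standard VRAM, and the function and variable names are illustrative.

```python
# Back-of-the-envelope sizing; not taken from the ntransformer project.
GB = 1e9  # decimal gigabytes


def weight_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights, ignoring KV cache and activations."""
    return num_params * bytes_per_param


# Assumption: ~70e9 parameters at 2 bytes each (FP16/BF16).
total_gb = weight_bytes(70e9) / GB

# RTX 3090 VRAM in GB.
vram_gb = 24

# Upper bound on the share of weights that could sit in VRAM at once.
fraction_resident = vram_gb / total_gb

print(f"weights: {total_gb:.0f} GB, resident at most: {fraction_resident:.0%}")
```

Under these assumptions, less than a fifth of the weights can be resident on the GPU at any moment, which is why the rest has to be streamed in during inference.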
A high-engagement Hacker News thread spotlights Taalas’ claim that model-specific silicon can cut inference latency and cost, including a hard-wired Llama 3.1 8B deployment reportedly reaching 17K tokens/sec per user.