#transformers

LLM Hacker News Apr 18, 2026 1 min read

MacMind makes the transformer tangible inside HyperCard

HN picked up MacMind because it shrinks the transformer to a size that can be inspected. In HyperTalk on a Macintosh SE/30, a 1,216-parameter model learns FFT bit-reversal using embeddings, positional encoding, self-attention, backpropagation, and gradient descent.
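
For readers unfamiliar with the target task, FFT bit-reversal reorders positions by reversing the binary digits of each index. The following is a minimal Python sketch of that permutation, not MacMind's HyperTalk code; the function names and the example length of 8 are illustrative assumptions.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` binary digits of `index`."""
    result = 0
    for _ in range(bits):
        # Shift the result left and append the current lowest bit of index.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_permutation(n: int) -> list[int]:
    """Return the bit-reversed index for every position 0..n-1 (n a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]
```

For a length-8 input, `bit_reversal_permutation(8)` yields `[0, 4, 2, 6, 1, 5, 3, 7]`, which is the kind of mapping a model has to learn at that length.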

#transformers #hypercard #retro-computing

LLM sources.twitter Apr 14, 2026 1 min read

CVE-2026-1839 flags an unsafe checkpoint-loading path in the Hugging Face Transformers Trainer

Vulmon's X post of April 7, 2026 highlighted CVE-2026-1839, an arbitrary code execution issue in the checkpoint-loading path of the Hugging Face Transformers Trainer. According to CVE.org, versions before v5.0.0rc3 allow code execution through a crafted rng_state.pth file when running on PyTorch below 2.6, and the fix adds weights_only=True.
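
The class of bug is easiest to see with Python's own `pickle`, which `torch.load` uses under the hood. The sketch below is a standard-library analogy, not the Transformers patch or PyTorch's implementation: it shows how an unpickler that refuses to resolve arbitrary globals blocks a crafted payload while still loading plain data. `AllowlistUnpickler`, `safe_loads`, and `Malicious` are hypothetical names.

```python
import io
import pickle


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals listed in ALLOWED (assumption: none)."""

    ALLOWED = frozenset()  # e.g. {("collections", "OrderedDict")} if needed

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        # Any other global (print, os.system, ...) is rejected before it can run.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Load pickled bytes while refusing every non-allowlisted global."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Malicious:
    """Stand-in for a crafted file: unpickling it would call a function."""

    def __reduce__(self):
        # A real attack would name something dangerous; print is a harmless proxy.
        return (print, ("payload ran",))
```

Plain containers such as `{"step": 3}` round-trip through `safe_loads`, while `safe_loads(pickle.dumps(Malicious()))` raises `pickle.UnpicklingError`. In PyTorch itself, the comparable guard is passing `weights_only=True` to `torch.load`.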

#huggingface #transformers #security

LLM Hacker News Apr 7, 2026 1 min read

GuppyLM, an 8.7M-parameter Show HN project that demystifies language models

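GuppyLM, which drew attention on Hacker News's Show HN, exposes the full LLM training process using 60K synthetic conversations and a simple transformer architecture. Its key appeal is that it is a tiny educational model that runs directly in Colab and in the browser.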

#llm #education #pytorch

LLM Reddit Apr 3, 2026 1 min read

Stanford's public CS25 Transformers course, spotlighted by Reddit, begins its Spring 2026 run

Stanford's public CS25 course is once again serving as a channel for learning about Transformer research that reaches beyond campus through Zoom, recordings, and Discord.

#transformers #stanford #education

LLM Hacker News Apr 2, 2026 1 min read

Hacker News revisits the KV cache cost of long-context LLMs

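Hacker News resurfaced a Future Shock article that explains the KV cache as a GPU memory cost problem rather than an abstract architecture term. The explanation traces, in a single thread, how memory design has changed from GPT-2 through Llama 3, DeepSeek V3, Gemma 3, and the Mamba family.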
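
The memory-cost framing can be made concrete with back-of-the-envelope arithmetic. This is a minimal sketch, assuming a standard attention cache that stores one key and one value vector per layer, per KV head, per token; the function name and the model shape used in the note below are illustrative assumptions, not figures from the article.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries
) -> int:
    """Estimate KV cache size in bytes for a plain attention cache."""
    keys_and_values = 2  # one K and one V tensor per layer
    return (
        keys_and_values
        * num_layers
        * num_kv_heads
        * head_dim
        * seq_len
        * batch_size
        * bytes_per_value
    )


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30
```

For a hypothetical model with 32 layers, 8 KV heads, and a head dimension of 128, an 8,192-token context comes to exactly 1 GiB per sequence; with 32 KV heads instead of 8, the same context needs 4 GiB.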

#kv-cache #inference #transformers

LLM Reddit Apr 1, 2026 1 min read

Rebuilding the Transformer with RBF-Attention: a roundup of the r/MachineLearning discussion

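An experiment post on r/MachineLearning drew attention for laying out the implementation problems and the small performance signals that appear when dot-product attention is replaced with RBF attention based on Euclidean distance.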
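
The core substitution can be sketched in a few lines. The version below is a minimal pure-Python illustration, assuming scores of the form `-gamma * ||q - k||^2` followed by a softmax; the `gamma` parameter and function names are assumptions, and the post's actual formulation may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rbf_attention(queries, keys, values, gamma=1.0):
    """Attention where similarity is negative squared Euclidean distance."""
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Closer keys get higher (less negative) scores.
        scores = [
            -gamma * sum((qi - ki) ** 2 for qi, ki in zip(q, k)) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs
```

Because `||q - k||^2 = ||q||^2 + ||k||^2 - 2 q·k` and the `||q||^2` term cancels inside the softmax, this form behaves like dot-product attention with an extra penalty on key norm.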

#transformers #attention #rbf

LLM Reddit Mar 27, 2026 1 min read

RYS II as seen by LocalLLaMA: Qwen3.5 27B relayering and the universal language hypothesis

David Noel Ng's follow-up post treats layer duplication as a search problem rather than guesswork, and uses multilingual hidden-state comparisons to suggest that the middle layers may hold a shared reasoning space.

#qwen #transformers #relayering

LLM Hacker News Mar 21, 2026 1 min read

Hacker News takes note of Moonshot AI's Attention Residuals, aimed at improving Transformer depth

Attention Residuals was discussed on Hacker News on March 20, 2026, with attention on its use of learned depth-wise attention in place of fixed residual addition and on what its low overhead implies.
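
One way to read "learned depth-wise attention in place of fixed residual addition" is sketched below. This is an illustrative toy, not Moonshot AI's formulation: it assumes each layer mixes earlier layer outputs with softmax weights derived from a learned query vector, and every name here is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_residual(layer_outputs):
    """Baseline: the residual stream is a plain sum of earlier outputs."""
    dim = len(layer_outputs[0])
    return [sum(h[d] for h in layer_outputs) for d in range(dim)]


def depth_attention(layer_outputs, query):
    """Toy alternative: weight earlier outputs by softmax(query . output)."""
    scores = [sum(q * x for q, x in zip(query, h)) for h in layer_outputs]
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    return [
        sum(w * h[d] for w, h in zip(weights, layer_outputs)) for d in range(dim)
    ]
```

With an all-zero query the weights are uniform and the result is the mean of the earlier outputs; a trained query would let a layer emphasize some depths over others.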

#llm #transformers #research

AI Reddit Mar 20, 2026 1 min read

r/MachineLearning takes note of the Clip to Grok experiment: simple weight norm clipping is claimed to shorten the grokking delay

The Clip to Grok post, published on r/MachineLearning on March 17, 2026, had 56 points and 20 comments at crawl time. The authors claim that L2-clipping the decoder weight rows at every optimizer step yields 18x to 66x faster generalization on modular arithmetic benchmarks.
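
The clipping step itself is simple to state. Below is a minimal sketch of per-row L2 clipping on a weight matrix held as nested lists; the function name and the `max_norm` threshold are assumptions, and the authors' code may differ in detail.

```python
import math


def clip_rows_l2(weights, max_norm):
    """Rescale each row whose L2 norm exceeds max_norm; leave others untouched."""
    clipped = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # Only shrink rows that are too long; never grow short rows.
        scale = max_norm / norm if norm > max_norm else 1.0
        clipped.append([w * scale for w in row])
    return clipped
```

In a training loop this would run once after each optimizer update, for example `weights = clip_rows_l2(weights, max_norm=1.0)`, so no row can drift beyond the chosen norm.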

#grokking #optimization #transformers

LLM Reddit Mar 18, 2026 2 min read

The transformer “danger zone” flagged by r/LocalLLaMA: where layer duplication works and where it breaks

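An experiment post on r/LocalLLaMA claims that duplicating layers at roughly 50 to 56% of model depth causes performance to collapse or output to break. Because it compares dense, hybrid, MoE, and transplant cases side by side, it goes a step beyond simple anecdote.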
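
The mechanics of layer duplication can be sketched as a reordering of layer indices. The helpers below are hypothetical, and measuring depth as `layer_index / num_layers` is an assumption about how the post defines the 50 to 56% band.

```python
def duplicated_layer_order(num_layers: int, start: int, end: int) -> list[int]:
    """Return the execution order when layers [start, end) are run twice in a row."""
    if not 0 <= start < end <= num_layers:
        raise ValueError("require 0 <= start < end <= num_layers")
    return (
        list(range(end))              # original layers up to the block's end
        + list(range(start, end))     # the duplicated block
        + list(range(end, num_layers))  # the remaining layers
    )


def in_reported_danger_zone(
    layer_index: int, num_layers: int, low: float = 0.50, high: float = 0.56
) -> bool:
    """Check whether a layer sits in the depth band the post reports as fragile."""
    return low <= layer_index / num_layers <= high
```

For a 6-layer toy model, `duplicated_layer_order(6, 2, 4)` gives `[0, 1, 2, 3, 2, 3, 4, 5]`; under the stated assumption, layer 16 of a 32-layer model falls inside the reported band.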

#transformers #model-surgery #localllama

LLM Hacker News Mar 16, 2026 1 min read

A visual reference for recent LLM architectures that caught Hacker News's attention

Sebastian Raschka's LLM Architecture Gallery was well received on HN for gathering recent open model families into comparable diagrams on a single page, which makes the differences among dense, MoE, and hybrid designs quick to grasp.

#llm-architectures #transformers #moe

LLM Hacker News Mar 13, 2026 2 min read

Hacker News takes note of Percepta's claim that a transformer can execute programs internally

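In a post published on March 11, 2026, Percepta claimed that it can build a computer inside a transformer, execute arbitrary C programs for millions of steps, and accelerate inference exponentially with 2D attention heads. HN users saw it as an interesting research direction but asked for clearer explanations, benchmarks, and evidence that it scales in practice.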

#transformers #inference #llm-research