Show HN: Timber Compiles Classical ML Models into Tiny C Binaries for Microsecond Inference
Original title: Show HN: Timber – Ollama for classical ML models, 336x faster than Python
What Timber is proposing
A March 2026 Show HN post introduced Timber, an open-source compiler focused on classical machine learning models rather than LLM inference. The project claims it can take trained models from XGBoost, LightGBM, scikit-learn, and CatBoost, as well as models expressed as ONNX tree operators, and emit a self-contained C99 inference artifact with no runtime dependency on Python. The default serving path includes an Ollama-compatible HTTP API, so teams can expose models via familiar endpoints without building a custom serving layer.
The README positions Timber for low-latency, deterministic environments such as fraud detection, risk scoring, and edge deployments. The maintainers emphasize portability and small artifacts, including an example compiled binary around 48 KB for a sample model.
Compiler pipeline and API surface
Timber’s documented pipeline is parse → IR construction → optimization → C99 emission → native compilation. Listed optimization passes include dead-leaf elimination, threshold quantization, constant-feature folding, and branch sorting. The serving interface then wraps the compiled model behind endpoints such as /api/predict, /api/models, and /api/health.
That architecture is interesting for teams that already maintain tree-based models and want to remove Python from hot inference paths, especially where cold-start behavior and deployment size matter.
Performance claims and interpretation
The project reports around 2 microseconds single-sample latency and roughly 336x speedup versus a Python XGBoost baseline in its benchmark setup (Apple M2 Pro, 50-tree classifier scenario). It also includes comparison numbers against ONNX Runtime and Treelite. These are project-authored benchmarks, so production teams should treat them as directional and reproduce results under their own feature engineering and transport stack.
HN discussion themes
The Hacker News thread had reached 199 points with 33 comments at the time of writing. Discussion centered on practical fit: some readers welcomed renewed focus on "classical ML" infrastructure, while others questioned whether inference speedups remain the dominant bottleneck when data transformation pipelines are still Python-heavy. That debate is the core adoption question for Timber: if your bottleneck is feature preparation, compiler-level inference gains may be secondary; if your bottleneck is repeated scoring in latency-critical paths, the approach can be compelling.