Show HN: Timber Compiles Classical ML Models into Tiny C Binaries for Microsecond Inference
Original post: Show HN: Timber – Ollama for classical ML models, 336x faster than Python
What Timber is proposing
A March 2026 Show HN post introduced Timber, an open-source compiler focused on classical machine learning models rather than LLM inference. The project claims it can take trained models from XGBoost, LightGBM, scikit-learn, CatBoost, and ONNX tree operators, then emit a self-contained C99 inference artifact with no runtime dependency on Python. The default serving path includes an Ollama-compatible HTTP API so teams can expose models via familiar endpoints without building a custom serving layer.
The README positions Timber for low-latency, deterministic environments such as fraud detection, risk scoring, and edge deployments. The maintainers emphasize portability and small artifacts, including an example compiled binary around 48 KB for a sample model.
Compiler pipeline and API surface
Timber’s documented pipeline is parse → IR construction → optimization → C99 emission → native compilation. Listed optimization passes include dead-leaf elimination, threshold quantization, constant-feature folding, and branch sorting. The serving interface then wraps the compiled model behind endpoints such as /api/predict, /api/models, and /api/health.
That architecture is interesting for teams that already maintain tree-based models and want to remove Python from hot inference paths, especially where cold-start behavior and deployment size matter.
Performance claims and interpretation
The project reports single-sample latency of around 2 microseconds and a roughly 336x speedup over a Python XGBoost baseline in its benchmark setup (Apple M2 Pro, 50-tree classifier scenario). It also includes comparison numbers against ONNX Runtime and Treelite. These are project-authored benchmarks, so production teams should treat them as directional and reproduce the results under their own feature engineering and transport stack.
HN discussion themes
The Hacker News thread reached 199 points with 33 comments at crawl time. Discussion centered on practical fit: some readers welcomed renewed focus on “classical ML” infrastructure, while others questioned whether inference speedups remain the dominant bottleneck when data transformation pipelines are still Python-heavy. That debate is the core adoption question for Timber: if your bottleneck is feature prep, compiler-level inference gains may be secondary; if your bottleneck is repeated scoring in latency-critical paths, the approach can be compelling.