Semble: Open-Source Code Search for AI Agents That Uses 98% Fewer Tokens
Original: Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep
The Problem
When AI agents like Claude Code navigate large codebases, they typically rely on grep and broad file reads, which consume large amounts of context tokens. According to Semble's benchmarks, a grep+read approach needs a full 100k-token context window just to reach 85% recall. Semble, released by MinishLab, is built to cut that cost.
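The recall figures are measured against a token budget: how much of what matters an agent has seen by the time it has spent a given number of context tokens. The exact benchmark protocol is not described here, so the sketch below assumes a simple definition (results are read in ranked order until the next one would overflow the budget). The names Result and recall_at_token_budget, like all the token counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """A search hit and the number of context tokens it costs to read."""
    result_id: str
    tokens: int


def recall_at_token_budget(ranked: list[Result], relevant: set[str], budget: int) -> float:
    """Share of relevant results read before the token budget runs out.

    Hypothetical metric: results are read in ranked order, and reading stops
    at the first result that would push the total past `budget`.
    """
    if not relevant:
        return 0.0
    spent = 0
    found: set[str] = set()
    for result in ranked:
        if spent + result.tokens > budget:
            break
        spent += result.tokens
        if result.result_id in relevant:
            found.add(result.result_id)
    return len(found) / len(relevant)


# Made-up numbers: the same three hits, costed as whole-file reads vs. function-sized chunks.
relevant = {"login", "connect"}
grep_read = [Result("hash", 6_000), Result("login", 5_000), Result("connect", 8_000)]
chunked = [Result("hash", 200), Result("login", 300), Result("connect", 250)]

print(recall_at_token_budget(grep_read, relevant, budget=12_000))  # 0.5
print(recall_at_token_budget(chunked, relevant, budget=2_000))     # 1.0
```

Reading whole files spends most of the budget on code unrelated to the query, which is why a grep+read loop needs a far larger window to reach a given recall than a search that returns function-sized chunks.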
How It Works
Semble uses a two-stage retrieval pipeline. The first stage applies tree-sitter for code-aware chunking, then scores candidates using both Model2Vec semantic embeddings and BM25 lexical matching. The second stage reranks results with code-specific signals: definition boosts, identifier stem matching, file coherence, and noise penalties for test and legacy code.
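The two-stage pipeline can be illustrated with a toy version built on heavy simplifying assumptions; it is not Semble's implementation. A hand-written BM25 stands in for the lexical scorer, fixed three-number vectors stand in for Model2Vec embeddings, the chunks are given directly rather than produced by tree-sitter, and stage 2 implements only a definition boost and a test/legacy path penalty (identifier stem matching and file coherence are left out). All function names, weights (alpha, boost, penalty) and data are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    text: str
    embedding: list[float]  # stand-in for a Model2Vec vector (made-up values)


def tokenize(text: str) -> list[str]:
    """Lowercase word pieces: getUser and get_user both become ["get", "user"]."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: list[str], docs: list[list[str]], k1=1.5, b=0.75) -> list[float]:
    """Okapi BM25 score of each tokenized doc for the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if tf[term]:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def minmax(xs: list[float]) -> list[float]:
    """Rescale to [0, 1] so lexical and semantic scores can be blended."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def defines_query_term(text: str, terms: list[str]) -> bool:
    """Definition boost: the chunk defines a def/class whose name shares a query term."""
    names = re.findall(r"^\s*(?:def|class)\s+(\w+)", text, re.MULTILINE)
    return any(set(tokenize(name)) & set(terms) for name in names)


def is_noise(path: str) -> bool:
    """Noise penalty: any path segment starting with "test" or "legacy"."""
    return any(part.startswith(("test", "legacy")) for part in path.lower().split("/"))


def stage_one(terms, query_vec, chunks, alpha=0.5):
    """Stage 1: blend normalized BM25 and embedding similarity, best first."""
    lexical = minmax(bm25_scores(terms, [tokenize(c.text) for c in chunks]))
    semantic = minmax([cosine(query_vec, c.embedding) for c in chunks])
    fused = [alpha * sem + (1 - alpha) * lex for sem, lex in zip(semantic, lexical)]
    return sorted(zip(fused, chunks), key=lambda pair: pair[0], reverse=True)


def stage_two(terms, candidates, boost=0.2, penalty=0.2):
    """Stage 2: adjust stage-1 scores with code-specific signals, then re-sort."""
    reranked = []
    for score, chunk in candidates:
        if defines_query_term(chunk.text, terms):
            score += boost
        if is_noise(chunk.path):
            score -= penalty
        reranked.append((score, chunk))
    return sorted(reranked, key=lambda pair: pair[0], reverse=True)


def search(query, query_vec, chunks, top_k=3):
    terms = tokenize(query)
    candidates = stage_one(terms, query_vec, chunks)[: top_k * 4]
    return [chunk.path for _, chunk in stage_two(terms, candidates)[:top_k]]


CHUNKS = [
    Chunk("src/auth.py", "def load_user(user_id):\n    return db.get(user_id)", [0.9, 0.1, 0.0]),
    Chunk("tests/test_auth.py", "def test_load_user():\n    assert load_user(1)", [0.8, 0.2, 0.0]),
    Chunk("src/billing.py", "def charge(card, amount):\n    return gateway.charge(card, amount)", [0.0, 0.2, 0.9]),
]
QUERY_VEC = [1.0, 0.0, 0.0]  # made-up embedding of the query "load user"

print([c.path for _, c in stage_one(tokenize("load user"), QUERY_VEC, CHUNKS)])
# ['tests/test_auth.py', 'src/auth.py', 'src/billing.py']
print(search("load user", QUERY_VEC, CHUNKS))
# ['src/auth.py', 'tests/test_auth.py', 'src/billing.py']
```

In this made-up example the test file wins stage 1 because it repeats the query terms most often, and the stage-2 signals move the actual definition in src/auth.py back to the top, the kind of correction the reranking step is meant to make.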
Everything runs on CPU. No external APIs, no GPU, no authentication required. A typical repository indexes in ~200ms; queries return in ~1.5ms.
Benchmarks
- Token efficiency: 94% recall at just 2k tokens, versus the 100k tokens grep+read needs to reach 85% recall (98% fewer tokens)
- NDCG@10: 0.854, or 99% of the score of CodeRankEmbed, a 137M-parameter code-specialized transformer
- Indexing speed: ~200x faster than code-specialized transformers (~200ms)
- Query speed: ~1.5ms per query, ~10x faster than transformer-based search
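NDCG@10 scores how well a ranking puts relevant results near the top: each result's relevance is divided by log2(rank + 1), summed over the first 10 ranks, and normalized by the same sum for the best possible ordering, so 1.0 means a perfect ranking. The minimal sketch below follows that standard definition; it is not the evaluation code behind the numbers above, and it simplifies by building the ideal ordering from the ranked list itself, which assumes every relevant item was retrieved.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain: the result at rank i (1-based) is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG of the system's ranking divided by the DCG of the best possible ordering.

    Simplification: the ideal ordering is built from the same list, so this assumes
    every relevant item appears somewhere in `relevances`.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the top 5 results for one hypothetical query (1 = relevant).
ranking = [1, 0, 1, 0, 0]
print(round(ndcg_at_k(ranking), 3))  # 0.92
```

Reported figures such as 0.854 are usually this per-query value averaged over a whole query set.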
Integration
Add Semble to Claude Code as an MCP server with a single command:
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
Cursor, Codex, and OpenCode accept the same uvx command structure. For shell-based workflows, document the semble search and semble find-related commands in AGENTS.md so the agent knows to use them.
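Editors that read MCP servers from a JSON file need the uvx invocation split into a command and an argument list. The sketch below assumes the mcpServers layout that Cursor reads from its mcp.json file; other clients may use different file names or formats, so check each client's documentation. The helper mcp_server_entry is hypothetical. Using shlex.split matters here: the quotes around semble[mcp] are shell quoting and must not end up inside the argument.

```python
import json
import shlex

# The uvx part of the Claude Code command, exactly as typed in a shell.
SEMBLE_COMMAND = 'uvx --from "semble[mcp]" semble'


def mcp_server_entry(command_line: str) -> dict:
    """Split a shell command line into the command/args pair used by MCP JSON configs."""
    argv = shlex.split(command_line)  # strips the shell quotes around semble[mcp]
    return {"command": argv[0], "args": argv[1:]}


# Assumed layout of Cursor's mcp.json; other clients use their own formats.
config = {"mcpServers": {"semble": mcp_server_entry(SEMBLE_COMMAND)}}
print(json.dumps(config, indent=2))
```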
Why It Matters
Token efficiency directly impacts cost, speed, and context window limits for AI agent workflows. Semble hits a practical sweet spot: near-transformer search quality with zero external dependencies, running fully offline. As AI coding agents become standard development tools, efficient codebase navigation becomes a first-class engineering concern.