Architecture

Distillery is built as a layered system in which skills (SKILL.md files) drive all user interaction, the MCP server mediates all storage access, and backends are swappable through typed Protocol interfaces.

Layers

Layer	What it does	Key files
Skills	15 SKILL.md files — portable, version-controlled slash commands. Not Python code.	`skills/*/SKILL.md`
MCP Server	Tools exposed over stdio (local) or streamable-HTTP (team). Built on FastMCP 2.x/3.x with `@server.tool` decorators.	`src/distillery/mcp/server.py`
Webhook API	REST endpoints mounted under `/api/*` for orchestrated operations. The active endpoint is `POST /api/maintenance` (full poll → rescore → classify-batch pipeline). Individual scheduling endpoints — `POST /api/poll`, `POST /api/rescore`, `POST /api/classify-batch` (also reachable at `POST /api/hooks/poll`, `/api/hooks/rescore`, `/api/hooks/classify-batch` during the deprecation window) — are deprecated in favour of Claude Code routines and the orchestrated maintenance endpoint. Bearer token auth, per-endpoint cooldowns persisted to DuckDB. Mounted alongside MCP in HTTP mode.	`src/distillery/mcp/webhooks.py`
Auth	MCP: GitHub OAuth with org-restricted access. Webhooks: bearer token via `DISTILLERY_WEBHOOK_SECRET`. Middleware handles logging, rate limiting, security headers, budget tracking.	`src/distillery/mcp/auth.py`, `middleware.py`, `budget.py`
Core Protocols	Typed `Protocol` interfaces (structural subtyping, not ABCs). All storage operations are async. Includes `query_audit_log` for audit data access.	`src/distillery/store/protocol.py`, `embedding/protocol.py`
Feeds	GitHub events and RSS/Atom polling. Authenticated via `GITHUB_TOKEN` for private repos. Auto-tagging (source + topic tags from KB vocabulary). Relevance scoring via embeddings. Interest extraction for source suggestions.	`src/distillery/feeds/`
Backends	DuckDB + VSS (HNSW) + FTS (BM25). Hybrid search with RRF fusion and recency decay. fastembed (default, on-device) / Jina v3 / OpenAI embeddings. LLM classification with dedup + conflict detection.	`src/distillery/store/duckdb.py`, `embedding/`, `classification/`

Key Design Decisions

Skills are SKILL.md files, not Python code. They are portable, version-controlled, and team-shareable. Claude Code loads the markdown and follows the instructions — no compilation or import required.

MCP server is the primary runtime interface. All storage access goes through the protocol, over stdio (local) or HTTP (team). Skills never access the database directly. REST webhook endpoints provide a secondary interface for automated scheduling — they share the same DuckDB store and run in the same uvicorn process.

Storage abstraction via DistilleryStore protocol. Enables future migration to Elasticsearch without rewriting skills or the MCP server.
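Structural subtyping is what makes the backend swap possible. The following is a minimal sketch, not the actual DistilleryStore interface: `StoreSketch`, `InMemoryStore`, `save_and_reload`, and the two method names are hypothetical and chosen only to illustrate the pattern.

```python
import asyncio
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class StoreSketch(Protocol):
    """Hypothetical, trimmed-down stand-in for the DistilleryStore protocol."""

    async def store(self, content: str) -> str: ...

    async def get(self, entry_id: str) -> Optional[str]: ...


class InMemoryStore:
    """Satisfies StoreSketch structurally; it does not inherit from it."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    async def store(self, content: str) -> str:
        entry_id = str(len(self._entries) + 1)
        self._entries[entry_id] = content
        return entry_id

    async def get(self, entry_id: str) -> Optional[str]:
        return self._entries.get(entry_id)


async def save_and_reload(store: StoreSketch, content: str) -> Optional[str]:
    # Callers depend only on the protocol, so any conforming backend works.
    entry_id = await store.store(content)
    return await store.get(entry_id)
```

Because conformance is checked by shape rather than inheritance, a new backend only needs to provide the same async methods; callers such as `save_and_reload` stay unchanged.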

Configurable embedding providers. Swap between on-device fastembed (the plugin's install-time default), Jina v3, OpenAI, or a zero-vector stub for testing via distillery.yaml.

Semantic deduplication. Prevents knowledge base pollution with configurable thresholds:

Threshold	Default	Action
Skip	0.95	Near-duplicate — don't store
Merge	0.80	Similar enough to combine
Link	0.60	Related — store with cross-reference
Below 0.60	—	Unique — store normally
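The threshold table can be read as a simple cascade. The sketch below assumes the boundaries are inclusive (a score of exactly 0.95 is skipped), which the table does not state; `dedup_action` is a hypothetical helper, not part of the Distillery API.

```python
def dedup_action(
    similarity: float,
    skip: float = 0.95,
    merge: float = 0.80,
    link: float = 0.60,
) -> str:
    """Map a cosine similarity score to the action from the table above."""
    # Assumption: each threshold is an inclusive lower bound.
    if similarity >= skip:
        return "skip"   # near-duplicate, do not store
    if similarity >= merge:
        return "merge"  # similar enough to combine
    if similarity >= link:
        return "link"   # related, store with cross-reference
    return "store"      # unique, store normally
```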

Classification with confidence scoring. LLM-based type assignment with a team review queue for low-confidence results (below the configurable confidence_threshold, default: 60%).
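One way to picture the review queue is as a status decision made at classification time. This is a sketch under assumptions: the source does not say that queued entries receive the `pending_review` status, and `status_for` is a hypothetical name.

```python
def status_for(confidence: float, confidence_threshold: float = 0.60) -> str:
    """Return the entry status implied by a classification confidence."""
    # Assumption: results strictly below the threshold go to team review.
    if confidence < confidence_threshold:
        return "pending_review"
    return "active"
```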

Core Data Model

The Entry dataclass (src/distillery/models.py) is the fundamental unit of knowledge:

Field	Type	Description
`id`	str (UUID4)	Unique identifier
`content`	str	The knowledge content
`entry_type`	EntryType	session, bookmark, minutes, meeting, reference, idea, inbox, person, project, digest, github, feed
`source`	EntrySource	claude_code, manual, import, inference, documentation, external
`status`	EntryStatus	active, pending_review, archived
`tags`	list[str]	Hierarchical tags (`project/distillery/decisions`)
`metadata`	dict	Type-specific fields (validated per entry type)
`version`	int	Incremented on updates
`author`	str	Who created the entry
`project`	str \| None	Which project context
`created_at`	datetime	Creation timestamp
`session_id`	str \| None	Session grouping identifier
`verification`	VerificationStatus	Unverified, Testing, or Verified
`expires_at`	datetime \| None	Optional expiration timestamp
`updated_at`	datetime	Last modification
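For orientation, a reduced version of the table could look like the dataclass below. It is a sketch, not the contents of `src/distillery/models.py`: the default values, the `revise` method, and the use of plain strings for `entry_type` and `source` are assumptions, and `session_id`, `verification`, and `expires_at` are omitted for brevity.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class EntryStatus(str, Enum):
    ACTIVE = "active"
    PENDING_REVIEW = "pending_review"
    ARCHIVED = "archived"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class EntrySketch:
    content: str
    entry_type: str  # the real model uses the EntryType enum
    source: str      # the real model uses the EntrySource enum
    author: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: EntryStatus = EntryStatus.ACTIVE  # assumed default
    tags: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)
    version: int = 1  # assumed starting value
    project: Optional[str] = None
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    def revise(self, content: str) -> None:
        """Hypothetical update helper: bump version and updated_at together."""
        self.content = content
        self.version += 1
        self.updated_at = _now()
```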

Project Structure

distillery/
├── skills/                  # Claude Code skill definitions (loaded via plugin)
│   ├── distill/SKILL.md
│   ├── recall/SKILL.md
│   ├── pour/SKILL.md
│   ├── bookmark/SKILL.md
│   ├── minutes/SKILL.md
│   ├── classify/SKILL.md
│   ├── watch/SKILL.md
│   ├── radar/SKILL.md
│   ├── tune/SKILL.md
│   ├── setup/SKILL.md
│   ├── digest/SKILL.md
│   ├── gh-sync/SKILL.md
│   ├── investigate/SKILL.md
│   ├── briefing/SKILL.md
│   ├── compass/SKILL.md
│   └── CONVENTIONS.md
├── src/distillery/
│   ├── models.py            # Entry, SearchResult, enums
│   ├── config.py            # YAML config loading
│   ├── security.py          # Input sanitization and content validation
│   ├── store/
│   │   ├── protocol.py      # DistilleryStore protocol
│   │   └── duckdb.py        # DuckDB + VSS backend
│   ├── embedding/
│   │   ├── protocol.py      # EmbeddingProvider protocol
│   │   ├── fastembed.py     # fastembed (ONNX, on-device) adapter
│   │   ├── jina.py          # Jina v3 adapter
│   │   └── openai.py        # OpenAI adapter
│   ├── classification/
│   │   ├── models.py        # ClassificationResult, DeduplicationResult
│   │   ├── engine.py        # ClassificationEngine
│   │   └── dedup.py         # DeduplicationChecker
│   ├── mcp/
│   │   ├── server.py        # MCP server (FastMCP 2.x/3.x)
│   │   ├── webhooks.py      # REST webhook endpoints (/api/poll, /api/rescore, /api/maintenance)
│   │   ├── auth.py          # GitHub OAuth via FastMCP GitHubProvider
│   │   ├── middleware.py    # Request logging, rate limiting, security headers
│   │   ├── budget.py        # Embedding API budget tracking
│   │   └── __main__.py      # CLI: --transport stdio|http (composes MCP + webhooks)
│   └── feeds/
│       ├── github.py        # GitHub event adapter
│       ├── rss.py           # RSS/Atom feed adapter
│       ├── scorer.py        # Embedding-based relevance scorer
│       ├── poller.py        # Background feed poller
│       └── interests.py     # Interest extractor for source suggestions
├── tests/                   # 1600+ tests (unit + integration)
├── deploy/
│   ├── fly/                 # Fly.io deployment (persistent DuckDB)
│   └── prefect/             # Prefect Horizon deployment (MotherDuck)
└── docs/                    # This documentation site

Feed Architecture

The ambient intelligence system monitors external sources and scores relevance:

Source registry — managed via /watch, stored in DuckDB
Feed adapters — GitHub REST API events, RSS/Atom feeds
Auto-tagging — source tags (source/github/owner/repo, source/reddit/sub, source/domain) and topic tags matched from KB vocabulary via keyword map. Tags applied at ingestion; backfill via distillery retag CLI.
Relevance scoring — embedding-based cosine similarity against user interest profile, with interest boost (up to +0.15) and per-source trust weighting
Interest extraction — mines existing entries for tags, domains, repos, expertise
Digest generation — /radar uses interest-driven semantic search to surface the most relevant feed entries (falls back to newest-first listing when interests are unavailable)
Automated scheduling — Claude Code routines handle hourly feed polling, daily stale checks, and weekly maintenance. The /api/maintenance webhook endpoint remains available for orchestrated operations. The individual /api/poll, /api/rescore, and /api/classify-batch endpoints and their /api/hooks/* aliases are deprecated (see #272).
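The relevance scoring step can be sketched as follows. Only the cosine similarity, the +0.15 cap on the interest boost, and the existence of a per-source trust weight come from the description above; the 0.05-per-match boost, the way the three terms are combined, and the function names are assumptions made for illustration.

```python
import math
from collections.abc import Sequence

MAX_INTEREST_BOOST = 0.15  # cap stated in the list above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance(
    entry_vec: Sequence[float],
    interest_vec: Sequence[float],
    matched_interests: int = 0,
    trust: float = 1.0,
) -> float:
    """Score a feed entry against the user's interest profile."""
    base = cosine_similarity(entry_vec, interest_vec)
    # Assumption: each matched interest adds 0.05, capped at the stated maximum.
    boost = min(MAX_INTEREST_BOOST, 0.05 * matched_interests)
    # Assumption: trust scales the boosted score; the result is clamped to [0, 1].
    return max(0.0, min(1.0, (base + boost) * trust))
```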

Search Architecture

distillery_search uses hybrid ranking by default, combining two retrieval signals:

Vector search — HNSW cosine similarity via DuckDB VSS extension
Keyword search — BM25 via DuckDB FTS extension (migration 7)
Fusion — Reciprocal Rank Fusion (RRF) with configurable k (default 60)
Recency decay — linear weight from 1.0 (today) to recency_min_weight (default 0.5) over recency_window_days (default 90)
Graceful degradation — falls back to vector-only if FTS extension unavailable
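The fusion and decay steps above can be sketched in a few lines. Reciprocal Rank Fusion and the linear decay follow the stated defaults (k = 60, minimum weight 0.5, 90-day window); multiplying the fused score by the recency weight is an assumption about how the two are combined, and all function names are hypothetical.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranked list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def recency_weight(
    age_days: float, window_days: int = 90, min_weight: float = 0.5
) -> float:
    """Linear decay from 1.0 (today) to min_weight at window_days and beyond."""
    fraction = min(max(age_days, 0.0) / window_days, 1.0)
    return 1.0 - (1.0 - min_weight) * fraction


def hybrid_rank(
    vector_hits: list[str],
    keyword_hits: list[str],
    ages_days: dict[str, float],
    k: int = 60,
) -> list[str]:
    fused = rrf_fuse([vector_hits, keyword_hits], k)
    # Assumption: the recency weight multiplies the fused score.
    weighted = {
        doc_id: score * recency_weight(ages_days.get(doc_id, 0.0))
        for doc_id, score in fused.items()
    }
    return sorted(weighted, key=weighted.get, reverse=True)
```

An entry that appears in both result lists accumulates two reciprocal-rank terms, so it outranks entries found by only one retriever unless recency decay pulls it down.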

distillery_find_similar uses pure cosine similarity (no hybrid ranking) because the dedup thresholds depend on calibrated absolute scores, which rank-based fusion does not preserve.

Scheduling

Distillery uses Claude Code routines for all scheduled tasks:

Routine	Frequency	Purpose
`distillery-feed-poll`	Hourly	Poll all feed sources
`distillery-stale-check`	Daily	Find entries needing refresh or archival
`distillery-weekly-maintenance`	Weekly	Stats, stale entries, feed activity, digest

Routines run automatically in the background when Claude Code is active. Configure them via /setup.

Webhook Endpoints (Partially Deprecated)

REST endpoints are mounted at /api/* alongside the MCP server in HTTP mode. They are enabled only when DISTILLERY_WEBHOOK_SECRET is set and the runtime flag config.server.webhooks.enabled is true.

Endpoint	Operation	Cooldown	Status
`POST /api/poll`	Poll all feed sources (alias: `POST /api/hooks/poll`)	5 min	Deprecated — use `/api/maintenance` or routines
`POST /api/rescore`	Re-score feed entries (alias: `POST /api/hooks/rescore`)	1 hour	Deprecated — use `/api/maintenance` or routines
`POST /api/classify-batch`	Batch classification (alias: `POST /api/hooks/classify-batch`)	5 min	Deprecated — use `/api/maintenance` or routines
`POST /api/maintenance`	Orchestrated maintenance (poll + rescore + classify + retention)	6 hours	Active

Deprecation Notice

The individual /api/poll, /api/rescore, and /api/classify-batch endpoints (and their /api/hooks/* aliases) are deprecated. Scheduling should use Claude Code routines or the orchestrated /api/maintenance endpoint instead. See issue #272 for migration details.
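For callers migrating from the individual endpoints, a request to the orchestrated endpoint might be built as shown below. This is a sketch using only the standard library: the base URL is a placeholder, `build_maintenance_request` is a hypothetical helper, and whether the endpoint expects a request body is not stated here, so none is sent.

```python
import urllib.request


def build_maintenance_request(base_url: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/maintenance request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/maintenance",
        method="POST",
        # Webhooks use bearer token auth with DISTILLERY_WEBHOOK_SECRET.
        headers={"Authorization": f"Bearer {secret}"},
    )
```

Passing the result to `urllib.request.urlopen` would send it; a scheduler should expect the call to be rejected while the 6-hour cooldown is in effect.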

Auth: requests must send Authorization: Bearer <DISTILLERY_WEBHOOK_SECRET>. The token is compared in constant time with hmac.compare_digest.
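A constant-time bearer check along these lines would avoid leaking how many leading characters of a guess were correct. The function name and the header parsing are assumptions; only the use of `hmac.compare_digest` comes from the description above.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], secret: str) -> bool:
    """Validate an Authorization header against the webhook secret."""
    if not authorization_header or not secret:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest takes the same time whether or not the prefix matches.
    return hmac.compare_digest(token.encode("utf-8"), secret.encode("utf-8"))
```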

Hardening: Per-endpoint asyncio.Lock serializes cooldown checks. BodySizeLimitMiddleware + RateLimitMiddleware (10 req/min, 100 req/hour). Cooldown timestamps persisted to DuckDB via get_metadata/set_metadata.
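The lock-guarded cooldown check can be pictured as below. Cooldown durations are taken from the endpoint table; the `CooldownGate` class, its in-memory dictionary (standing in for the DuckDB-backed get_metadata/set_metadata calls), and the injectable clock are assumptions for illustration.

```python
import asyncio
import time
from typing import Callable

# Cooldowns from the endpoint table, in seconds.
COOLDOWN_SECONDS = {
    "poll": 5 * 60,
    "rescore": 60 * 60,
    "classify-batch": 5 * 60,
    "maintenance": 6 * 60 * 60,
}


class CooldownGate:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._locks = {name: asyncio.Lock() for name in COOLDOWN_SECONDS}
        self._last_run: dict[str, float] = {}  # stand-in for persisted metadata
        self._clock = clock

    async def try_acquire(self, endpoint: str) -> bool:
        """Return True and record the run if the endpoint is off cooldown."""
        # The lock makes check-then-set atomic for concurrent requests.
        async with self._locks[endpoint]:
            now = self._clock()
            last = self._last_run.get(endpoint)
            if last is not None and now - last < COOLDOWN_SECONDS[endpoint]:
                return False
            self._last_run[endpoint] = now
            return True
```

Without the lock, two requests arriving together could both read the old timestamp and both run; with it, the second request sees the first one's update.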

Audit: Each invocation stores a webhook_audit:{endpoint} metadata record with timestamp, status, and response data. All tool invocations and login events are recorded in the audit_log table, queryable via distillery_list(output_mode="audit") or store.query_audit_log().