Personal hybrid search engine — BM25 + vector (qwen3-vl-embedding) for markdown notes, Claude Code & Codex conversations
npx skills add https://github.com/ethan-huo/seek --skill seek
Install this skill with the command-line interface (CLI) and start using the SKILL.md workflow in your workspace.
Personal hybrid search engine for markdown notes, Claude Code conversations, and Codex conversations. BM25 full-text + vector semantic search with multimodal embedding.
AI agents lose context between sessions. seek indexes everything — your notes, every Claude Code conversation, every Codex session, including screenshots — so agents can recall what you discussed weeks ago.
# requires: go 1.24+, CGO
make build
ln -sf "$(pwd)/seek" /usr/local/bin/seek
As a skill (Claude Code, Codex, Cursor, etc.):
bunx skills add ethan-huo/seek
# Configure embedding API (DashScope / OpenAI / custom)
seek auth login
# Add your collections
seek add /path/to/notes --name mynotes # markdown
seek add --claude # Claude Code conversations
seek add --codex # Codex conversations
seek add --images /path/to/images -n pics # image files
# Generate embeddings
seek embed
# Hybrid search (BM25 + vector, recommended)
seek search "how to deploy the gateway"
# BM25 keyword search (fast, no API call)
seek search "ECONNREFUSED port 3000" --lex
# Vector semantic search (meaning-based)
seek search "functional programming architecture" --vec
# Incremental sync + embed new content
seek sync && seek embed
Background service — periodic sync + embed via launchd (macOS):
seek service start # every 1 hour (default)
seek service start -i 1800 # every 30 minutes
seek service stop
seek service status
AI tool hooks — auto-sync after every conversation:
seek hooks install # adds Stop hook to Claude Code
seek hooks uninstall
This writes a Stop hook into ~/.claude/settings.json so seek sync runs automatically when Claude finishes a conversation. Combined with the background service (which handles embed), your index stays current without manual intervention.
Indexing — seek sync scans collections incrementally. Markdown files are tracked by content hash. Claude/Codex JSONL files are append-only, tracked by line count. Base64 images in conversations are extracted to ~/.cache/seek/images/.
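The two change-detection strategies described above can be sketched roughly as follows. This is a minimal illustration, not seek's actual implementation: the function names, the in-memory dictionaries (standing in for the SQLite index), and the choice of SHA-256 as the content hash are all assumptions.

```python
import hashlib
from pathlib import Path


def markdown_changed(path: Path, seen_hashes: dict[str, str]) -> bool:
    """Return True if the file's content hash differs from the recorded one.

    The hash algorithm (SHA-256) is an assumption; the source only says
    "content hash". The new hash is recorded as a side effect.
    """
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    key = str(path)
    if seen_hashes.get(key) == digest:
        return False  # unchanged since the last scan, nothing to re-index
    seen_hashes[key] = digest
    return True


def new_jsonl_lines(path: Path, seen_counts: dict[str, int]) -> list[str]:
    """Return only the lines appended since the last scan.

    Because the file is append-only, the previously seen line count is
    enough to locate the new records.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    key = str(path)
    start = seen_counts.get(key, 0)
    seen_counts[key] = len(lines)
    return lines[start:]
```

Tracking a line count only works because conversation logs are append-only; a file that can be edited in place, such as a markdown note, needs the content-hash check instead.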
Embedding — seek embed generates vectors via qwen3-vl-embedding (multimodal). Text and images share the same vector space. Supports DashScope Batch API (50% cheaper) for bulk indexing.
Search — Three modes:
- default (no flag): hybrid, BM25 + vector (recommended)
- --lex: SQLite FTS5 BM25 ranking
- --vec: Cosine similarity against stored embeddings

Storage — SQLite database at ~/.cache/seek/index.db. Config at ~/.config/seek/config.yaml.
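To make the search modes more concrete, here is a small sketch of vector ranking by cosine similarity and of one way to merge a keyword ranking with a vector ranking. The source does not say how seek combines the two result lists; reciprocal rank fusion (RRF) is used here purely as a common, assumed choice, and all function names and the constant `k = 60` are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_rank(query: list[float], embeddings: dict[str, list[float]]) -> list[str]:
    """Order document ids by similarity to the query embedding, best first."""
    return sorted(
        embeddings,
        key=lambda doc_id: cosine_similarity(query, embeddings[doc_id]),
        reverse=True,
    )


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists (assumed fusion method, not confirmed by seek).

    Each document scores 1 / (k + rank) per list it appears in; ties are
    broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

In this sketch a hybrid query would call `reciprocal_rank_fusion([lexical_ids, vector_rank(query, embeddings)])`, where `lexical_ids` is a hypothetical list of document ids already ordered by BM25. Rank-based fusion sidesteps the fact that BM25 scores and cosine similarities live on different scales.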
| Type | Source | What's indexed |
|---|---|---|
| markdown | Any directory | .md files, FTS + chunks + embeddings |
| claude | ~/.claude/projects/ | All Claude Code conversations + screenshots |
| codex | ~/.codex/ | All Codex sessions + screenshots |
| images | Any directory | Image files (png/jpg/webp) with VL embedding |
MIT