High-performance AI coding agent CLI written in Rust with zero unsafe code
npx skills add https://github.com/dicklesworthstone/pi_agent_rust --skill interactive-shellقم بتثبيت هذه المهارة باستخدام واجهة سطر الأوامر (CLI) وابدأ في استخدام سير عمل SKILL.md في مساحة عملك.
pi_agent_rust - High-performance AI coding agent CLI written in Rust
Why Should You Care? • TL;DR • Methodology • Quick Start • Features • Installation • Commands • Configuration
# Install latest release
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | bash
You want an AI coding assistant in your terminal, but existing tools are:
pi_agent_rust is a from-scratch Rust port of Pi Agent by Mario Zechner (made with his blessing!). Single binary, instant startup, stable streaming, and 8 built-in tools.
Rather than a direct line-by-line translation, this port builds on two purpose-built Rust libraries:
# Start a session
pi "Help me refactor this function to use async/await"
# Continue a previous session
pi --continue
# Single-shot mode (no session)
pi -p "What does this error mean?" < error.log
If you already use Pi Agent, especially through OpenClaw, this project keeps the core workflow while upgrading the engine under the hood:
Security is a first-class design goal here, not a bolt-on:
tool/exec/http/session/ui/events)exec enforcement: capability gate first, then command mediation that blocks critical shell classes by default (for example recursive delete, disk/device writes, reverse shell) and can tighten to block high-tier classes in strict/safe policypending -> acknowledged -> trusted -> killed) with kill-switch audit logs and explicit operator provenanceasupersync for more predictable cancellation/lifecycle behaviorThese are the realistic secure-path numbers that matter most (large-session, end-to-end behavior):
| Scenario | Rust total | Legacy Node total | Legacy Bun total | Rust advantage |
|---|---|---|---|---|
| Realistic 1M session | 250.29 ms | 1,238.67 ms | 700.52 ms | 4.95x faster than Node, 2.80x faster than Bun |
| Realistic 5M session | 1,382.12 ms | 5,974.67 ms | 2,959.42 ms | 4.32x faster than Node, 2.14x faster than Bun |
| Scenario | Rust RSS | Legacy Node RSS | Legacy Bun RSS | Rust memory advantage |
|---|---|---|---|---|
| Realistic 1M session | 67,572 KB | 820,380 KB | 875,092 KB | 12.14x lower than Node, 12.95x lower than Bun |
| Realistic 5M session | 268,844 KB | 2,173,096 KB | 3,057,908 KB | 8.08x lower than Node, 11.37x lower than Bun |
Resume/open responsiveness is also much better at scale:
| Scenario | Rust open | Legacy Node open | Legacy Bun open | Rust advantage |
|---|---|---|---|---|
| 1M session resume | 17.59 ms | 119.76 ms | 50.83 ms | 6.81x faster than Node, 2.89x faster than Bun |
| 5M session resume | 58.68 ms | 396.41 ms | 155.63 ms | 6.76x faster than Node, 2.65x faster than Bun |
Extension runtime guarantees are also concrete:
| Extension assurance signal | Why you should care |
|---|---|
Two-stage exec guard (exec capability policy + command-level mediation + DCG/heredoc AST signals) |
Dangerous shell intent is caught before spawn, including destructive payloads hidden in multiline wrappers |
Trust lifecycle + kill switch (pending/acknowledged/trusted/killed) |
You can quarantine an extension instantly, log who pulled the switch and why, and require explicit re-acknowledgement before restoring access |
Hostcall lane kill-switch controls (forced_compat_global_kill_switch, forced_compat_extension_kill_switch) |
Fast-path regressions can be contained immediately by forcing compatibility-lane execution without disabling the extension system |
| Deterministic hostcall reactor mesh (shard affinity, bounded SPSC lanes, backpressure telemetry, optional NUMA slab tracking) | Runtime behavior stays predictable under contention; queue pressure and routing decisions are observable instead of opaque |
| Startup prewarm + warm isolate reuse for JS runtimes | Runtime creation overlaps startup and warm reuse keeps repeated extension runs low-latency without a Node/Bun process model |
Tamper-evident runtime risk ledger (verify / replay / calibrate) |
Security decisions are hash-linked and can be replayed or threshold-tuned from real runtime traces |
Bottom line: for real Pi/OpenClaw usage, the Rust version is faster, far more memory-efficient, and materially stronger on extension runtime safety under real workload pressure.
Data source: BENCHMARK_COMPARISON_BETWEEN_RUST_VERSION_AND_ORIGINAL__GPT.md (latest secure-path + full orchestrator checkpoints, 2026-02-19).
In this README, we means the project owner and collaborating coding agents.
The speed gains come from runtime design, not one trick.
| Technique | What we do | Runtime effect |
|---|---|---|
| Cold-start minimization | Single static binary, no Node/Bun runtime bootstrap, no JIT warmup, startup prewarm for extension runtime paths | Faster time-to-first-interaction |
| Less copying on hot paths | Arc/Cow message flow, zero-copy hostcall/tool payload handling, reduced clone-heavy provider/session paths |
Lower CPU and allocation pressure |
| Deterministic dispatch core | Typed hostcall opcodes, fast-lane/compat-lane routing, bounded shard queues with reactor-mesh telemetry | Better tail latency under concurrent extension load |
| Efficient long-session storage | SQLite session index + v2 sidecar (segmented log + offset index) with O(index+tail) reopen path | Fast resume on large histories |
| Streaming parser tuned for real networks | SSE parser tracks scanned bytes, handles UTF-8 tails, normalizes chunk boundaries, interns event-type strings | Lower streaming overhead and fewer parser stalls |
| Safe fast-path controls | Shadow dual execution sampling, automatic backoff on divergence/overhead, compatibility-lane kill switches for containment | Keeps optimizations fast without silent behavior drift |
| CI-level performance governance | Scenario matrices, strict artifact contracts, fail-closed perf gates | Regressions are caught before release |
If you want the full implementation inventory, see Performance Engineering.
The benchmarks cited above are intentionally designed to be realistic, reproducible, and hard to game.
What we measured:
100k up to 5M token-class session states.--help, --version) separately from long-session workflows.How we kept comparisons fair:
pi_agent_rust vs legacy coding-agent)How we kept claims honest:
If you want full details, see:
BENCHMARK_COMPARISON_BETWEEN_RUST_VERSION_AND_ORIGINAL__GPT.md (methodology + results + caveats + raw artifact paths)| Feature | Pi (Rust) | Typical TS/Python CLI |
|---|---|---|
| Startup | <100ms | 500ms-2s |
| Binary size | <22MB (CI-gated budget) | 100MB+ (with runtime) |
| Memory (idle) | <50MB | 200MB+ |
| Streaming | Native SSE parser | Library-dependent |
| Tool execution | Process tree management | Basic subprocess |
| Sessions | JSONL with branching | Varies |
| Unsafe code | Forbidden | N/A |
# 1) Start an interactive session
pi
# 2) Ask a codebase question
pi "Summarize the architecture in src/"
# 3) Attach a file inline
pi @src/main.rs "Explain startup flow"
# 4) Run single-shot mode for scripting
pi -p "List likely regression risks for this diff"
# 5) Continue your last project session
pi --continue
# 6) Inspect available models/providers
pi --list-models
pi --list-providers
asupersync is a structured concurrency async runtime designed for applications that need predictable resource cleanup. Key features used by pi_agent_rust:
Cx): Async functions receive an explicit context that controls what they can do (HTTP, filesystem, time). This makes testing deterministic.pi_agent_rust runs on asupersync end-to-end today (runtime + HTTP/TLS + cancellation). Provider streaming uses a minimal HTTP client (src/http/client.rs) feeding a custom SSE parser (src/sse.rs).
rich_rust is a Rust port of Will McGugan's Rich Python library. It provides:
[bold red]error[/] renders as bold red textThe terminal UI uses rich_rust for all output formatting, providing the same visual quality as Rich-based Python tools.
# Install latest release binary
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | bash
If you already have the original TypeScript pi installed, the installer asks
whether to make Rust Pi canonical as pi and automatically create legacy-pi
for the old command.
export ANTHROPIC_API_KEY="sk-ant-..."
# Interactive mode
pi
# With an initial message
pi "Explain this codebase structure"
# Read files as context
pi @src/main.rs "What does this do?"
Real-time token streaming with extended thinking support:
pi "Write a quicksort implementation"
Watch the response appear token-by-token, with thinking blocks shown inline.
| Tool | Description | Example |
|---|---|---|
read |
Read file contents, supports images | Read src/main.rs |
write |
Create or overwrite files | Write a new config file |
edit |
Surgical string replacement | Fix the typo on line 42 |
bash |
Execute shell commands with timeout | Run the test suite |
grep |
Search file contents with context | Find all TODO comments |
find |
Discover files by pattern | Find all *.rs files |
ls |
List directory contents | What's in src/? |
All tools include:
Sessions persist as JSONL files with full conversation history:
# Continue most recent session
pi --continue
# Open specific session
pi --session ~/.pi/agent/sessions/--home-user-project--/2024-01-15T10-30-00.jsonl
# Ephemeral (no persistence)
pi --no-session
Sessions support:
Enable deep reasoning for complex problems:
pi --thinking high "Design a distributed rate limiter"
Thinking levels: off, minimal, low, medium, high, xhigh
SKILL.md under ~/.pi/agent/skills/ or .pi/skills/ and invoke with /skill:name.~/.pi/agent/prompts/ or .pi/prompts/; invoke via /<template> [args].pi install npm:@org/pi-packages (skills, prompts, themes, extensions).Pi provides context-aware autocomplete in the interactive editor:
@ file references: Type @ followed by a path fragment to attach file contents. The completion engine indexes project files (respecting .gitignore) via the ignore crate's WalkBuilder, capping at 5,000 entries./ slash commands: Built-in commands (/help, /model, /tree, /clear, /compact, /exit) and user-defined prompt templates and skills all appear as completions.Pi runs in three modes, each suited to different workflows:
| Mode | Invocation | Use Case |
|---|---|---|
| Interactive | pi (default) |
Full TUI with streaming, tools, session branching, autocomplete |
pi -p "..." |
Single response to stdout, no TUI, scriptable | |
| RPC | pi --mode rpc |
Headless JSON protocol over stdin/stdout for IDE integrations |
Interactive mode provides the full experience: a multi-line text editor with history, scrollable conversation viewport, model selector (Ctrl+L), scoped model cycling (Ctrl+P/Ctrl+Shift+P), session branch navigator (/tree), and real-time token/cost tracking.
Print mode sends one message, streams the response to stdout, and exits. Useful for shell scripts and one-off queries.
RPC mode exposes a line-delimited JSON protocol for programmatic control. Clients send commands (prompt, steer, follow-up, abort, get-state, compact) and receive streaming events. This is how IDE extensions and custom frontends integrate with Pi. See RPC Protocol for the wire format.
Pi supports two extension runtime families with capability-gated host connectors:
JS/TS entrypoints run without Node or Bun in an embedded QuickJS runtime.
*.native.json descriptors run in the native-rust descriptor runtime.
Extension entrypoints are auto-detected:
.js/.ts/.mjs/.cjs/.tsx/.mts/.cts run directly in embedded QuickJS (no descriptor conversion).*.native.json loads the native-rust descriptor runtime.Sub-100ms cold load (P95), sub-1ms warm load (P99)
Node API shims for fs, path, os, crypto, child_process, url, and more
Capability-based security: extensions call explicit connectors (tool/exec/http/session/ui) with audit logging
Command-level exec mediation: dangerous shell signatures are classified and blocked before spawn, with redacted denial alerts and mediation ledger entries
Trust-state lifecycle and kill-switch controls with audited state transitions (pending/acknowledged/trusted/killed)
Hostcall reactor mesh with deterministic shard routing, bounded queue backpressure, and optional NUMA-aware telemetry
Runtime prewarm path with warm isolate reuse so extension startup cost is mostly paid before the first prompt
/model (or Ctrl+L) opens a selector focused on models that are ready to run with current credentials.Ctrl+P and Ctrl+Shift+P cycle through the scoped model set without opening the overlay./login.Extensions can register tools, slash commands, event hooks, flags, providers,
and shortcuts. See EXTENSIONS.md for the full architecture
and docs/extension-catalog.json for the
224-entry catalog with per-extension conformance status and perf budgets.
This project validates extension compatibility with a three-track pipeline:
target/release/pi execution against a non-mocked provider/model path.Fetch unvendored source corpus
ext_unvendored_fetch_runcargo run --bin ext_unvendored_fetch_run -- run-all --workers 8 --no-probe.tmp-codex-unvendored-cache/tests/ext_conformance/reports/pipeline/unvendored_fetch_probe_report.jsontests/ext_conformance/reports/pipeline/unvendored_fetch_probe_events.jsonlRun end-to-end validation orchestration
ext_full_validationcargo run --bin ext_full_validation --refresh_onboarding_queue (runs ext_onboarding_queue)conformance_shard_0..N (runs ext_conformance_generated sharded matrix)conformance_failure_dossiersprovider_compat_matrixscenario_conformance_suiteauto_repair_full_corpusdifferential_suite (optional, enabled via --run-diff; npm diff via --run-npm-diff)tests/ext_conformance/reports/pipeline/full_validation_report.jsontests/ext_conformance/reports/pipeline/full_validation_report.mdtests/ext_conformance/reports/**Run dev-firstset live-provider gate (must pass before release build)
ext_release_binary_e2ecargo build --bin pi --bin ext_release_binary_e2ePI_HTTP_REQUEST_TIMEOUT_SECS=0 target/debug/ext_release_binary_e2e --pi-bin target/debug/pi --provider ollama --model qwen2.5:0.5b --jobs 10 --timeout-secs 600 --max-cases 20 --extension-policy balanced --out-json tests/ext_conformance/reports/release_binary_e2e/ollama_firstset_dev_20260219_jobs10_timeout600.json --out-md tests/ext_conformance/reports/release_binary_e2e/ollama_firstset_dev_20260219_jobs10_timeout600.mdpass=20 / total=20 with fail=0.tests/ext_conformance/reports/release_binary_e2e/ollama_firstset_dev_20260219_jobs10_timeout600.jsontests/ext_conformance/reports/release_binary_e2e/ollama_firstset_dev_20260219_jobs10_timeout600.mdRun full release-binary live-provider E2E (after step 3 passes)
ext_release_binary_e2ecargo build --release --bin pi --bin ext_release_binary_e2ePI_HTTP_REQUEST_TIMEOUT_SECS=0 target/release/ext_release_binary_e2e --pi-bin target/release/pi --provider ollama --model qwen2.5:0.5b --jobs 10 --timeout-secs 600 --extension-policy balanced --out-json tests/ext_conformance/reports/release_binary_e2e/ollama_full_release_20260219_jobs10_timeout600.json --out-md tests/ext_conformance/reports/release_binary_e2e/ollama_full_release_20260219_jobs10_timeout600.mdtarget/release/pi directly for each selected extension case.ollama + qwen2.5:0.5b) to exercise non-mocked end-to-end behavior.pi.ext.release_binary_e2e.v1).tests/ext_conformance/reports/release_binary_e2e/ollama_full_release_20260219_jobs10_timeout600.jsontests/ext_conformance/reports/release_binary_e2e/ollama_full_release_20260219_jobs10_timeout600.mdtests/ext_conformance/reports/release_binary_e2e/cases/*Aggregate and triage
full_validation_report.json combines:
stageSummary, stageResults)corpus)conformance)providerCompat)scenario)reviewQueue, verdictCounts)not_tested_unvendored indicates unvendored candidates not yet in vendored conformance; this is inventory status, not a vendored regression.These runs compile many crates and can be disk-heavy. Point Cargo artifacts and temp files to a large volume:
export CARGO_TARGET_DIR="/data/tmp/pi_agent_rust/${USER:-agent}"
export TMPDIR="/data/tmp/pi_agent_rust/${USER:-agent}/tmp"
mkdir -p "$CARGO_TARGET_DIR" "$TMPDIR"
Then run:
cargo run --bin ext_unvendored_fetch_run -- run-all --workers 8 --no-probe
cargo run --bin ext_full_validation --
From:
tests/ext_conformance/reports/sharded/shard_0_report.json (generated 2026-02-18T23:43:48Z)
tests/ext_conformance/reports/scenario_conformance.json (generated 2026-02-18T23:11:57Z)
tests/ext_conformance/reports/parity/triage.json (generated 2026-02-18T23:12:13Z)
tests/ext_conformance/reports/release_binary_e2e/ollama_firstset_dev_20260219_jobs10_timeout600.json (run release-e2e-20260219T032439Z)
tests/ext_conformance/reports/release_binary_e2e/ollama_full_release_20260219_jobs10_timeout600.json (run release-e2e-20260219T033502Z)
Vendored matrix conformance: manifest_count=224, tested=224, passed=224, failed=0, skipped=0
Scenario suite conformance: 25/25 passed (0 fail, 0 error, 0 skip)
Differential parity triage sample: 22 match, 0 mismatch, 3 skip (total=25)
Dev first-set live-provider gate (max_cases=20, debug binaries): 20/20 passed (0 fail, 0 timeout)
Release-binary live-provider full run (optimized binaries, jobs=10, timeout=600s, ollama + qwen2.5:0.5b): 224/224 passed (0 fail, 0 timeout)
# Latest release
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | bash
# Non-interactive + auto PATH update
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | bash -s -- --yes --easy-mode
# Pin a release tag
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | bash -s -- --version v0.1.0
# Install from explicit artifact URL + checksum URL
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | \
bash -s -- \
--artifact-url "https://github.com/Dicklesworthstone/pi_agent_rust/releases/download/v0.1.0/pi-linux-amd64.tar.xz" \
--checksum-url "https://github.com/Dicklesworthstone/pi_agent_rust/releases/download/v0.1.0/SHA256SUMS"
# Skip completion setup (CI/non-interactive minimal install)
curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/install.sh?$(date +%s)" | \
bash -s -- --yes --no-completions
The installer is idempotent and supports a migration path from TypeScript Pi:
pi commandpilegacy-piNotable installer flags:
--offline [TARBALL]: enforce offline mode; optional local artifact path (.tar.gz, .tar.xz, .zip, or raw binary)--artifact-url: force a specific release artifact URL--checksum / --checksum-url: override checksum source for explicit artifacts--sigstore-bundle-url: override Sigstore bundle URL used by cosign verify-blob--completions auto|off|bash|zsh|fish: force shell completion install target (off is equivalent to --no-completions)--no-completions: disable completion installation--no-agent-skills: skip automatic installation of the pi-agent-rust skill into ~/.claude/skills/ and ~/.codex/skills/--no-verify: skip checksum + signature verification (testing only)--artifact-url without --version uses a synthetic tag for release mode only; if artifact download fails, install exits instead of attempting source fallbackHTTPS_PROXY / HTTP_PROXY for all network fetchesBy default, the installer also installs a pi-agent-rust skill for both Claude Code and Codex CLI:
~/.claude/skills/pi-agent-rust/SKILL.md~/.codex/skills/pi-agent-rust/SKILL.md (or $CODEX_HOME/skills/pi-agent-rust/SKILL.md if CODEX_HOME is set)Installer regression harness (options + checksum + signature + completions):
bash tests/installer_regression.sh
For migration adoption, packaging and invocation compatibility follows this contract:
This section covers packaging/invocation behavior only; functional parity and certification status are tracked in docs/dropin-certification-contract.json.
Canonical executable name is pi across release assets and installer-managed installs.
Installer-managed installs also create an rpi compatibility launcher when no conflicting rpi command already exists on your PATH.
Existing TypeScript pi installs can be migrated in place; the prior command is preserved as legacy-pi.
If you keep TypeScript pi as canonical (--keep-existing-pi), Rust Pi is installed as pi-rust.
On Apple Silicon, the installer prefers the native arm64 artifact even when launched from a Rosetta-translated shell.
Version-pinned installs are supported via install.sh --version vX.Y.Z for deterministic rollouts.
Every GitHub release ships platform binaries plus SHA256SUMS for integrity validation.
Representative smoke checks:
# Canonical command should exist and execute
command -v pi
pi --version
pi --help >/dev/null
# If a TS migration was performed, legacy command remains available
command -v legacy-pi && legacy-pi --version
Requires Rust nightly (2024 edition features):
# Install Rust nightly
rustup install nightly
rustup default nightly
# Clone and build
git clone https://github.com/Dicklesworthstone/pi_agent_rust.git
cd pi_agent_rust
cargo build --release
# Binary is at target/release/pi
./target/release/pi --version
# To install system-wide (--locked ensures reproducible dependency resolution)
cargo install --path . --locked
Pi has minimal runtime dependencies:
fd: Required for the find tool (install via apt install fd-find or brew install fd)rg: Required for the grep tool (install via apt install ripgrep or brew install ripgrep)curl -fsSL "https://raw.githubusercontent.com/Dicklesworthstone/pi_agent_rust/main/uninstall.sh" | bash
By default, uninstall removes installer-managed Rust binaries/aliases and skill directories,
then restores a migrated TypeScript pi if one was preserved.
pi [OPTIONS] [MESSAGE]...
# Examples
pi # Start interactive session
pi "Hello" # Start with message
pi @file.rs "Explain this" # Include file as context
pi -p "Quick question" # Print mode (no session)
Interactive file references:
@relative/path in the editor to attach a file’s contents (autocomplete inserts the @ form).| Option | Description |
|---|---|
-c, --continue |
Continue most recent session |
-r, --resume |
Open session picker UI |
--session <PATH> |
Open specific session file |
--session-dir <DIR> |
Override session storage directory for this run |
| `--session-durability strict | balanced |
--no-session |
Don't persist conversation |
-p, --print |
Single response, no interaction |
| `--mode text | json |
--provider <NAME> |
Force provider for this run (aliases supported) |
--model <MODEL> |
Model to use (auto-select fallback: anthropic/claude-opus-4-5, then openai/gpt-5.1-codex, then google/gemini-2.5-pro) |
--thinking <LEVEL> |
Thinking level: off/minimal/low/medium/high/xhigh |
--tools <TOOLS> |
Comma-separated tool list |
--api-key <KEY> |
API key (or use provider-specific env vars such as ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.) |
| `--extension-policy safe | balanced |
| `--repair-policy off | suggest |
--list-models [PATTERN] |
List available models (optional fuzzy filter) |
--list-providers |
List canonical provider IDs, aliases, and auth env keys |
--export <PATH> |
Export session file to HTML |
Additional high-leverage flags:
--no-migrations to skip startup migration checks--explain-extension-policy to print effective capability decisions and exit--explain-repair-policy to print effective repair-policy resolution and exit# Package management
pi install <source> [-l|--local] # Install a package source and add to settings
pi remove <source> [-l|--local] # Remove a package source from settings
pi update [source] # Update all (or one) non-pinned packages
pi list # List user + project packages from settings
# Configuration
pi config # Show settings paths + precedence
More utility subcommands:
# Extension catalog index + discovery
pi update-index
pi search "git"
pi info pi-search-agent
# Environment and extension diagnostics
pi doctor
pi doctor --only sessions --format json
pi doctor ./path/to/extension --policy safe --fix
# Session storage migration (JSONL -> v2 sidecar store)
pi migrate ~/.pi/agent/sessions --dry-run
pi migrate ~/.pi/agent/sessions
update-index refreshes extension index metadata used by search and info.search and info let you discover and inspect extension metadata without leaving the CLI.doctor checks config, directories, auth, shell setup, sessions, and extension compatibility.migrate validates or creates the v2 session sidecar format for faster resume on larger histories.Pi reads configuration from ~/.pi/agent/settings.json:
{
"default_provider": "anthropic",
"default_model": "claude-opus-4-5",
"default_thinking_level": "medium",
"compaction": {
"enabled": true,
"reserve_tokens": 8192,
"keep_recent_tokens": 20000
},
"retry": {
"enabled": true,
"max_retries": 3,
"base_delay_ms": 1000,
"max_delay_ms": 30000
},
"images": {
"auto_resize": true,
"block_images": false
},
"terminal": {
"show_images": true,
"clear_on_shrink": false
},
"shell_path": "/bin/bash",
"shell_command_prefix": "set -e"
}
Settings are resolved in priority order (first match wins):
--model, --thinking, --provider, etc.)ANTHROPIC_API_KEY, PI_CONFIG_PATH, etc.).pi/settings.json in the working directory)~/.pi/agent/settings.json)This means a CLI flag always overrides a settings.json value, and a project-level setting overrides the global one.
Skills, prompt templates, themes, and extensions follow the same resolution order:
--skill, --prompt-template, --theme, -e).pi/skills/, .pi/prompts/, .pi/themes/, .pi/extensions/)~/.pi/agent/skills/, ~/.pi/agent/prompts/, etc.)~/.pi/agent/packages/)When multiple resources share the same name, the first occurrence wins. Collisions are logged as diagnostics.
Prompt template expansion supports positional arguments: $1, $2, $@ (all args), and slice syntax ${@:start}, ${@:start:length}. For example, a template invoked as /review src/main.rs --strict receives src/main.rs as $1 and --strict as $2.
| Variable | Description |
|---|---|
ANTHROPIC_API_KEY |
Anthropic API key |
OPENAI_API_KEY |
OpenAI API key |
GOOGLE_API_KEY |
Google Gemini API key |
AZURE_OPENAI_API_KEY |
Azure OpenAI API key |
COHERE_API_KEY |
Cohere API key |
GROQ_API_KEY |
Groq API key (OpenAI-compatible) |
DEEPINFRA_API_KEY |
DeepInfra API key (OpenAI-compatible) |
CEREBRAS_API_KEY |
Cerebras API key (OpenAI-compatible) |
OPENROUTER_API_KEY |
OpenRouter API key (OpenAI-compatible) |
MISTRAL_API_KEY |
Mistral API key (OpenAI-compatible) |
MOONSHOT_API_KEY |
Moonshot/Kimi API key (OpenAI-compatible) |
DASHSCOPE_API_KEY |
DashScope/Qwen API key (OpenAI-compatible) |
DEEPSEEK_API_KEY |
DeepSeek API key (OpenAI-compatible) |
FIREWORKS_API_KEY |
Fireworks API key (OpenAI-compatible) |
TOGETHER_API_KEY |
Together API key (OpenAI-compatible) |
PERPLEXITY_API_KEY |
Perplexity API key (OpenAI-compatible) |
XAI_API_KEY |
xAI API key (OpenAI-compatible) |
PI_CONFIG_PATH |
Custom config file path |
PI_CODING_AGENT_DIR |
Override the global config directory |
PI_PACKAGE_DIR |
Override the packages directory |
PI_SESSIONS_DIR |
Custom sessions directory |
┌─────────────────────────────────────────────────────────────────┐
│ CLI (clap) │
│ • Argument parsing • @file expansion • Subcommands │
└─────────────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────────────▼───────────────────────────────┐
│ Agent Loop │
│ • Message history • Tool iteration • Event callbacks │
└────────┬──────────────────────┬──────────────────────┬──────────┘
│ │ │
┌────────▼────────┐ ┌─────────▼──────────┐ ┌───────▼──────────┐
│ Provider Layer │ │ Tool Registry │ │ Extension Mgr │
│ • Anthropic │ │ • read • bash │ │ • QuickJS JS/TS │
│ • OpenAI (Chat/ │ │ • write • grep │ │ • Native descriptor│
│ Responses) │ │ • edit • find │ │ runtime │
│ • Gemini/Cohere │ │ • ls │ │ • Capability policy│
│ • Azure/Bedrock │ │ • ext-registered │ │ • Node shims │
│ • Vertex/Copilot│ │ │ │ • Event hooks │
│ • GitLab/Ext │ │ │ │ • Runtime risk ctl │
└────────┬────────┘ └─────────┬──────────┘ └───────┬──────────┘
│ │ │
┌────────▼─────────────────────▼──────────────────────▼──────────┐
│ Session Persistence │
│ • JSONL format (v3) • Tree structure • Session index/cache │
│ • Per-project dirs • Optional SQLite backend │
└─────────────────────────────────────────────────────────────────┘
Current native providers in src/providers/ are anthropic, openai, openai_responses, gemini, cohere, azure, bedrock, vertex, copilot, and gitlab, with extension-provided streamSimple providers routed through the same agent loop.
#![forbid(unsafe_code)] enforced project-widesysinfo crate ensures no orphaned processesthiserror with specific error types per componentopt-level = 3 for runtime performanceThis Rust port preserves Pi's user experience, but intentionally changes the runtime substrate. The original TypeScript Pi (pi-mono, packages/coding-agent) is built on Node.js + package-level abstractions. pi_agent_rust moves those same behaviors onto asupersync primitives so lifecycle guarantees are explicit in the runtime model.
| Concern | TypeScript Pi (pi-mono baseline) | pi_agent_rust + asupersync |
|---|---|---|
| Runtime model | Node event loop + Promise/AbortSignal conventions | RuntimeBuilder + explicit reactor and runtime handle |
| Async ownership | Task lifetimes coordinated by framework/library code | Structured task ownership and explicit cross-thread channels (TUI/RPC bridging) |
| Cancellation semantics | Primarily API- and tool-layer conventions | Runtime-aware cancellation checks + bounded timeout handling in tools |
| I/O capability shape | Ambient Node APIs + extension layer policies | Capability-scoped context (AgentCx over asupersync::Cx) and explicit hostcall policy |
| HTTP streaming | Provider/client dependent | Purpose-built asupersync HTTP/TLS client feeding custom SSE parser |
| Deterministic test hooks | Conventional async test setup | asupersync test/runtime hooks used widely in unit/integration tests |
Why this is useful in practice:
These are the concrete invariants we rely on in this implementation:
Turn-scoped agent lifecycle
AgentStart, TurnStart, TurnEnd, and AgentEnd in a stable order.max_tool_iterations (default 50) to avoid unbounded self-tool loops.Abort and timeout behavior is explicit
bash timeout follows a clear escalation path: terminate process tree, grace period, then hard kill.Session writes are crash-resilient
Compaction is threshold-driven and boundary-aware
context_window - reserve_tokens.Capability policy is fail-closed and precedence-defined
Streaming parser tolerates real network chunking
data: fields, partial UTF-8 tails, and end-of-stream flush.The following asupersync principles are reflected directly in pi_agent_rust architecture:
AgentCx wraps asupersync::Cx at subsystem boundaries (agent/tools/session/rpc).Compared to the original TypeScript implementation, this shifts more correctness responsibility into the runtime and core algorithms themselves, instead of relying primarily on ecosystem conventions.
This is a second comparison pass focused on high-impact architectural deltas and rationale.
| Area | Original pi-mono (packages/coding-agent) |
pi_agent_rust |
Why this divergence exists |
|---|---|---|---|
| Distribution model | npm package (npm install -g @mariozechner/pi-coding-agent) |
Single Rust binary (pi) |
Remove Node runtime dependency and improve startup/deployment portability |
| Execution surfaces | Interactive + print + JSON mode + RPC + SDK | Interactive + print + JSON mode + RPC + Rust SDK | Rust SDK provides idiomatic companion API for embedding Pi programmatically (documented in docs/sdk.md) |
| Default built-in tool posture | Defaults to read/write/edit/bash (others available) |
Seven built-ins treated as first-class (read/write/edit/bash/grep/find/ls) |
Keep common code-navigation and shell workflows available without extra configuration |
| Extension trust model | Extension/package model documented as full system access | Embedded runtime with capability-gated hostcalls and policy profiles | Reduce ambient authority and make extension behavior auditable/deny-by-default |
| Session architecture emphasis | JSONL tree session model and branch navigation | JSONL v3 tree + explicit session index (SQLite sidecar) + optional SQLite session backend | Faster resume/lookups at scale and safer multi-instance coordination |
| Streaming transport stack | Node runtime networking stack | Purpose-built HTTP/TLS client + custom SSE parser on asupersync | Tighter control over chunking, parsing, and failure handling in long streams |
| Cancellation/timeout mechanics | Platform/event-loop cancellation conventions | Explicit abort signaling, bounded tool iterations, process-tree termination | Minimize hangs/orphans and make stop behavior deterministic under load |
| Runtime context model | Framework-level conventions and extension APIs | Explicit AgentCx/asupersync::Cx capability-scoped context threading |
Make effect boundaries and testability first-class architectural constraints |
Practical consequence of these deltas:
docs/parity-certification.json tracks functional parity progress and certification status.This section compares concrete implementation mechanics for equivalent high-level behavior.
| Algorithm | pi-mono baseline mechanism | Rust implementation mechanism | Why the Rust variant exists |
|---|---|---|---|
| Session context rebuild after compaction | buildSessionContext() emits compaction summary, then messages from firstKeptEntryId (pre-compaction path), then post-compaction entries |
to_messages_for_current_path() uses the same ordering and adds a fallback if first_kept_entry_id is missing |
Avoid silent context loss when compaction anchors are orphaned/corrupted |
| JSONL persistence | Incremental append (appendFileSync) plus full rewrite (writeFileSync) for migrations/rewrites |
Save via temp file + atomic persist/replace | Keep on-disk session state crash-resilient during save operations |
| Session discovery/resume | Directory/file scan and mtime sorting of JSONL files | SQLite session index sidecar + WAL + lock file + staleness-triggered full reindex | Bound resume lookup cost and coordinate concurrent processes |
| Compaction token accounting | Uses assistant usage (totalTokens else input+output+cacheRead+cacheWrite) plus heuristic trailing estimates |
Uses assistant usage (total_tokens else input+output) plus heuristic trailing estimates; fixed image token estimate |
Keep accounting stable across providers with uneven cache-token reporting while staying conservative |
| Cut-point + split-turn handling | Valid cut points exclude tool results; split turns are summarized as history + turn-prefix context | Same cut-point class and split-turn strategy, implemented in Rust entry/message model | Preserve tool-call/result adjacency and turn coherence under budget pressure |
| Bash timeout/process cleanup | Timeout/abort kills process tree (killProcessTree) and returns tail-truncated output |
Timeout escalation (TERM then grace then KILL) + process-tree walk + shell exit trap + tail truncation |
Enforce bounded cleanup and reduce descendant-process leaks from background jobs |
| Streaming event decoding | Transport semantics are exposed (sse/websocket/auto); parser details are runtime-internal |
Explicit SSE parser with BOM stripping, CR/LF normalization, UTF-8 tail buffering, and flush-on-end | Make byte-to-event behavior deterministic and provider-SDK-independent |
The sections above compare mechanics. This section calls out concrete features present in this Rust port that are not part of the pi-mono baseline implementation model.
| Rust-port feature | Why it is useful/compelling |
|---|---|
pi doctor diagnostics command (text/json/markdown, --only, --fix, extension compatibility checks) |
Gives actionable environment + compatibility diagnostics, supports CI gating (non-zero on failures), and can auto-fix safe issues like missing dirs/permissions |
Capability-gated extension policy profiles (safe / balanced / permissive) with per-extension overrides |
Lets operators run shared extensions with explicit capability boundaries instead of ambient full-system access |
Secret-aware extension env filtering (pi.env() blocklist for keys/tokens/secrets) |
Reduces accidental credential exposure from extension code paths |
Per-extension trust lifecycle + kill-switch audit trail (pending/acknowledged/trusted/killed, kill_switch, lift_kill_switch) |
Supports immediate containment, explicit operator provenance, and controlled re-entry after review |
| Hostcall compatibility-lane emergency controls (global/per-extension forced-compat switches + reason codes) | Gives operators a deterministic rollback path for fast-lane incidents without losing extension availability |
| Runtime risk controller for extension hostcalls (configurable, fail-closed by default) | Adds another enforcement layer beyond static policy for suspicious runtime behavior in extension call flows |
Argument-aware runtime risk scoring for shell paths (dcg_rule_hit, dcg_heredoc_hit, heredoc AST inspection across Bash/Python/JS/TS/Ruby) |
Detects destructive intent hidden in multiline scripts and wrapper commands before hostcall execution |
| Tamper-evident runtime risk ledger tooling (`ext_runtime_risk_ledger verify | replay |
| Unified incident evidence bundle export (risk ledger, security alerts, hostcall telemetry, exec mediation, secret-broker events) | Incident response can triage from one structured artifact set instead of stitching ad-hoc logs |
| Deterministic hostcall reactor mesh with optional NUMA slab pool (shard affinity, global-order drain, bounded SPSC lanes, telemetry) | Keeps extension dispatch predictable under load and surfaces queue/backpressure behavior for tuning |
| Warm isolate pool + startup prewarm handoff | Moves JS runtime preparation off the first interactive turn and reuses warmed state safely between runs |
| Extension preflight static analysis (imports/forbidden-pattern scan with policy-aware hints) | Catches risky extension patterns before runtime execution |
| Node/Bun-compatible extension runtime without Node/Bun dependency (embedded QuickJS + shims) | Runs legacy extension workflows in a single native binary deployment model |
| Extension compatibility scanner + conformance harness | Makes extension support measurable and auditable instead of anecdotal |
| SQLite session index sidecar (WAL + lock + stale reindex path) | Gives fast session resume/list operations at scale without scanning every JSONL file on each query |
| Session Store V2 rollback and migration ledger (segmented log + checkpoints + rollback events) | Long-session recovery can unwind to a known checkpoint with explicit migration/rollback provenance |
Optional SQLite session storage backend (sqlite-sessions feature) |
Supports deployments that want database-backed session persistence in addition to JSONL |
| Crash-resilient session save path (temp file + atomic persist) | Improves session-file durability during writes and reduces partial-write failure modes |
Unified hostcall dispatcher with typed taxonomy mapping (timeout / denied / io / invalid_request / internal) |
Produces consistent extension/runtime error semantics and easier client handling |
Fail-closed evidence-lineage gates (run_id/correlation_id + cross-artifact lineage checks) |
Rejects stale or cherry-picked conformance/perf artifacts at release-gate time |
| Structured auth diagnostics with stable machine codes | Improves troubleshooting and operational visibility without leaking sensitive credential material |
Pi deliberately uses advanced math where it improves runtime behavior or benchmark confidence. The goal is not “fancy formulas in docs”; it is safer policy decisions, faster recovery from workload shifts, and more trustworthy performance attribution.
In the extension dispatcher, Pi combines CUSUM and Bayesian online change-point detection to detect load-regime changes early (for example when hostcall traffic suddenly spikes or stalls).
$$
S_t^+ = \max\left(0,;S_{t-1}^+ + (-z_t - k)\right), \quad
S_t^- = \max\left(0,;S_{t-1}^- + (z_t - k)\right)
$$
$$
H(r)=\frac{1}{\lambda}, \quad
P(r_t=0 \mid x_{1:t}) \propto \sum_r P(r_{t-1}=r),H(r),P(x_t \mid r)
$$
Intuition: CUSUM catches persistent drift; BOCPD catches sudden regime changes without brittle fixed thresholds.
Pi tracks nonconformity scores (absolute residuals from the running mean) and treats out-of-interval events as anomalies.
$$
q = \text{score}_{\lceil (n+1)\cdot \text{confidence} \rceil - 1}, \quad
\text{anomaly if } |x_t - \mu_t| > q
$$
Intuition: thresholds adapt from recent behavior instead of hard-coding one static latency cutoff.
Pi’s safety envelope includes a PAC-Bayes-kl bound over extension outcomes, and can veto aggressive optimization when the bound is too high.
$$
\mathrm{kl}(\hat q ,|, q_{\text{bound}});\le;\frac{\mathrm{KL}(Q|P)+\ln!\left(2\sqrt{n}/\delta\right)}{n}
$$
Intuition: this gives an explicit uncertainty-aware ceiling on true error risk before allowing more aggressive runtime behavior.
Before approving policy moves, Pi evaluates candidate behavior from trace data:
$$
w_i=\frac{\pi(a_i\mid x_i)}{\mu(a_i\mid x_i)}, \quad
\hat V_{\text{IPS}}=\frac{1}{n}\sum_i w_i r_i
$$
$$
\hat V_{\text{WIS}}=\frac{\sum_i w_i r_i}{\sum_i w_i}, \quad
\hat V_{\text{DR}}=\frac{1}{n}\sum_i\left(\hat r_i + w_i(r_i-\hat r_i)\right)
$$
$$
N_{\text{eff}}=\frac{(\sum_i w_i)^2}{\sum_i w_i^2}, \quad
\Delta_{\text{regret}}=\bar r_{\text{baseline}}-\hat V_{\text{DR}}
$$
Intuition: Pi fails closed if sample support is weak, uncertainty is high, or estimated regret is above threshold.
The VOI planner prioritizes probes that provide the most expected learning under a strict overhead budget.
$$
\text{priority}_i \propto \frac{\text{utility}_i}{\text{overhead}_i}
$$
Intuition: run only the experiments that are likely to change decisions; skip stale or low-value probes.
For phase-1 matrix benchmarking, Pi computes stage attribution weighted by realistic workload size (session_messages) and reports confidence intervals.
\frac{\sum_i w_i,m_{i,s}}{\sum_i w_i,t_i}\cdot 100,
\quad w_i=\text{session_messages}_i
$$
$$
n_{\text{eff}}=\frac{(\sum_i w_i)^2}{\sum_i w_i^2}, \quad
\mathrm{CI}{95}=\mu \pm 1.96\sqrt{\frac{\sigma^2}{n{\text{eff}}}}
$$
Intuition: prioritize what dominates real end-to-end latency, not just isolated microbench hotspots.
Pi also includes an online tuner path for batch/time-slice controls with explicit rollback behavior:
\mathrm{clip}!\left(\tau_t - \eta\nabla_{\tau}\mathcal{L}t,;\tau{\min},\tau_{\max}\right)
$$
Intuition: the system adapts continuously, but if instantaneous loss exceeds a rollback threshold it immediately returns to a safer profile.
| Technique | Where in Pi | Why it helps |
|---|---|---|
| CUSUM + BOCPD | Extension dispatcher regime detector | Detects traffic regime shifts early and robustly |
| Conformal intervals | Safety envelope | Adaptive anomaly gating without static magic numbers |
| PAC-Bayes bound | Safety envelope veto path | Fails closed when uncertainty/risk is too high |
| IPS/WIS/DR + ESS | Off-policy evaluator | Approves policy changes only with adequate support |
| VOI planning | Experiment scheduler | Uses overhead budget on highest-value probes |
| Weighted attribution + CI | Phase-1 perf matrix reports | Ranks optimization work by realistic user impact |
| OCO + regret rollback | Runtime controller | Adapts under load while bounding unsafe drift |
The SSE (Server-Sent Events) parser is a custom implementation that handles Anthropic's streaming response format. Unlike library-based approaches, the parser operates as a state machine that processes bytes incrementally:
Bytes → Line Accumulator → Event Parser → Typed StreamEvent
Key characteristics:
| Property | Implementation |
|---|---|
| Buffering | Zero-copy where possible; lines accumulated only when incomplete |
| Event types | 12 distinct variants: MessageStart, ContentBlockStart, ContentBlockDelta, ContentBlockStop, MessageDelta, MessageStop, Ping, Error, and thinking-specific events |
| Error recovery | Malformed events logged but don't crash the stream |
| Memory | Fixed-size rolling buffer prevents unbounded growth |
The parser handles edge cases like:
data: fields (concatenated with newlines)event: field appearing before or after data:Large outputs from tools (file reads, command output, grep results) must be truncated to avoid exhausting the LLM's context window. The truncation algorithm preserves usefulness while staying within limits:
┌─────────────────────────────────────────┐
│ Original Content │
│ (potentially huge) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ HEAD: First N/2 lines │
│ ───────────────────────── │
│ [... X lines truncated ...] │
│ ───────────────────────── │
│ TAIL: Last N/2 lines │
└─────────────────────────────────────────┘
Constants:
| Limit | Value | Rationale |
|---|---|---|
MAX_LINES |
2000 | Balances context usage vs. completeness |
MAX_BYTES |
1MB | Prevents binary file accidents |
GREP_MAX_LINE_LENGTH |
500 chars | Truncates minified code |
The algorithm:
MAX_LINES, takes first 1000 and last 1000MAX_BYTES, applies byte-level truncationThe bash tool must handle runaway processes, infinite loops, and fork bombs without leaving orphans. The implementation uses the sysinfo crate to walk the process tree:
// Pseudocode for process cleanup
fn kill_process_tree(root_pid: Pid) {
let system = System::new();
let children = find_all_descendants(root_pid, &system);
// Kill children first (deepest first), then parent
for child in children.iter().rev() {
kill(child, SIGKILL);
}
kill(root_pid, SIGKILL);
}
Timeout behavior:
To avoid orphaned background jobs (e.g. cmd &), the bash script installs an EXIT trap that waits for any remaining child processes and then exits with the original command's status.
This prevents the common failure mode where killing a shell leaves its children running.
Sessions use a tree structure rather than a flat list, enabling conversation branching (useful when exploring different approaches):
┌─────────┐
│ Message │ (root)
│ #1 │
└────┬────┘
│
┌────▼────┐
│ Message │
│ #2 │
└────┬────┘
│
┌──────────┼──────────┐
│ │
┌────▼────┐ ┌────▼────┐
│ Message │ │ Message │ (branch)
│ #3 │ │ #3b │
└────┬────┘ └────┬────┘
│ │
┌────▼────┐ ┌────▼────┐
│ Message │ │ Message │
│ #4 │ │ #4b │
└─────────┘ └─────────┘
JSONL format (v3):
Each line is a self-contained JSON object with a type discriminator:
{"type":"session","version":3,"cwd":"/project","created":"2024-01-15T10:30:00Z"}
{"type":"message","id":"a1b2c3d4","parent":"root","role":"user","content":[...]}
{"type":"message","id":"e5f6g7h8","parent":"a1b2c3d4","role":"assistant","content":[...]}
{"type":"model_change","id":"i9j0k1l2","parent":"e5f6g7h8","model":"claude-sonnet-4-20250514"}
The parent field creates the tree. Replaying a session walks the tree from root to the current leaf. Branching creates a new message with a different parent than the previous continuation.
The Provider trait abstracts over different LLM backends:
#[async_trait]
pub trait Provider: Send + Sync {
fn name(&self) -> &str;
fn models(&self) -> &[Model];
async fn stream(
&self,
context: &Context,
options: &StreamOptions,
) -> Result<impl Stream<Item = Result<StreamEvent>>>;
}
Context structure:
pub struct Context {
pub system: Option<String>, // System prompt
pub messages: Vec<Message>, // Conversation history
pub tools: Vec<ToolDef>, // Available tools with JSON schemas
}
StreamOptions:
pub struct StreamOptions {
pub model: String,
pub max_tokens: u32,
pub temperature: Option<f32>,
pub thinking: Option<ThinkingConfig>, // Extended thinking settings
pub stop_sequences: Vec<String>,
}
This design allows adding new providers (OpenAI, Gemini) without modifying the agent loop. Each provider translates the common types to its wire format and emits a unified StreamEvent stream.
Long conversations eventually exceed the model's context window. Pi's compaction algorithm reclaims space by summarizing older messages while preserving recent context.
The algorithm runs automatically after each agent turn when estimated token usage exceeds context_window - reserve_tokens:
┌──────────────────────────────────────────────────────────────┐
│ Full Conversation │
│ msg1 → msg2 → msg3 → ... → msgN-5 → msgN-4 → ... → msgN │
│ ├──── older messages ─────┤ ├─── recent messages ──────────┤ │
│ │
│ Step 1: Find cut point at a valid turn boundary │
│ Step 2: LLM summarizes msgs 1..N-5 into compact paragraph │
│ Step 3: Store Compaction entry in session JSONL │
│ Step 4: Next agent call uses [summary] + msgs N-4..N │
└──────────────────────────────────────────────────────────────┘
Token estimation uses a conservative chars ÷ 4 heuristic for text and a flat 1,200 tokens per image. When an assistant message includes a usage field from the API, that measured value takes precedence over the heuristic.
Cut point selection prefers boundaries between complete user-assistant turns. If the budget forces a mid-turn cut, the algorithm includes prefix messages from the split turn so the model retains context about what was being discussed at the boundary.
File operation tracking extracts read, write, and edit tool calls from the messages being summarized. The compaction prompt includes these paths so the summary preserves awareness of which files were examined or modified:
<read-files>
src/main.rs
src/config.rs
</read-files>
<modified-files>
src/auth.rs
</modified-files>
Configurable parameters:
| Parameter | Default | Purpose |
|---|---|---|
reserve_tokens |
8% of context window | Safety margin for response generation |
keep_recent_tokens |
10% of context window | Minimum recent context preserved |
Compaction can also be triggered manually with /compact in interactive mode or the compact RPC command.
Pi routes model requests through a provider factory that resolves the correct backend implementation from a (provider, model, api) tuple.
Resolution flow:
User specifies --provider openai --model gpt-4o
│
▼
┌──────────────────────────┐
│ Provider Metadata Table │ Maps "openai" → canonical ID,
│ │ determines API type (Completions
│ │ vs Responses vs custom)
└────────────┬──────────────┘
│
┌────────────▼──────────────┐
│ URL Normalization │ Appends /chat/completions,
│ │ /responses, or /chat depending
│ │ on detected API type
└────────────┬──────────────┘
│
┌────────────▼──────────────┐
│ Compat Config │ Applies per-model overrides:
│ │ system_role_name, max_tokens
│ │ field name, feature flags
└────────────┬──────────────┘
│
┌────────────▼──────────────┐
│ Provider Instance │ Anthropic | OpenAI | Gemini
│ │ Cohere | Azure | Bedrock | ...
└───────────────────────────┘
models.json overrides: Users can define custom providers in ~/.pi/agent/models.json or .pi/models.json. Each entry specifies a model ID, base URL, API type, and optional compat flags, letting you route to self-hosted models, proxies, or providers that Pi does not natively support.
Compat config handles the differences between OpenAI-compatible APIs:
| Override | Example | Purpose |
|---|---|---|
system_role_name |
"developer" |
o1 models use "developer" instead of "system" |
max_tokens_field |
"max_completion_tokens" |
Some models require a different field name |
supports_tools |
false |
Suppress tool definitions for models that reject them |
supports_streaming |
false |
Fall back to non-streaming for incompatible endpoints |
custom_headers |
{"X-Custom": "val"} |
Per-provider header injection |
Fuzzy matching: When a provider name doesn't match any known provider, Pi computes edit distance against all registered names and suggests the closest match in the error message.
Extensions run in an embedded QuickJS runtime (rquickjs crate) and communicate with Pi through a structured hostcall protocol. This is the mechanism that lets JavaScript code invoke Pi's built-in tools, make HTTP requests, and interact with the session, all without direct OS access.
Execution model:
┌─────────────────── QuickJS VM ───────────────────┐
│ │
│ extension.js calls: │
│ pi.tool("read", {path: "src/main.rs"}) │
│ │ │
│ ▼ │
│ enqueue HostcallRequest { │
│ call_id: "hc-0042", │
│ kind: Tool { name: "read" }, │
│ payload: { path: "src/main.rs" }, │
│ } │
│ │ │
│ ▼ │
│ return Promise (resolve/reject stored in map) │
│ │
└────────────────────────┬──────────────────────────┘
│
drain_hostcall_requests()
│
▼
┌─────────────── ExtensionDispatcher ──────────────┐
│ │
│ 1. Check capability policy: │
│ read tool → requires "read" capability │
│ → Policy says: Allow / Deny / Prompt │
│ │
│ 2. If allowed → dispatch to ToolRegistry │
│ → Execute read tool │
│ → Get ToolOutput │
│ │
│ 3. complete_hostcall("hc-0042", Ok(result)) │
│ → Resolves the Promise in QuickJS │
│ │
│ 4. runtime.tick() │
│ → Drains Promise .then() chains │
│ → Extension JS continues execution │
│ │
└───────────────────────────────────────────────────┘
Capability mapping: Each hostcall kind maps to a required capability:
| Hostcall | Required Capability | Dangerous? |
|---|---|---|
pi.tool("read", ...) |
read |
No |
pi.tool("write", ...) |
write |
No |
pi.tool("bash", ...) |
exec |
Yes |
pi.http(request) |
http |
No |
pi.exec(cmd, args) |
exec |
Yes |
pi.env(key) |
env |
Yes |
pi.session(op, ...) |
session |
No |
pi.ui(op, ...) |
ui |
No |
pi.log(entry) |
log |
No (always allowed) |
Deduplication: Each hostcall's parameters are canonicalized (object keys sorted, structure normalized) and SHA-256 hashed. Identical requests within a short window can be deduplicated to avoid redundant tool executions.
Fast lane vs compatibility lane: Pi has two execution lanes for hostcalls:
tool and session operations). This avoids extra allocation and parsing work.For observability, each call is tagged with a stable lane key (for example tool|tool.read|filesystem or tool|fallback|filesystem) so latency and failure trends can be grouped consistently.
Built-in consistency guard (shadow dual execution): Pi can sample a small subset of read-only hostcalls, execute them through both lanes, and compare canonical output fingerprints. If divergence crosses a configured budget, Pi automatically backs off the fast lane for a period. This gives performance wins without silently changing behavior.
Adaptive dispatch mode under load: Pi can switch between:
sequential_fast_path for simpler/low-contention workloadsinterleaved_batching when contention and queue pressure riseMode changes are gated by sample coverage and risk checks, so Pi does not switch based on thin or cherry-picked evidence.
Runtime telemetry for debugging and tuning: Pi records structured hostcall telemetry (pi.ext.hostcall_telemetry.v1) with lane choice, fallback reason, dispatch latency share, marshalling path, and optimization hit/miss fields. This is used by perf reports and reliability diagnostics.
Auto-repair pipeline: When an extension fails to load or produces runtime errors, Pi's repair system can automatically fix common issues:
| Repair Mode | Behavior |
|---|---|
Off |
No repairs |
Suggest |
Log suggestions, don't apply |
AutoSafe (default) |
Apply provably safe fixes (missing file paths, asset references) |
AutoStrict |
Apply aggressive heuristic fixes (pattern-based transforms) |
Compatibility scanner: Before loading, Pi statically analyzes extension source code for imports, require() calls, and forbidden patterns (eval, Function(), process.binding, dlopen). The scan produces a capability evidence ledger that informs policy decisions.
Environment variable filtering: Extensions calling pi.env() hit a blocklist that denies access to API keys, credentials, tokens, and private keys. The filter blocks exact matches (ANTHROPIC_API_KEY, AWS_SECRET_ACCESS_KEY), suffix patterns (*_API_KEY, *_SECRET, *_TOKEN), and prefix patterns (AWS_SECRET_*, AWS_SESSION_*). Only PI_* variables are unconditionally allowed.
Trust lifecycle and kill switch: Extension trust state is tracked explicitly (pending, acknowledged, trusted, killed). A kill switch demotes an extension to killed, quarantines it in the runtime risk controller, emits a critical alert, and writes an audit record. Lifting the switch requires an explicit operator action and moves the extension back to acknowledged.
The extension runtime includes a few small decision engines so behavior stays stable as workload patterns change:
These pieces are intentionally conservative: if confidence is weak, Pi holds steady instead of making an aggressive switch.
The interactive mode uses the Elm Architecture (Model-Update-View) via the charmed_rust library family, which is a Rust port of Go's Bubble Tea framework.
Component stack:
┌────────────────────────────────────────────────────┐
│ Terminal (crossterm) │
│ Raw mode │ Alt screen │ Keyboard/Mouse events │
└──────────────────────┬─────────────────────────────┘
│
┌──────────────────────▼─────────────────────────────┐
│ bubbletea Program Loop │
│ Init() → Update(Msg) → View() → render cycle │
└──────────────────────┬─────────────────────────────┘
│
┌──────────────────────▼─────────────────────────────┐
│ PiApp (Model) │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ TextArea │ │ Viewport │ │ Spinner │ │
│ │ (editor) │ │ (convo) │ │ (status) │ │
│ └─────────────┘ └──────────────┘ └─────────────┘ │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Overlay Stack │ │
│ │ Model Selector │ Session Picker │ /tree │ │
│ │ Settings UI │ Theme Picker │ Branches │ │
│ │ Capability Prompt (extension UI) │ │
│ └─────────────────────────────────────────────┘ │
└──────────────────────┬─────────────────────────────┘
│
async channels (mpsc)
│
┌──────────────────────▼─────────────────────────────┐
│ Agent Async Task │
│ Runs on asupersync runtime │
│ Streams provider responses │
│ Executes tools │
│ Sends PiMsg events back to TUI thread │
└────────────────────────────────────────────────────┘
The async/sync bridge: The agent runs on the asupersync async runtime in a separate thread. It communicates with the bubbletea UI thread through mpsc channels. Each streaming event (text delta, tool start, tool update, agent done) becomes a PiMsg variant delivered to PiApp::update(), keeping the UI responsive during API streaming and tool execution.
Viewport scrolling: The conversation viewport tracks whether the user is at the bottom. When new content arrives and the user hasn't scrolled up, the viewport auto-follows the stream tail. Scrolling up disables auto-follow; pressing End or typing a new message re-enables it.
Overlay system: Modal UIs (model selector, session picker, branch navigator, extension capability prompts) stack on top of the main conversation view. Each overlay captures keyboard input until dismissed. Only the topmost active overlay receives events.
Slash commands available in the interactive editor:
| Command | Action |
|---|---|
/help |
Show available commands and keybindings |
/model or Ctrl+L |
Open model selector with fuzzy search |
Ctrl+P / Ctrl+Shift+P |
Cycle scoped models forward/backward |
/tree |
Browse and fork the conversation tree |
/clear |
Clear conversation and start fresh |
/compact |
Trigger manual compaction |
/thinking <level> |
Change thinking level mid-conversation |
/share |
Export session to GitHub Gist |
/exit or Ctrl+C |
Exit Pi |
The RPC mode (pi --mode rpc) exposes a line-delimited JSON protocol over stdin/stdout for programmatic integration. Each line is a self-contained JSON object.
Client → Pi (stdin):
{"type": "prompt", "message": "Explain this function", "id": "req-001"}
{"type": "steer", "message": "Focus on error handling"}
{"type": "follow_up", "message": "Now add tests"}
{"type": "abort"}
{"type": "get_state"}
{"type": "compact", "reserveTokens": 8192, "keepRecentTokens": 20000}
Pi → Client (stdout):
{"type": "agent_start", "sessionId": "..."}
{"type": "message_update", "message": {...}, "assistantMessageEvent": {"type": "text_delta", "delta": "The function", "contentIndex": 0}}
{"type": "tool_execution_start", "toolCallId": "...", "toolName": "read", "args": {}}
{"type": "tool_execution_end", "toolCallId": "...", "toolName": "read", "result": {}, "isError": false}
{"type": "agent_end", "sessionId": "...", "messages": [...]}
{"type": "response", "id": "req-001", "command": "prompt", "success": true, "data": {"status": "ok"}}
I/O architecture: Two dedicated threads handle stdin reading and stdout writing, bridged to the async agent runtime via channels. The stdin thread retries on transient errors to prevent dropped input. The stdout thread flushes after every line to prevent buffering delays.
Message queuing: While the agent is streaming a response, incoming messages are routed to one of two queues:
| Queue | Behavior | Use Case |
|---|---|---|
| Steering | Interrupts current response; processed on next turn | Course corrections |
| Follow-up | Queued until current response completes | Sequential instructions |
Queue modes (All or OneAtATime) control whether multiple queued messages are batched into a single turn or processed individually.
Extension UI over RPC: When an extension requests user input (capability prompt, selection dialog), Pi emits an extension_ui_request event. The client renders the prompt in its own UI and responds with an extension_ui_response message. IDE extensions can then present native UI for capability decisions instead of falling back to terminal prompts.
Session resume (pi -c or pi -r) needs to find the most recent session for the current project without scanning every JSONL file on disk. Pi maintains a SQLite index (session-index.sqlite) that provides constant-time lookups.
Schema:
CREATE TABLE sessions (
path TEXT PRIMARY KEY,
id TEXT NOT NULL,
cwd TEXT NOT NULL,
timestamp TEXT NOT NULL,
message_count INTEGER NOT NULL,
last_modified INTEGER NOT NULL,
size_bytes INTEGER NOT NULL,
name TEXT
);
Update lifecycle:
pi -c queries WHERE cwd = ? ORDER BY last_modified DESC LIMIT 1pi -r queries the same table and presents a picker sorted by recencyConcurrency: A file-based lock (session-index.lock) serializes writes from concurrent Pi instances. Reads use WAL mode for non-blocking access.
Staleness-based reindexing: If the index is older than a configurable threshold, Pi runs a full re-scan of the sessions directory to catch files created by other instances or manual edits. The re-scan keeps the index accurate without a centralized daemon.
Pi also supports a v2 sidecar store next to JSONL sessions for faster resume and stronger corruption checks on long histories.
What it adds:
How resume works:
Integrity strategy:
CLI support:
pi migrate <path> --dry-run validates migration without writing.pi migrate <path> performs JSONL-to-v2 migration and verifies parity.Beyond simple API keys, Pi supports OAuth, AWS credential chains, service key exchange, and bearer-token auth. Credentials are stored in ~/.pi/agent/auth.json with file-locked access to prevent corruption from concurrent instances. Stored API keys can be literal strings, $ENV:VAR_NAME references, or $CMD:shell command / $COMMAND:shell command sources that resolve trimmed stdout at request time.
| Mechanism | Providers | Details |
|---|---|---|
| API Key | Anthropic, OpenAI, Gemini, Cohere, and many OpenAI-compatible providers | Static key via env var or settings |
| OAuth | Anthropic, OpenAI Codex, Google Gemini CLI, Google Antigravity, Kimi for Coding, GitHub Copilot, GitLab, and extension-defined OAuth providers | PKCE/state-validated flow with automatic refresh; Kimi uses device flow |
| AWS Credentials | Bedrock | Access key + secret + optional session token; region-aware |
| Service Key | SAP AI Core | Client ID/secret exchange for bearer token |
| Bearer Token | Custom providers | Static token in auth storage |
OAuth token lifecycle:
pi with an OAuth-configured providerauth.json for an existing tokenauth.jsonGoogle CLI-style OAuth providers carry project metadata with the token payload. Pi preserves and refreshes that payload and can resolve project IDs from GOOGLE_CLOUD_PROJECT or local gcloud config when needed.
Credential status reporting: pi config shows the status of each configured provider's credentials: Missing, ApiKey, OAuthValid (with time until expiry), OAuthExpired (with time since expiry), AwsCredentials, or BearerToken.
Diagnostic codes: Auth failures produce specific diagnostic codes (MissingApiKey, InvalidApiKey, QuotaExceeded, OAuthTokenRefreshFailed, MissingAzureDeployment, MissingRegion, etc.) with context-specific error hints rather than generic messages.
Read file contents (optionally images):
Input: { "path": "src/main.rs", "offset": 10, "limit": 50 }
Execute shell commands with timeout and output capture:
Input: { "command": "cargo test", "timeout": 120 }
timeout: 0 to disable the default timeoutSurgical string replacement:
Input: { "path": "src/lib.rs", "old": "fn foo()", "new": "fn bar()" }
Search file contents:
Input: { "pattern": "TODO", "path": "src/", "context": 2, "limit": 100 }
Discover files by pattern:
Input: { "pattern": "*.rs", "path": "src/", "limit": 1000 }
fdList directory contents:
Input: { "path": "src/", "limit": 500 }
/CLI tools have different performance requirements than servers or GUI applications. The critical metric is time-to-first-interaction: how quickly can the user start typing after invoking the command?
| Phase | TypeScript/Node.js | Rust |
|---|---|---|
| Process spawn | ~10ms | ~10ms |
| Runtime initialization | 200-500ms | 0ms (no runtime) |
| Module loading | 100-300ms | 0ms (static linking) |
| JIT warmup | 50-100ms | 0ms (AOT compiled) |
| Total | 360-910ms | ~10ms |
This difference compounds with usage frequency, especially in short iterative terminal workflows.
Pi is optimized more like a low-latency engine than a typical CLI app. We intentionally apply aggressive optimization at multiple layers, not just "compile with --release and hope."
Where the speed comes from:
This is why Pi can stay responsive even with heavy streaming, tool usage, large session histories, and extension workloads running at the same time.
This catalog reflects a long sequence of changes across runtime, storage, streaming, and UI.
Concrete engineering work in this codebase includes:
| Area | What we built | Why it matters |
|---|---|---|
| Extension dispatch core | Typed hostcall opcode fast paths, compatibility fallback lane, zero-copy payload arena, canonical-hash shortcuts, interned operation paths | Cuts per-call overhead on the hottest extension operations while preserving correctness fallback paths |
| Registry/policy lookup | Immutable policy snapshots with O(1) capability checks, plus RCU-style metadata snapshots for extension registry/tool metadata | Removes repeated dynamic lookup overhead in hot authorization/dispatch paths |
| Queueing + concurrency | Core-pinned SPSC reactor mesh, S3-FIFO-inspired admission with fairness guards, BRAVO-style fallback behavior | Improves tail latency under contention instead of only optimizing median latency |
| Batched execution | AMAC-style interleaved batch executor with stall-aware toggling | Avoids head-of-line stalls when many independent hostcalls are in flight |
| IO specialization | io_uring lane policy + telemetry and compatibility fallback | Routes IO-heavy calls to a specialized lane when safe and beneficial |
| Runtime startup for extensions | QuickJS pre-warm path, warm isolate/bytecode cache behavior, and startup pipeline parallelization | Lowers cold-start overhead before the first real extension workload |
| JS bridge/scheduler path | Pending-job fast-path tuning and bridge-level dispatch cleanup in hot extension loops | Reduces overhead between JS requests and Rust execution |
| Adaptive control loops | Regime-shift detection (CUSUM/BOCPD), budget controllers, VOI-based experiment planner, mean-field load controller, off-policy evaluation gates | Lets runtime behavior adapt to workload shifts without blind tuning |
| Fast-path safety | Shadow dual-execution sampling with automatic rollback/backoff on divergence | Prevents speed work from silently changing semantics |
| Trace-level optimization | Hostcall superinstruction compiler + tier-2 trace-JIT path | Fuses repeated opcode traces and lowers repeated dispatch overhead |
| Tool execution strategy | Dynamic read-only tool classification + parallel tool execution paths | Increases throughput when multiple tool calls can safely run together |
| Session write path | Write-behind autosave queue, durability modes, incremental append, checkpointed rewrite strategy, single-buffer append serialization | Keeps interaction fast while still supporting stronger durability when needed |
| Session index path | Async coalesced index updates, pending-drain hot-path tuning, and reduced allocation/boxing overhead | Keeps resume/discovery metadata updates off the interactive hot path |
| Session resume path | Session Store V2 sidecar (segmented log + offset index), O(index+tail) open path, stale-sidecar detection, migration/rollback tooling | Avoids full-file rescans on large histories and improves recovery behavior |
| Long-session maintenance | Background compaction/snapshot workers with quotas + lazy hydration paths | Controls session growth and keeps long-running workspaces responsive |
| Compaction internals | Binary-search cut-point logic, serialization helper extraction, and zero-allocation-oriented cleanup on compaction paths | Lowers compaction overhead and keeps compaction from becoming a pause source |
| Streaming internals | SSE parser event-type interning, scanned-byte tracking (scanned_len), UTF-8 recovery hardening |
Reduces repeated scanning/allocation during token streaming and improves resilience |
| Provider/message memory path | Zero-copy request context migration (Cow), Arc-based message/result sharing, clone elimination in stream end paths |
Removes hot-path cloning and allocation churn in core agent/provider loops |
| TUI rendering path | Message render cache, conversation-prefix cache, reusable render buffers, viewport/render-path refactors, Criterion perf gates | Reduces redraw cost and jitter while streaming long outputs |
| Startup/resource loading | Parallelized skill/prompt/theme loading plus precomputed tool definitions/command names | Moves heavy initialization off the critical path to improve time-to-interaction |
| Allocator/build profile | jemalloc default for allocation-heavy paths, speed-oriented release profile, strict artifact validation for perf claims | Improves runtime consistency and prevents benchmark/reporting drift |
| Perf governance | Scenario matrix, claim-integrity gates, strict no-data fail behavior, variance/confidence artifacts, reproducible orchestration bundles | Makes performance claims auditable and regression detection automatic |
| Hostcall marshalling planner | HostcallRewriteEngine uses a small cost model to pick fast opcode fusion only when it is clearly cheaper and unambiguous; otherwise it stays on the canonical path and records fallback reason |
Gains speed on hot marshalling shapes without risking silent semantic drift from ambiguous rewrites |
| Tool text-processing hot path | truncate_head/truncate_tail use lazy line traversal and memchr line counting; normalization switched to a single-pass transform instead of chained string rewrites |
Large file/tool outputs avoid unnecessary intermediate allocations and stay responsive |
| JS bridge regex and parser micro-paths | Frequent regex checks in the extension JS bridge are OnceLock-cached; hot bridge calls avoid repeated setup work |
Cuts repeated per-call overhead in extension-heavy sessions |
| CLI/autocomplete process-spawn reduction | fd/rg availability checks are cached, and autocomplete file-index refresh runs in background with stale-update dropping |
Keeps completion and command handling quick even in very large repositories |
| Session identity/index micro-optimizations | O(1) entry-id cache, single-pass metadata finalization, and append-path cleanups replace multi-pass/copy-heavy behavior | Reduces overhead on append/save/rebuild when sessions become large |
| Benchmark-driven rollback discipline | Candidate micro-optimizations are benchmarked and can be reverted when real workloads regress (for example, a newline-scan memchr swap in SSE was rolled back) |
Prevents "optimizations" that look fast in theory but slow down real usage |
Optimization spans algorithms, execution lanes, memory movement, queue discipline, storage layout, and validation policy as one system.
The shipping release profile is tuned for runtime speed:
[profile.release]
opt-level = 3 # Maximum speed optimization
lto = true # Link-time optimization across all crates
codegen-units = 1 # Single codegen unit (slower compile, better optimization)
panic = "abort" # No unwinding machinery
strip = true # Remove symbol tables
Binary size is still explicitly budgeted in CI via binary_size_release, with the current release threshold set to 22.0 MB.
Pi separates performance evidence artifacts from distributable release artifacts:
scripts/perf/orchestrate.sh and scripts/bench_extension_workloads.sh, and must carrycorrelation_id) plus profile labels (for example build_profile=perf).release profile andPolicy implication: release/size artifacts alone are not valid evidence for global performance
claims. Performance claims must cite benchmark evidence bundles with reproducible provenance.
See docs/testing-policy.md and docs/releasing.md for normative policy details.
Latest full orchestrator checkpoint (2026-02-19):
/data/tmp/pi_agent_rust/codex/perf/full_local_skipbuild_retry_20260219T0650Zfullbench-local-skipbuild-retry-20260219T0650Z11 suites total, 9 pass, 2 fail (perf_budgets, perf_regression)--help P95 3.8ms, --version P95 3.6ms.For day-to-day implementation, use targeted checks to keep iteration fast. Reserve definitive
benchmark conclusions for integration boundaries where full evidence is regenerated.
cargo fmt --check and targeted test replays (rch exec -- cargo test --test ... when compilation is non-trivial).rch exec -- ... or script wrappers with --require-rch), then requiretests/perf/reports/phase1_matrix_validation.jsontests/full_suite_gate/full_suite_verdict.jsontests/full_suite_gate/certification_verdict.jsontests/full_suite_gate/extension_remediation_backlog.jsonThis keeps inner loops responsive while preserving strict claim-integrity at release time.
Pi includes a dedicated workload harness for extension runtime bottlenecks:
cargo run --bin ext_workloads -- \
--out artifacts/perf/ext_workloads.jsonl \
--matrix-out artifacts/perf/ext_hostcall_hotspot_matrix.json \
--trace-out artifacts/perf/ext_hostcall_bridge_trace.jsonl
This harness does more than raw timing:
marshal, queue, schedule, policy, execute, iopi.ext.hostcall_hotspot_matrix.v1) for quick bottleneck rankingpi.ext.hostcall_trace.v1) for per-call debuggingpi.ext.voi_scheduler.v1) to recommend the next highest-value experiments under a fixed overhead budgetIn plain terms: it helps answer "what should we optimize next?" with data, not guesswork.
Pi's perf pipeline includes strict evidence checks so global speed claims cannot be based on partial or stale data.
scripts/perf/orchestrate.sh generates artifacts tied to a shared correlation_id for the same run.scripts/e2e/run_all.sh validates required schemas, freshness, and correlation_id alignment before considering claims valid.tests/release_evidence_gate.rs fails closed when conformance/perf artifacts are missing run_id or correlation_id, or when lineage fields disagree across linked artifacts.scripts/e2e/run_all.sh emits an evidence-adjudication matrix and only treats evidence as canonical when freshness and lineage checks both pass.pi.perf.extension_benchmark_stratification.v1pi.perf.phase1_matrix_validation.v1pi.claim_integrity.evidence_adjudication_matrix.v1If the evidence set is incomplete or contradictory, the claim-integrity gate stays closed and reports exactly why.
Shipping builds default to the platform allocator. For allocator experiments, Pi
supports jemalloc as an explicit build-time option:
# System allocator baseline + jemalloc variant in one repeatable run
BENCH_ALLOCATORS_CSV=system,jemalloc \
./scripts/bench_extension_workloads.sh
The benchmark harness records both requested and effective allocator metadata in
its JSONL output (allocator_requested, allocator_effective,
allocator_fallback_reason) via PI_BENCH_ALLOCATOR.
system: build and run without allocator feature overridesjemalloc: build with --features jemallocauto: prefer jemalloc, fall back to system if build failsIf jemalloc is requested but unavailable for the current build, the run fails
closed to the compiled allocator and includes a fallback reason in artifacts.
Rust's ownership model enables predictable memory usage without garbage collection pauses:
| State | Memory |
|---|---|
| Startup (idle) | ~15MB |
| Active session (small) | ~25MB |
| Large file in context | ~30-50MB |
| Streaming response | +0MB (streamed, not buffered) |
The absence of a GC means no surprise latency spikes during streaming output.
Responses stream token-by-token from the API to the terminal with minimal buffering:
API Server → TCP → SSE Parser → Event Handler → Terminal
│ │
└──────── no buffering ────────┘
Each token appears on screen within milliseconds of leaving Anthropic's servers. The SSE parser processes events as they arrive rather than waiting for complete responses.
The interactive TUI targets 60fps rendering with several optimization layers:
Frame timing telemetry: Every render cycle is instrumented. Slow frames (>16ms) are tracked and classifiable by phase: viewport sync, message encoding, markdown rendering. This data feeds the internal performance monitor.
Message render cache: Markdown-to-ANSI conversion involves syntax highlighting, table layout, and link detection. Pi caches the rendered output per message and invalidates only on theme change or terminal resize. During streaming, only the actively-changing message is re-rendered; all prior messages hit the cache.
Pre-allocated render buffers: The RenderBuffers struct is reused across render cycles. Rather than allocating new String buffers each frame, Pi writes into pre-sized buffers and clears them before reuse, eliminating thousands of small allocations per second during streaming.
Memory pressure monitoring: A MemoryMonitor samples process heap size and classifies it into three tiers:
| Tier | Threshold | Action |
|---|---|---|
| Normal | <80% of budget | No action |
| Pressure | 80-95% of budget | Collapse tool output displays, hide thinking blocks |
| Critical | >95% of budget | Truncate older messages, force compaction |
Progressive degradation keeps Pi responsive during long sessions with accumulated tool output.
For a more complete guide, see docs/troubleshooting.md.
The find tool requires fd:
# Ubuntu/Debian
apt install fd-find
# macOS
brew install fd
# The binary might be named fdfind
ln -s $(which fdfind) ~/.local/bin/fd
export ANTHROPIC_API_KEY="sk-ant-..."
# Or in settings.json
{ "apiKey": "sk-ant-..." }
# Or per-command
pi --api-key "sk-ant-..." "Hello"
Sessions are append-only JSONL. If corruption occurs:
# Start fresh
pi --no-session
# Or delete the problematic session
rm ~/.pi/agent/sessions/--home-user-project--/corrupted-session.jsonl
Check your network connection. Pi uses SSE which requires stable connections:
# Test with curl
curl -N https://api.anthropic.com/v1/messages
This is intentional to prevent context overflow. Use offset/limit:
# In the conversation
"Read lines 2000-4000 of that file"
Pi is honest about what it doesn't do:
| Limitation | Workaround |
|---|---|
| Not all provider APIs | Built-in support includes Anthropic, OpenAI (Chat + Responses), Gemini, Cohere, Azure OpenAI, Bedrock, Vertex AI, GitHub Copilot, and GitLab Duo; some ecosystem-specific APIs are still TBD |
| No web browsing | Use bash with curl |
| No GUI | Terminal-only by design |
| Some extensions need npm stubs | Common stubs are provided; unlisted npm packages still require a stub. See EXTENSIONS.md §8.1 |
| English-centric | Works but not optimized for other languages |
| Nightly Rust required | Uses 2024 edition features |
This port follows a "specification extraction" methodology rather than line-by-line translation:
This approach yields better code than mechanical translation. TypeScript idioms (callbacks, promises, class hierarchies) don't map cleanly to Rust (ownership, traits, enums). Fighting the language produces worse results than embracing it.
The test suite includes fixture-based conformance tests that validate tool behavior:
{
"version": "1.0",
"tool": "edit",
"cases": [
{
"name": "edit_simple_replace",
"setup": [
{"type": "create_file", "path": "test.txt", "content": "Hello, World!"}
],
"input": {
"path": "test.txt",
"oldText": "World",
"newText": "Rust"
},
"expected": {
"content_contains": ["Successfully replaced"],
"details": {"oldLength": 5, "newLength": 4}
}
}
]
}
Each fixture specifies:
The Rust implementation can be validated against the TypeScript original without coupling to implementation details.
Pi supports legacy JS/TS extensions via an embedded QuickJS runtime. Unlike
traditional plugin systems, extensions run in a sandboxed, capability-gated
environment with no ambient OS access:
docs/ext-compat.md and tests/ext_conformance/reports/pipeline/Legacy extension behavior is automatic:
.js/.ts extensions run directly (no manual conversion step).*.native.json descriptors are optional and mainly useful for native-rust runtime workflows.Policy preset quick-start:
# Inspect current effective policy
pi --explain-extension-policy
# Switch profile for one command (safe | balanced | permissive)
pi --extension-policy balanced --explain-extension-policy
# Legacy alias is still accepted:
pi --extension-policy standard --explain-extension-policy
# Narrow dangerous-capability opt-in (preferred over permissive)
PI_EXTENSION_ALLOW_DANGEROUS=1 pi --extension-policy balanced --explain-extension-policy
Operator rollout playbook (compatibility-first local defaults + explicit lock-down):
# 1) Baseline: verify defaults are compatibility-first (`permissive`)
pi --explain-extension-policy
# 2) Staging: use balanced prompting, dangerous caps still denied by default
pi --extension-policy balanced --explain-extension-policy
# 3) Explicit lock-down for strict local/CI runs
pi --extension-policy safe --explain-extension-policy
# 4) Narrow opt-in for dangerous capabilities (preferred path)
PI_EXTENSION_ALLOW_DANGEROUS=1 pi --extension-policy balanced --explain-extension-policy
# 5) Explicit permissive mode when you want to be unambiguous
pi --extension-policy permissive --explain-extension-policy
settings.json baseline for local/dev:
{
"extensionPolicy": {
"defaultPermissive": true
}
}
Use this to restore the stricter fallback without CLI flags:
{
"extensionPolicy": {
"defaultPermissive": false
}
}
Interactive TUI: open /settings and toggle extensionPolicy.defaultPermissive.
CI guidance:
# CI default: keep dangerous capabilities disabled
pi --extension-policy safe --explain-extension-policy
# CI opt-in job (only where required), keep explicit and auditable
PI_EXTENSION_ALLOW_DANGEROUS=1 pi --extension-policy balanced --explain-extension-policy
Rollback rule: remove PI_EXTENSION_ALLOW_DANGEROUS, set extensionPolicy.profile
back to safe or set extensionPolicy.defaultPermissive to false, and re-run
pi --explain-extension-policy to confirm deny decisions.
See EXTENSIONS.md for the full architecture, runtime contract,
and conformance results.
The #![forbid(unsafe_code)] directive is project-wide and non-negotiable. Rationale:
rustls)The safe Rust subset provides all necessary functionality without compromising security.
Q: What's the relationship to the original Pi Agent?
A: This is an authorized Rust port of Pi Agent by Mario Zechner, created with his blessing. The architecture differs significantly from the TypeScript original: it uses asupersync for structured concurrency and rich_rust (a port of Will McGugan's Rich library) for terminal rendering. The goal is idiomatic Rust while preserving Pi Agent's UX.
Q: Why rewrite in Rust?
A: Startup time matters when you're in a terminal all day. Rust gives us <100ms startup vs 500ms+ for Node.js. Plus, no runtime dependencies to manage.
Q: Can I use providers beyond Anthropic (OpenAI/Gemini/Cohere/Azure/Bedrock/Vertex/Copilot/GitLab/Codex)?
A: Yes. Native providers include Anthropic, OpenAI (Chat + Responses + Codex Responses), Gemini (native + Gemini CLI + Antigravity routes), Cohere, Azure OpenAI, Amazon Bedrock, Vertex AI, GitHub Copilot, and GitLab Duo. Pi also supports many OpenAI-compatible presets (for example Groq, OpenRouter, Mistral, Together, DeepSeek, Cerebras, DeepInfra, Alibaba/Qwen, and Moonshot/Kimi). Provider IDs and aliases are case-insensitive. Set credentials and choose via --provider/--model; run pi --list-providers to see canonical IDs, aliases, and env keys.
Q: How do sessions work?
A: Each session is a JSONL file with message entries. Sessions are per-project (based on working directory) and support branching via parent references.
Q: Why is unsafe forbidden?
A: Memory safety is non-negotiable for a tool that executes arbitrary commands. The performance cost is negligible for this use case.
Q: How do I extend Pi?
A: Pi has a full extension system with two runtime families: JS/TS entrypoints run in embedded QuickJS, and *.native.json descriptors run in the native-rust descriptor runtime. Both are capability-gated and audited through the same policy system. One session uses one runtime family at a time. Extensions can register tools, slash commands, event hooks, flags, and custom providers. See EXTENSIONS.md for details. For built-in tool changes, implement the Tool trait in src/tools.rs.
Q: Why isn't X feature included?
A: Pi focuses on core coding assistance. Features like web browsing, image generation, etc. are out of scope. Use specialized tools for those.
Q: How does compaction work?
A: When a conversation exceeds the model's context window, Pi summarizes older messages using the LLM itself, storing the summary as a session entry. Recent messages are kept verbatim. The cut point is chosen at a turn boundary, and the summary includes a record of which files were read or modified so the model retains that awareness. Compaction runs automatically after each agent turn when needed, or manually via /compact.
Q: Can I add a custom provider that Pi doesn't support natively?
A: Yes. Create a models.json file in ~/.pi/agent/ or .pi/ with entries specifying the model ID, base URL, and API type (usually openai-completions for OpenAI-compatible endpoints). Pi's compat config system handles field name differences and feature flag overrides. Extensions can also register entirely custom providers.
Q: How does Pi decide which session to resume?
A: Pi maintains a SQLite index of all session files. When you run pi -c, it queries the index for the most recently modified session whose working directory matches your current project. This avoids scanning the filesystem on every resume.
Q: What happens if an extension tries to access something dangerous?
A: Every hostcall from an extension is checked against the active capability policy before execution. Dangerous capabilities (exec, env) are denied by default under safe and balanced unless explicitly opted in (for example via PI_EXTENSION_ALLOW_DANGEROUS=1), and are available under permissive. For exec, Pi then applies command mediation before spawn: it classifies command+arg signatures and blocks critical classes by default (for example recursive delete, disk/device write, reverse shell), with strict/safe policy able to block high-tier classes as well (for example shutdown, process-kill, credential-file modification). Denied calls return errors to the extension Promise path, and denial events are recorded in redacted security-alert and exec-mediation audit artifacts. Sensitive env keys (API keys/tokens/secrets) remain filtered. If behavior escalates, you can kill-switch that extension into quarantined killed state immediately or force compatibility-lane routing as a containment step while investigating.
Q: Does Pi work with self-hosted or proxied LLMs?
A: Yes. Point any provider at a custom base URL via models.json. Pi normalizes URL paths per API type and applies compatibility overrides for field-name and feature differences. This works with vLLM, Ollama, LiteLLM, and similar OpenAI-compatible servers.
| Feature | Pi | Claude Code | Aider | Cursor |
|---|---|---|---|---|
| Language | Rust | TypeScript | Python | Electron |
| Startup | <100ms | ~1s | ~2s | ~5s |
| Memory | <50MB | ~200MB | ~150MB | ~500MB |
| Providers | Anthropic + OpenAI/Responses + Gemini/Cohere + Azure/Bedrock/Vertex + Copilot/GitLab + OpenAI-compatible presets | Anthropic | Many | Many |
| Tools | 8 built-in | Many | File-focused | IDE-integrated |
| Sessions | JSONL tree | Proprietary | Git-based | Proprietary |
| Open source | Yes | Yes | Yes | No |
rch exec -- cargo build # Debug build (remote offload)
rch exec -- cargo build --release # Release build (optimized, remote offload)
rch exec -- cargo test # Run tests (remote offload)
# Lint checks (remote-safe split to avoid rch clippy timeout fail-open)
rch exec -- cargo clippy --lib --bins -- -D warnings
rch exec -- cargo clippy --tests -- -D warnings
rch exec -- cargo clippy --benches -- -D warnings
rch exec -- cargo clippy --examples -- -D warnings
# Unified verification runner (recommended for deterministic evidence artifacts)
./scripts/e2e/run_all.sh --profile focused
./scripts/e2e/run_all.sh --profile ci
./scripts/e2e/run_all.sh --rerun-from tests/e2e_results/<timestamp>/summary.json --skip-unit
# Fast smoke/extension quality wrappers with strict remote enforcement
./scripts/smoke.sh --require-rch
./scripts/ext_quality_pipeline.sh --require-rch
# Multi-agent safety: with CODEX_THREAD_ID set, run_all defaults
# CARGO_TARGET_DIR to target/agents/<CODEX_THREAD_ID> unless overridden.
# Set CARGO_TARGET_DIR explicitly if you want a custom shared or isolated target.
# All tests
rch exec -- cargo test
# Specific module
rch exec -- cargo test tools::tests
rch exec -- cargo test sse::tests
# Conformance tests
rch exec -- cargo test conformance
Focused validation tools:
# Dev-firstset gate before release build
rch exec -- cargo build --bin pi --bin ext_release_binary_e2e
PI_HTTP_REQUEST_TIMEOUT_SECS=0 rch exec -- \
cargo run --bin ext_release_binary_e2e -- \
--pi-bin target/debug/pi \
--provider ollama --model qwen2.5:0.5b \
--jobs 10 --timeout-secs 600 --max-cases 20 --extension-policy balanced
# Full optimized release-binary run after gate passes
rch exec -- cargo build --release --bin pi --bin ext_release_binary_e2e
PI_HTTP_REQUEST_TIMEOUT_SECS=0 target/release/ext_release_binary_e2e \
--pi-bin target/release/pi \
--provider ollama --model qwen2.5:0.5b \
--jobs 10 --timeout-secs 600 --extension-policy balanced
# Runtime risk ledger forensics (verify, replay, calibrate)
rch exec -- cargo run --bin ext_runtime_risk_ledger -- verify --input path/to/runtime_risk_ledger.json
rch exec -- cargo run --bin ext_runtime_risk_ledger -- replay --input path/to/runtime_risk_ledger.json
rch exec -- cargo run --bin ext_runtime_risk_ledger -- calibrate --input path/to/runtime_risk_ledger.json --objective balanced_accuracy
ext_runtime_risk_ledger operates on pi.ext.runtime_risk_ledger.v1 artifacts (for example, from incident bundle exports).Releases are tag-driven and must align with Cargo.toml versions.
vX.Y.Z (pre-releases like vX.Y.Z-rc.N are allowed but skip crates.io publish).package.version in Cargo.toml.asupersync → rich_rust → charmed-* (lipgloss, bubbletea, bubbles, glamour) → pi_agent_rust..github/workflows/publish.yml handles crates.io publish when CARGO_REGISTRY_TOKEN is set.Coverage uses cargo-llvm-cov:
# One-time install
cargo install cargo-llvm-cov --locked
rustup component add llvm-tools-preview
# Summary (fastest)
cargo llvm-cov --all-targets --workspace --summary-only
# LCOV report (for CI/artifacts)
CI=true VCR_MODE=playback VCR_CASSETTE_DIR=tests/fixtures/vcr \
cargo llvm-cov --all-targets --workspace --lcov --output-path lcov.info
# HTML report (defaults to target/llvm-cov/html)
cargo llvm-cov --all-targets --workspace --html
Selected core modules (non-exhaustive):
src/
├── main.rs # CLI entry point
├── lib.rs # Library exports
├── app.rs # Startup/model selection helpers
├── agent.rs # Agent loop + event orchestration
├── agent_cx.rs # asupersync capability context wiring
├── cli.rs # Argument parsing
├── config.rs # Configuration
├── auth.rs # API key/OAuth/AWS credential storage
├── model.rs # Message/content/stream event types
├── provider.rs # Provider trait
├── provider_metadata.rs # Canonical provider IDs + routing defaults
├── models.rs # Model registry + models.json overrides
├── providers/
│ ├── anthropic.rs # Anthropic Messages API
│ ├── openai.rs # OpenAI Chat Completions
│ ├── openai_responses.rs # OpenAI Responses API
│ ├── gemini.rs # Gemini API
│ ├── cohere.rs # Cohere Chat API
│ ├── azure.rs # Azure OpenAI
│ ├── bedrock.rs # Amazon Bedrock Converse
│ ├── vertex.rs # Google Vertex AI
│ ├── copilot.rs # GitHub Copilot backend
│ ├── gitlab.rs # GitLab Duo backend
│ └── mod.rs # Provider factory + extension bridge
├── tools.rs # Built-in tool implementations
├── sse.rs # Streaming SSE parser
├── http/
│ ├── client.rs # asupersync-backed HTTP client
│ ├── sse.rs # HTTP SSE helpers
│ └── mod.rs
├── session.rs # JSONL session persistence/tree ops
├── session_index.rs # SQLite session metadata index/cache
├── session_sqlite.rs # Optional sqlite-sessions backend
├── compaction.rs # Context compaction algorithm
├── interactive.rs # Interactive TUI app loop/state
├── interactive/ # Bubble Tea-style TUI submodules
├── rpc.rs # RPC/stdio mode
├── extensions.rs # Extension protocol + policy + security
├── extensions_js.rs # QuickJS runtime bridge + hostcalls
├── extension_dispatcher.rs # Hostcall/tool dispatch plumbing
├── extension_preflight.rs # Extension compatibility scanner
├── extension_validation.rs # Extension validation pipeline glue
├── resources.rs # Skills/prompt/theme/extension loading
└── tui.rs # Terminal UI rendering helpers
Each entry below includes the document name, purpose, bottom-line takeaway, and direct link.
| Category | What it covers | Jump |
|---|---|---|
| Extension Ecosystem | extension architecture, corpus, catalogs, conformance, compatibility | Go |
| Core Ops and UX | governance, CI/QA ops, keybindings, prompts, packaging, model/config UX, drop-in migration/certification | Go |
| Provider Subsystem | provider audits, setup contracts, parity and remediation artifacts | Go |
| QA, Schemas, Security, Platform | release/QA runbooks, schemas, security baselines, platform guides | Go |
docs/BRANCH_PROTECTION.md - Purpose: define branch protection policy and enforcement rules. Bottom line: CI and review gates are mandatory for mainline integrity. Link: Viewdocs/EXTENSION_CANDIDATES.md - Purpose: catalog candidate extensions for inclusion/testing. Bottom line: start extension triage from this longlist. Link: Viewdocs/EXTENSION_CAPTURE_SCENARIOS.md - Purpose: define extension capture and replay scenarios. Bottom line: use these scenarios for deterministic extension behavior capture. Link: Viewdocs/EXTENSION_POPULARITY_CRITERIA.md - Purpose: specify how extension popularity is scored. Bottom line: popularity decisions follow explicit ranking criteria. Link: Viewdocs/EXTENSION_REFRESH_CHECKLIST.md - Purpose: operational checklist for extension corpus refreshes. Bottom line: follow this to update corpus safely and repeatably. Link: Viewdocs/EXTENSION_SAMPLE.md - Purpose: human-readable sample extension contract. Bottom line: reference this when authoring new extension packages. Link: Viewdocs/EXTENSION_SAMPLING_MATRIX.md - Purpose: define extension sampling strategy and strata. Bottom line: test coverage comes from this sampling matrix. Link: Viewdocs/LEGACY_EXTENSION_RUNNER.md - Purpose: explain legacy extension runner behavior and constraints. Bottom line: use for compatibility context, not new runtime design. Link: Viewdocs/PIJS_PROOF_REPORT.md - Purpose: provide formal evidence for PiJS runtime properties. Bottom line: PiJS eliminates ambient authority by design. Link: Viewdocs/TEST_COVERAGE_MATRIX.md - Purpose: map modules to test suites and coverage status. Bottom line: this is the coverage gap dashboard. Link: Viewdocs/capability-prompts.md - Purpose: document capability prompt UX and policy semantics. Bottom line: prompts are policy artifacts, not ad hoc UI. Link: Viewdocs/ci-operator-runbook.md - Purpose: CI operations runbook for maintainers. Bottom line: use this for incident response in CI pipelines. Link: Viewdocs/conformance-operator-playbook.md - Purpose: operating guide for conformance campaigns. Bottom line: run conformance with this playbook for reproducible results. Link: Viewdocs/coverage-baseline-map.json - Purpose: machine-readable mapping of coverage baselines. Bottom line: automation should use this as baseline source of truth. Link: Viewdocs/development.md - Purpose: contributor-facing development workflow reference. Bottom line: this is the canonical local dev setup guide. Link: Viewdocs/e2e_scenario_matrix.json - Purpose: machine-readable matrix of E2E scenarios. Bottom line: E2E scope and expected flows are defined here. Link: Viewdocs/evidence-contract-schema.json - Purpose: schema for evidence artifacts and logs. Bottom line: evidence producers must conform to this contract. Link: Viewdocs/ext-compat.md - Purpose: explain extension compatibility posture and boundaries. Bottom line: this is the quick compatibility reference. Link: Viewdocs/extension-api-matrix.json - Purpose: API surface matrix for extension host/runtime calls. Bottom line: use this to see supported extension APIs at a glance. Link: Viewdocs/extension-architecture.md - Purpose: architecture deep dive for extension runtime internals. Bottom line: this is the primary extension technical design doc. Link: Viewdocs/extension-artifact-provenance.json - Purpose: provenance metadata for extension artifacts. Bottom line: artifact trust and lineage are tracked here. Link: Viewdocs/extension-candidate-pool.json - Purpose: machine-readable candidate pool for extension selection. Bottom line: ingestion/selection jobs should read from this pool. Link: Viewdocs/extension-catalog.json - Purpose: master extension catalog and status metadata. Bottom line: this is the canonical extension inventory. Link: Viewdocs/extension-catalog.schema.json - Purpose: schema for validating extension catalog documents. Bottom line: catalog updates must pass this schema. Link: Viewdocs/extension-code-search-inventory.json - Purpose: inventory of code-search findings across extension repos. Bottom line: use this as raw search evidence. Link: Viewdocs/extension-code-search-summary.json - Purpose: summarized results from extension code-search inventory. Bottom line: top code-search findings are condensed here. Link: Viewdocs/extension-collections.json - Purpose: grouped extension collections by theme/scope. Bottom line: collection-level planning starts here. Link: Viewdocs/extension-compatibility-matrix.md - Purpose: compatibility matrix for extension runtime support levels. Bottom line: check this before claiming compatibility gaps. Link: Viewdocs/extension-conformance-matrix.json - Purpose: machine-readable extension conformance outcomes. Bottom line: this is the conformance truth table for automation. Link: Viewdocs/extension-conformance-test-plan.json - Purpose: test planning artifact for extension conformance. Bottom line: conformance execution should follow this plan. Link: Viewdocs/extension-curated-list-summary.json - Purpose: summary of curated extension subsets. Bottom line: use for quick curated corpus decisions. Link: Viewdocs/extension-entry-scan.json - Purpose: scan results for extension entrypoint detection. Bottom line: extension entry assumptions are validated here. Link: Viewdocs/extension-inclusion-list.json - Purpose: explicit inclusion list for extension sets. Bottom line: this controls what is in-scope for runs. Link: Viewdocs/extension-individual-enumeration.json - Purpose: per-extension enumeration details and metadata. Bottom line: drill into individual extension records here. Link: Viewdocs/extension-license-report.json - Purpose: license audit results for extension corpus. Bottom line: licensing risks and statuses are captured here. Link: Viewdocs/extension-master-catalog.json - Purpose: high-authority upstream extension catalog snapshot. Bottom line: this anchors full-corpus sync and provenance checks. Link: Viewdocs/extension-npm-scan-summary.json - Purpose: summary of npm-based extension scanning outputs. Bottom line: npm scan posture is summarized here. Link: Viewdocs/extension-onboarding-queue.json - Purpose: machine-readable onboarding queue for extension work. Bottom line: queue automation should consume this file. Link: Viewdocs/extension-onboarding-queue.md - Purpose: human-readable extension onboarding queue and context. Bottom line: this is the operator queue board. Link: Viewdocs/extension-priority-summary.json - Purpose: summary of extension prioritization signals. Bottom line: priority rollups are centralized here. Link: Viewdocs/extension-priority.json - Purpose: detailed extension priority scoring data. Bottom line: per-extension prioritization is machine-readable here. Link: Viewdocs/extension-registry.md - Purpose: document extension registry model and usage. Bottom line: registry behavior and expectations are defined here. Link: Viewdocs/extension-repo-search-summary.json - Purpose: summary of repository-level extension discovery searches. Bottom line: repo discovery outcomes are consolidated here. Link: Viewdocs/extension-research-playbook.json - Purpose: machine-readable playbook for extension research workflow. Bottom line: research execution steps and outputs are formalized here. Link: Viewdocs/extension-runtime-threat-model.md - Purpose: runtime-focused extension threat model with controls/tests. Bottom line: this maps concrete runtime abuse paths to mitigations. Link: Viewdocs/extension-sample.json - Purpose: machine-readable extension sample payload/spec. Bottom line: use this as canonical sample JSON contract. Link: Viewdocs/extension-tiered-corpus.json - Purpose: tiered corpus composition for extension testing. Bottom line: corpus tiers and membership are defined here. Link: Viewdocs/extension-tiered-summary.json - Purpose: summary of tiered corpus coverage and counts. Bottom line: corpus tier health is summarized here. Link: Viewdocs/extension-troubleshooting.md - Purpose: troubleshooting guide specific to extension runtime issues. Bottom line: start here for extension failures and policy denials. Link: Viewdocs/extension-validated-dedup.json - Purpose: deduplicated validated extension list. Bottom line: use this to avoid duplicate extension artifacts. Link: Viewdocs/flake-triage-policy.md - Purpose: policy for test flake detection and triage handling. Bottom line: flakes must be classified and handled via this rubric. Link: Viewdocs/keybindings.md - Purpose: keybinding reference for interactive TUI usage. Bottom line: this is the operator shortcut map. Link: Viewdocs/models.md - Purpose: model catalog behavior, selection, and overrides. Bottom line: model resolution logic is documented here. Link: Viewdocs/non-mock-rubric.json - Purpose: rubric defining non-mock testing expectations. Bottom line: use this to gate real-behavior evidence quality. Link: Viewdocs/packages.md - Purpose: package installation and package-content conventions. Bottom line: package usage and structure are defined here. Link: Viewdocs/asupersync-leverage-inventory.md - Purpose: implementation-grade inventory of where additional Asupersync leverage is genuinely high-value in Pi core surfaces. Bottom line: start here before threading inherited AgentCx or replacing raw thread islands. Link: Viewdocs/dropin-certification-contract.json - Purpose: strict drop-in certification contract and gate thresholds. Bottom line: strict replacement messaging is controlled by this contract and its hard gates. Link: Viewdocs/dropin-parity-gap-ledger.json - Purpose: machine-readable ledger of known drop-in parity gaps and severity. Bottom line: unresolved critical/high gaps block strict replacement messaging. Link: Viewdocs/integrator-migration-playbook.md - Purpose: operator/integrator migration and rollback playbook for moving from TypeScript Pi to Rust Pi. Bottom line: use this to run staged, evidence-backed migrations. Link: Viewdocs/parity-certification.json - Purpose: machine-readable parity progress snapshot. Bottom line: informational status only; strict replacement release claims remain controlled by the drop-in contract and parity-gap closure status. Link: Viewdocs/program-governance.md - Purpose: governance model for roadmap, gates, and ownership. Bottom line: governance decisions and responsibilities are defined here. Link: Viewdocs/prompt-templates.md - Purpose: prompt template system and usage guide. Bottom line: reusable prompt behaviors are managed via this doc. Link: Viewdocs/sdk.md - Purpose: SDK cookbook and migration guide for embedding Pi programmatically. Bottom line: use this for copy/paste Rust equivalents of TypeScript SDK workflows. Link: Viewdocs/integrator-migration-playbook.md - Purpose: step-by-step migration and compatibility validation runbook for downstream integrators moving from TypeScript Pi to Rust Pi. Bottom line: follow this to execute and evidence a go/no-go migration decision safely. Link: Viewdocs/provider-audit-evidence-index.json - Purpose: index of provider audit evidence artifacts. Bottom line: audit traceability begins with this index. Link: Viewdocs/provider-audit-reconciliation-ledger.json - Purpose: ledger of provider audit reconciliation decisions. Bottom line: discrepancies and resolutions are recorded here. Link: Viewdocs/provider-auth-crosswalk.json - Purpose: provider auth alias/env-key crosswalk. Bottom line: credential resolution mapping is centralized here. Link: Viewdocs/provider-auth-failure-signatures.json - Purpose: known provider auth failure signatures and fingerprints. Bottom line: diagnose auth failures by matching this catalog. Link: Viewdocs/provider-auth-playbook-validation.json - Purpose: validation outcomes for provider auth playbook coverage. Bottom line: auth playbook quality is measured here. Link: Viewdocs/provider-auth-redaction-diagnostics.json - Purpose: diagnostics for provider auth redaction behavior. Bottom line: redaction correctness and gaps are captured here. Link: Viewdocs/provider-auth-troubleshooting.md - Purpose: troubleshooting guide for provider auth issues. Bottom line: use this first for provider key/auth problems. Link: Viewdocs/provider-baseline-audit.json - Purpose: machine-readable baseline provider audit data. Bottom line: structured provider baseline evidence lives here. Link: Viewdocs/provider-baseline-audit.md - Purpose: narrative baseline provider audit report. Bottom line: high-level provider gap picture is explained here. Link: Viewdocs/provider-canonical-id-policy.json - Purpose: machine-readable canonical provider ID policy. Bottom line: provider ID normalization rules are defined here. Link: Viewdocs/provider-canonical-id-policy.md - Purpose: human-readable canonical provider ID policy. Bottom line: this is the normative provider naming policy. Link: Viewdocs/provider-canonical-id-table.json - Purpose: canonical provider ID lookup table. Bottom line: use this for deterministic provider ID mapping. Link: Viewdocs/provider-cerebras-capability-profile.json - Purpose: Cerebras provider capability profile. Bottom line: Cerebras support envelope and constraints are here. Link: Viewdocs/provider-cerebras-setup.json - Purpose: Cerebras setup/config contract. Bottom line: configure Cerebras using this artifact. Link: Viewdocs/provider-closure-truth-table.json - Purpose: closure truth table for provider audit status. Bottom line: this is the provider closure scoreboard. Link: Viewdocs/provider-config-examples.json - Purpose: machine-readable provider config examples. Bottom line: automation-ready sample configs live here. Link: Viewdocs/provider-config-examples.md - Purpose: human-readable provider config examples. Bottom line: copy from here when setting up providers. Link: Viewdocs/provider-discrepancy-classification.json - Purpose: taxonomy for provider discrepancy classes. Bottom line: classify provider mismatches with this schema. Link: Viewdocs/provider-discrepancy-ledger.json - Purpose: ledger of concrete provider discrepancies. Bottom line: discrepancy history and status are tracked here. Link: Viewdocs/provider-gaps-audit-report.json - Purpose: aggregate provider gap audit report. Bottom line: provider parity gaps are quantified here. Link: Viewdocs/provider-gaps-test-matrix.json - Purpose: test matrix for closing provider gaps. Bottom line: missing provider tests are enumerated here. Link: Viewdocs/provider-gate-compiler-report.json - Purpose: provider gate report from compiler/build checks. Bottom line: compile-time provider gate status is here. Link: Viewdocs/provider-gate-e2e-report.json - Purpose: provider gate report from end-to-end tests. Bottom line: provider E2E gate results are captured here. Link: Viewdocs/provider-gate-tests-report.json - Purpose: consolidated provider test gate report. Bottom line: provider test gate pass/fail view is here. Link: Viewdocs/provider-gate-triage-matrix.json - Purpose: triage matrix for provider gate failures. Bottom line: use this to route provider gate failures fast. Link: Viewdocs/provider-gate-ubs-report.json - Purpose: UBS scan output for provider gate runs. Bottom line: provider bug-scan findings are recorded here. Link: Viewdocs/provider-groq-capability-profile.json - Purpose: Groq provider capability profile. Bottom line: Groq support/limits are defined here. Link: Viewdocs/provider-groq-setup.json - Purpose: Groq provider setup contract. Bottom line: configure Groq following this spec. Link: Viewdocs/provider-implementation-modes.json - Purpose: provider implementation mode classification. Bottom line: know which providers are native/bridged/mock from this file. Link: Viewdocs/provider-kimi-capability-profile.json - Purpose: Kimi provider capability profile. Bottom line: Kimi behavior envelope is documented here. Link: Viewdocs/provider-kimi-setup.json - Purpose: Kimi provider setup contract. Bottom line: use this for Kimi integration setup. Link: Viewdocs/provider-longtail-evidence.md - Purpose: evidence report for longtail provider coverage. Bottom line: longtail provider support claims are justified here. Link: Viewdocs/provider-migration-guide.md - Purpose: migration guide for provider model changes. Bottom line: follow this when migrating provider integrations. Link: Viewdocs/provider-native-parity-report.json - Purpose: report on native provider parity status. Bottom line: native parity progress is measured here. Link: Viewdocs/provider-onboarding-checklist.md - Purpose: onboarding checklist for adding new providers. Bottom line: provider additions must pass this checklist. Link: Viewdocs/provider-onboarding-playbook.md - Purpose: operational playbook for provider onboarding lifecycle. Bottom line: this is the end-to-end onboarding runbook. Link: Viewdocs/provider-openrouter-auth-contract.json - Purpose: OpenRouter auth behavior contract. Bottom line: OpenRouter auth resolution must match this contract. Link: Viewdocs/provider-openrouter-capability-profile.json - Purpose: OpenRouter capability profile. Bottom line: OpenRouter feature boundaries are documented here. Link: Viewdocs/provider-openrouter-dynamic-models-contract.json - Purpose: contract for OpenRouter dynamic model behavior. Bottom line: dynamic model handling rules are defined here. Link: Viewdocs/provider-openrouter-model-registry-contract.json - Purpose: OpenRouter model registry contract and expectations. Bottom line: registry consistency requirements are here. Link: Viewdocs/provider-openrouter-setup.json - Purpose: OpenRouter setup/config contract. Bottom line: configure OpenRouter from this artifact. Link: Viewdocs/provider-parity-checklist.json - Purpose: checklist for provider parity closure criteria. Bottom line: parity completion gates are encoded here. Link: Viewdocs/provider-parity-reconciliation-report.json - Purpose: report on provider parity reconciliations. Bottom line: reconciliation outcomes and rationale are here. Link: Viewdocs/provider-parity-reconciliation.json - Purpose: machine-readable parity reconciliation ledger. Bottom line: this is the structured parity reconciliation record. Link: Viewdocs/provider-qwen-capability-profile.json - Purpose: Qwen provider capability profile. Bottom line: Qwen support/constraints are captured here. Link: Viewdocs/provider-qwen-setup.json - Purpose: Qwen provider setup contract. Bottom line: configure Qwen using this file. Link: Viewdocs/provider-remediation-manifest.json - Purpose: manifest of provider remediation actions. Bottom line: remediation backlog and ownership are tracked here. Link: Viewdocs/provider-remediation-routing-validation.json - Purpose: validation of provider remediation routing logic. Bottom line: routing correctness is evidenced here. Link: Viewdocs/provider-support-baseline-audit.md - Purpose: baseline audit of declared provider support. Bottom line: support claims are grounded by this audit. Link: Viewdocs/provider-test-matrix-validation-report.json - Purpose: validation report for provider test matrix completeness. Bottom line: provider test matrix health is measured here. Link: Viewdocs/provider-test-obligations.md - Purpose: define provider-specific test obligations and gates. Bottom line: this is the provider test contract. Link: Viewdocs/provider-upstream-catalog-snapshot.json - Purpose: machine-readable snapshot of upstream provider catalogs. Bottom line: upstream drift detection starts from this snapshot. Link: Viewdocs/provider-upstream-catalog-snapshot.md - Purpose: narrative summary of upstream provider snapshot changes. Bottom line: upstream provider changes are explained here. Link: Viewdocs/provider_e2e_artifact_contract.json - Purpose: artifact contract for provider E2E evidence. Bottom line: provider E2E artifacts must satisfy this schema. Link: Viewdocs/providers.md - Purpose: provider architecture, behavior, and usage reference. Bottom line: this is the primary provider subsystem guide. Link: Viewdocs/qa-runbook.md - Purpose: QA operating runbook across suites and artifacts. Bottom line: use this for repeatable QA execution. Link: Viewdocs/releasing.md - Purpose: release process and checklist documentation. Bottom line: follow this for consistent, safe releases. Link: Viewdocs/rpc.md - Purpose: RPC mode protocol and usage contract. Bottom line: this is the wire-level RPC integration guide. Link: Viewdocs/schema/extension_manifest.json - Purpose: JSON schema for extension manifests. Bottom line: extension manifest validation must pass this schema. Link: Viewdocs/schema/extension_protocol.json - Purpose: JSON schema for extension protocol messages. Bottom line: extension message contracts are formalized here. Link: Viewdocs/schema/mock_spec.json - Purpose: schema for mock/test spec fixtures. Bottom line: mock fixtures should conform to this contract. Link: Viewdocs/schema/runtime_hostcall_telemetry.json - Purpose: JSON schema for runtime hostcall telemetry events. Bottom line: hostcall telemetry producers must conform to this schema. Link: Viewdocs/sec_traceability_matrix.json - Purpose: machine-readable security requirement traceability matrix. Bottom line: security coverage and test mappings are tracked here. Link: Viewdocs/sec_traceability_matrix.md - Purpose: narrative security traceability matrix with requirement-to-test linkage. Bottom line: human-readable security coverage status is documented here. Link: Viewdocs/security/baseline-audit.md - Purpose: code-grounded security baseline audit and gap analysis. Bottom line: this is the current security posture snapshot. Link: Viewdocs/security/incident-response-runbook.md - Purpose: incident response procedures for security events. Bottom line: follow this runbook during active security incidents. Link: Viewdocs/security/incident-runbook.md - Purpose: incident detection, classification, and handling guide. Bottom line: use this for initial incident triage and escalation. Link: Viewdocs/security/invariants.machine.json - Purpose: machine-checkable security invariant manifest with test mappings. Bottom line: invariant automation should read this file. Link: Viewdocs/security/invariants.md - Purpose: normative security invariants and precedence semantics. Bottom line: this is the SEC-1.2 policy/risk contract. Link: Viewdocs/security/lockfile-format.md - Purpose: lockfile format specification for extension integrity verification. Bottom line: lockfile structure and validation rules are defined here. Link: Viewdocs/security/maintenance-playbook.md - Purpose: security maintenance procedures and scheduled operations. Bottom line: use this for routine security upkeep and policy refresh. Link: Viewdocs/security/manifest-v2-migration.md - Purpose: migration guidance from legacy extension manifests to manifest v2 security fields. Bottom line: use this to upgrade manifests without losing capability/policy intent. Link: Viewdocs/security/operator-handbook.md - Purpose: comprehensive operator handbook for security operations. Bottom line: this is the primary security ops reference for day-to-day work. Link: Viewdocs/security/operator-quick-reference.md - Purpose: quick-reference card for security operators. Bottom line: use this cheat sheet for fast lookups during operations. Link: Viewdocs/security/policy-tuning-guide.md - Purpose: guide for tuning extension security policies and risk thresholds. Bottom line: policy calibration and adjustment procedures are here. Link: Viewdocs/security/runtime-hostcall-telemetry.md - Purpose: runtime hostcall telemetry design and event catalog. Bottom line: hostcall observability and event semantics are documented here. Link: Viewdocs/security/security-slos.md - Purpose: quantitative security SLOs, risk budgets, and release/rollback gates. Bottom line: security release readiness is numerically gated here. Link: Viewdocs/security/threat-model.md - Purpose: formal extension ecosystem threat model. Bottom line: this is the SEC-1.1 attacker and control baseline. Link: Viewdocs/session.md - Purpose: session model, persistence, and branching semantics. Bottom line: session behavior and storage contracts are here. Link: Viewdocs/settings.md - Purpose: settings schema, precedence, and config behavior. Bottom line: configuration behavior is canonicalized here. Link: Viewdocs/skills.md - Purpose: skills system usage and packaging guidance. Bottom line: extend prompt behavior through documented skill contracts. Link: Viewdocs/streaming-hostcalls.md - Purpose: streaming hostcall behavior and lifecycle details. Bottom line: use this to reason about streaming tool execution. Link: Viewdocs/terminal-setup.md - Purpose: terminal environment setup and ergonomics guidance. Bottom line: recommended terminal configuration is here. Link: Viewdocs/termux.md - Purpose: Termux-specific setup and runtime guidance. Bottom line: Android/Termux usage is documented here. Link: Viewdocs/test_double_inventory.json - Purpose: inventory of test doubles and mock surfaces. Bottom line: test double usage is auditable via this file. Link: Viewdocs/testing-policy.md - Purpose: testing policy, quality bars, and suite expectations. Bottom line: this is the normative test governance doc. Link: Viewdocs/themes.md - Purpose: theme system configuration and customization. Bottom line: terminal rendering themes are documented here. Link: Viewdocs/traceability_matrix.json - Purpose: requirement-to-test traceability matrix. Bottom line: evidence traceability across requirements lives here. Link: Viewdocs/tree.md - Purpose: session tree navigation and branching behavior guide. Bottom line: use this to understand conversation tree operations. Link: Viewdocs/troubleshooting.md - Purpose: general troubleshooting guide for common failures. Bottom line: start here for non-extension operational issues. Link: Viewdocs/tui.md - Purpose: TUI behavior, controls, and rendering notes. Bottom line: interactive UI semantics are defined here. Link: Viewdocs/windows.md - Purpose: Windows-specific installation and runtime guidance. Bottom line: Windows support details are centralized here. Link: Viewdocs/wit/extension.wit - Purpose: WIT interface definition for extension host contracts. Bottom line: typed extension host ABI is defined here. Link: ViewPlease don't take this the wrong way, but I do not accept outside contributions for any of my projects. I simply don't have the mental bandwidth to review anything, and it's my name on the thing, so I'm responsible for any problems it causes; thus, the risk-reward is highly asymmetric from my perspective. I'd also have to worry about other "stakeholders," which seems unwise for tools I mostly make for myself for free. Feel free to submit issues, and even PRs if you want to illustrate a proposed fix, but know I won't merge them directly. Instead, I'll have Claude or Codex review submissions via gh and independently decide whether and how to address them. Bug reports in particular are welcome. Sorry if this offends, but I want to avoid wasted time and hurt feelings. I understand this isn't in sync with the prevailing open-source ethos that seeks community contributions, but it's the only way I can move at this velocity and keep my sanity.
MIT License (with OpenAI/Anthropic Rider). See LICENSE for details.
Built with Rust, for developers who live in the terminal.