Disciplined Multi-Agent Coding System
npx skills add https://github.com/liza-mas/liza --skill testing

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.
Because "it worked in the demo" is not what on-call engineers are looking for.
The full hardening inventory to push to production with peace of mind.

Demo video (45min).
Liza is simultaneously a pairing system and a Multi-Agent System (MAS)
optimized for doing things right on the first pass — with the auditability to prove it.
Liza bets on time-to-quality and durable codebase maintainability through automated reviews and documentation
(e.g. the ADR Backfill skill).
Liza's behavioral contract — used by both modes — makes models more thoughtful:
"I want to wash my car. The car wash is 100 meters away. Should I walk or drive?"
Sonnet 4.6: "Walk. Driving 100 meters to a car wash defeats the purpose — you'd barely get the car dirty enough to justify the trip, and parking/maneuvering takes longer than the walk itself."
Same with Liza's contract: "Drive. You're already going to a car wash — arriving dirty is the point."
Liza is a frontier Multi-Agent System:
Soufiane Keli – VP Software Engineering, Octo Technology (Accenture) – maps AI engineering maturity across 5 levels,
from autocomplete (L1) to software factory (L5, still theoretical). He places Liza at L4 – Collaborative Agent Networks:
"Multiple specialized agents work together on design, code, testing, and deployment. Humans orchestrate. This is typically
what's happening with BMAD, BEADS, and LIZA. Very few organizations have genuinely reached this level in 2026."
- The TUI (liza tui) displays live system state and lets you spawn agents, pause/resume, add tasks, and trigger checkpoints.
- The /liza-logs skill cross-correlates logs across agents to identify frictions — from misconfiguration in early setups to regressions from provider CLI updates in mature ones.
- recover-agent and recover-task commands provide idempotent cleanup after hard crashes.

See the complete vision and genesis of Liza.
Without the contract, an agent that hits a problem it can't solve has two options: admit failure or fake progress. Its training overwhelmingly favors the second. Faking progress feels collaborative — look, I'm trying things!
So it spirals. Random changes dressed up as hypotheses. Each iteration more elaborate, more confident, more wrong. You watch the diff grow and wonder if any of this is moving toward a solution. If you're clever, you end up reverting.
Under the contract, there's a third option: say "I'm stuck" and mean it. The contract makes that safe — no penalty for uncertainty, no pressure to perform progress. And the Approval Request mechanism forces agents to write down their reasoning before acting. "I'll try random things until something works" is hard to write in a structured plan. Surface the reasoning, and the reasoning improves — no better model required.
This won't self-correct. Sycophancy drives engagement — that's what gets optimized. Acting fast with little thinking controls inference costs. Model providers optimize for adoption and cost efficiency, not engineering reliability.
Ten months of pairing under this contract, and the vigilance tax has dropped to near zero. I can mostly focus on the architecture, and more specifically on building a MAS on top of the contract.
Here is a demo video of an implementation of a basic Todo CLI
using Liza in Multi-agent mode - spec-driven with intermediate epic and User Story creation, fully autonomous agents within sprints, human reviews between sprints.
The multi-agent coding space splits into six categories:
| | Liza | BMAD | CrewAI | Ruflo | Symphony | Paperclip |
|---|---|---|---|---|---|---|
| Trust approach | Behavioral contract (55+ failure modes) | Prompt-level three-layer adversarial review (advisory) | Post-hoc output validation | Track-record based (Q-learning) | Implementation-dependent | Budget/approval governance |
| Review loop | Adversarial doer/reviewer pairs | 3 parallel reviewers (Blind Hunter / Edge Case / Acceptance) | Optional manager mode | None | None | None |
| Role enforcement | Code-enforced (Go supervisor) | Prompt-level (6 named personas) | Prompt suggestion | Claude hooks (provider-specific) | None (single-agent) | Org chart hierarchy |
| Failure handling | Structural prevention + escalation | bmad-correct-course + readiness gate (PASS/CONCERNS/FAIL) | Retry on output failure | Pattern matching from past successes | Implementation-dependent | Budget auto-pause |
Where Liza leads — no competitor offers any of these:
Where others lead:
Spec-driven development is becoming the standard approach for AI coding. Most tools differ in what altitude they expect the input at and who owns product decisions.
| | Liza | BMAD | Spec Kit | OpenSpec | Kiro | GSD |
|---|---|---|---|---|---|---|
| Input level | High-level goal (problem, users, behavior, scope) | Full lifecycle (brainstorming → PRFAQ → PRD → Architecture → Stories) | High-level goal → agent-generated spec | Detailed delta-specs on existing system | Interactive 3-doc generation | Detailed spec required |
| Who decides what to build | Human via pairing (Coach/Challenger modes) | Human via conversational PM-agent interview | Agent generates, human approves | Human (spec pre-decided) | Agent drives, human confirms | Human (pre-written) |
| Decomposition | Orchestrator decomposes into adversarial tasks | Phase workflows produce artifacts (PRD → Architecture → Epics → Stories) | Agent decomposes spec into tasks | Slash commands structure tasks | Agent decomposes from spec | Planner sizes to context budget |
| Review | Doer/reviewer pairs with quorum | Three parallel reviewers at code stage (prompt-level, advisory) | None | Advisory (verify warns, doesn't block) | None (single-agent) | Checker + verifier (not adversarial) |
Most tools either expect the detailed spec already done (OpenSpec, GSD) or have the agent write it (Spec Kit, Kiro, MetaGPT). BMAD spans the broadest altitude range — from brainstorming and PRFAQ at the top through stories and code review at the bottom — but relies on the PM agent interviewing the human conversationally across every workflow. Liza treats goal-setting as a synchronous human-agent collaboration where the human makes product decisions and the agent helps surface gaps — then enforces those decisions mechanically during autonomous pipeline execution.
The positioning question is not "who starts highest" but "what's the minimum human input that reliably produces working code." BMAD answers with iterative PM-agent interviews; Liza answers with one front-loaded goal doc, then mechanical pipeline execution. A ~200-line goal document describing the "Diagnosis Design" method has been sufficient to produce a complete three-tier application (FastAPI backend, Go CLI, React web UI) in a single Liza run, with human intervention limited to answering questions (checkpoint-summary skill) between goal and merged code; the supporting run artifacts are in a non-public Diagnosis Design repo.
Rule of thumb: agents may make implementation choices but not product decisions. The goal document is where every product decision lives. The goal-setting phase uses pairing (Coach mode for surfacing WHY, Challenger mode for stress-testing WHAT) because this phase has the highest decision density — every ambiguity resolved here prevents wrong turns downstream.
Installation (install.sh)

Liza provides a single executable: liza.

- Installs to ~/.local/bin (created automatically, no sudo needed).
- Set the INSTALL_DIR environment variable to override.
- If you previously installed to /usr/local/bin, old binaries are removed automatically.

Quick install (latest release, macOS/Linux):
curl -fsSL https://raw.githubusercontent.com/liza-mas/liza/main/install.sh | bash
Options:
# Specific version
curl -fsSL https://raw.githubusercontent.com/liza-mas/liza/main/install.sh | VERSION=v1.0.0 bash
# Build from a branch (requires Go and make)
curl -fsSL https://raw.githubusercontent.com/liza-mas/liza/main/install.sh | BRANCH=main bash
# Custom directory
curl -fsSL https://raw.githubusercontent.com/liza-mas/liza/main/install.sh | INSTALL_DIR=~/.local/bin bash
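After installing, make sure the install directory is on your PATH. A quick shell check (a sketch assuming the default ~/.local/bin location):

```shell
# Prepend the default install dir to PATH for this session, then verify.
export PATH="$HOME/.local/bin:$PATH"
case ":$PATH:" in
  *":$HOME/.local/bin:"*) echo "install dir is on PATH" ;;
  *) echo "add ~/.local/bin to PATH in your shell rc" ;;
esac
```

Persist the export in your shell rc file if your distribution does not add ~/.local/bin to PATH by default.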
From a local clone:
git clone https://github.com/liza-mas/liza.git && cd liza
make install
Verify:
liza version
liza setup # initial install or liza upgrade: installs contracts + skills to ~/.liza/
# With agent-specific activation (skill symlinks, contract config):
liza setup --claude --codex --gemini --mistral
⚠️ Customize your tool setup:
The installed ~/.liza/AGENT_TOOLS.md ships with a default
tool configuration. It defines which tools agents prefer (IDE integrations,
search providers, documentation sources, etc.) and is specific to each user's environment.
Context management is of paramount importance — see Recommended Tools below.
Edit ~/.liza/AGENT_TOOLS.md to match your own setup — remove tools you don't have,
add ones you do, and adjust precedence rules accordingly.
Or better, provide your own file at install time: liza setup --agent-tools ~/my-tools.md.
To initialize your project repo:
# Interactive wizard (recommended for first use):
liza init
# Or with explicit flags:
liza init --claude --codex --gemini --mistral
The interactive wizard walks through mode selection (pairing vs full MAS), agent selection, and handles existing CLAUDE.md conflicts automatically. Claude is fully automated; for other CLIs see contract activation for additional manual steps.
Claude environment overrides:
Create a claude.env file at your project root to inject environment variables into Claude CLI agent processes.
The supervisor reads this file automatically if it exists. Format: KEY=VALUE, one per line (comments with #).
See https://code.claude.com/docs/en/env-vars.

# claude.env — example
# Mitigate recent token usage spike, https://x.com/kunchenguid/status/2043511416448307378
CLAUDE_CODE_EFFORT_LEVEL=high
CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
CLAUDE_CODE_DISABLE_AUTO_MEMORY=1
CLAUDE_CODE_SUBAGENT_MODEL=sonnet
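The format is simple enough to check by hand. A minimal loader sketch (illustrative only; Liza's supervisor has its own parser, and the file path and values below are just examples):

```shell
# Sketch: load a claude.env-style file (KEY=VALUE lines, '#' comments)
# into the current shell. Not Liza's actual loader.
cat > /tmp/claude.env <<'EOF'
# example overrides
CLAUDE_CODE_SUBAGENT_MODEL=sonnet
CLAUDE_CODE_DISABLE_AUTO_MEMORY=1
EOF

while IFS='=' read -r key value; do
  case "$key" in ''|\#*) continue ;; esac  # skip blank lines and comments
  export "$key=$value"
done < /tmp/claude.env

echo "$CLAUDE_CODE_SUBAGENT_MODEL"
```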
New to Liza? Start with Pairing mode — it's the fastest way to experience how the behavioral contract changes agent behavior. The trust you build watching agents pause at gates, surface assumptions, and validate before claiming done is what makes letting them run autonomously in Multi-Agent mode a comfortable next step.
Pairing mode — install once, then start coding in any project (liza init still required per project):
When starting your CLI session (claude, codex, ...), pairing mode will be selected automatically.
It should start by displaying a canary test inspired by Van Halen's M&M's trick — four words, each from a different contract file, showing what the agent actually read thoroughly.
Reading the contract files is enforced by a hook for Claude, by instructions for other agents.
The agent reads the contract, builds mental models, and operates as a senior peer:
analyzing before acting, presenting approval requests at every state change, validating before claiming done.
Or you may choose to make it your Socratic colleague, your rubber duck, or your challenger.
Multi-agent mode — autonomous spec-to-code pipeline:
- liza init "[Goal description]" --spec vision.md (this file needs to be committed). Use the --entry-point detailed-spec option to skip the spec phase and go straight to coding.
- liza tui — the TUI shows live system state (agents, tasks, alerts, sprint metrics). From it you can spawn agents with role autocompletion (s uses the configured default CLI, S lets you pick), pause/resume the system, add tasks, and trigger sprint checkpoints.
- Refer to How to Produce a Goal Document For Liza to write a good input doc to use as a --spec argument.
liza setup # One-time global setup
liza setup --agent-tools ~/my-tools.md # Custom AGENT_TOOLS.md
liza init "Project goal" --spec specs/vision.md # Initialize blackboard
liza init "Goal" --spec s.md \
--config pipeline.yaml --entry-point epic-planning # Pipeline-configured init
liza add-task --id t1 --desc "..." --spec "..." \
--done "..." --scope "..." # Add tasks
liza tui # Live TUI (spawn agents, monitor, manage)
liza agent coder # Start agent supervisor (or spawn from TUI)
liza validate # Validate state
liza get tasks # Query tasks
liza status # Dashboard overview
liza proceed # Transition between pipeline phases
liza pause / liza resume # Human intervention
liza stop / liza start # System control
liza sprint-checkpoint # Sprint checkpoint
liza recover-agent <id> # Crash recovery (agents)
liza recover-task <id> # Crash recovery (tasks)
liza analyze # Circuit breaker analysis
⚠️ To use Claude Code with your Claude subscription, make sure the ANTHROPIC_API_KEY environment variable is not set by default on a new shell start (Claude support, not specific to Liza).
After your first sprint, run /liza-logs in any coding agent session to identify frictions. New users will typically find setup issues (missing tool permissions in AGENT_TOOLS.md, wrong --post-worktree-cmd, stale ~/.liza/ files). Seasoned users use it to catch regressions — provider CLI updates that break flags, context budget growth from prompt changes, or new tool failure patterns. See Analyzing Agent Logs for details.
Liza optimizes cost-to-quality, not cost-to-lets-cross-fingers. These tools reduce token usage without sacrificing output quality:
| Tool | What it does | Impact |
|---|---|---|
| RTK | CLI proxy that compresses tool output (git, go, pytest, ...) — ~90% token savings on command results | Fewer tokens per tool call, more budget for reasoning |
| MorphLLM MCP (WarpGrep) | Fast Apply edits via // ... existing code ... placeholders + semantic codebase search | Avoids reading full files into context for edits |
| claude-usage | Tracks Claude subscription usage with cost breakdown | Visibility into where tokens go — essential for optimizing agent configurations |
Configure tool preferences in ~/.liza/AGENT_TOOLS.md (see installation notes above).
.claudeignore — Claude Code reads all files on disk, including git-tracked ones it doesn't need. Add a .claudeignore at your project root (same syntax as .gitignore) to keep irrelevant content out of the context budget. Liza ships one by default; review and adapt it to your project. Common candidates:
- claude.env, .mcp.json, build caches, backup directories
- Lockfiles (package-lock.json, go.sum), generated changelogs, historical SQL migrations
- docs/ content that duplicates what Claude can infer from source

Most spec-driven multi-agent systems are LLM-all-the-way-down: agents coordinating agents, with compliance dependent on
prompt adherence and artifact-based workflows.
Liza is a hybrid system:
Reliability is built into every component.
graph TB
H["User"] -->|commands| CLI["Go CLI · <i>liza</i>"]
AP["Doer / Reviewer LLM Agent Pairs · <small>judgment layer</small>"]
CLI -->|spawns| S["Supervisor · <small>deterministic Go</small>"]
CLI --> BB["YAML Blackboard<br><small>state.yaml</small>"]
CLI --> WT["Git Worktrees<br><small>isolated workspaces</small>"]
S -->|wraps| AP
PL["YAML Pipeline & Roles"] --> |specializes| S
S --> PB
BC["Behavioral Contract"] -->|harness| AP
PB["Prompt Builder"] -->|bootstrap prompt| AP
SK["Skills"] -->|empowers| AP
SP["Specs"] <-->|drives / produces| AP
AP -->|calls| CLI
style CLI fill:#4a90d9,stroke:#2c5ea0,color:#fff
style S fill:#4a90d9,stroke:#2c5ea0,color:#fff
style AP fill:#e8833a,stroke:#c0652a,color:#fff
style PB fill:#5bb87d,stroke:#3d8a5a,color:#fff
style BC fill:#5bb87d,stroke:#3d8a5a,color:#fff
style SK fill:#5bb87d,stroke:#3d8a5a,color:#fff
style SP fill:#5bb87d,stroke:#3d8a5a,color:#fff
style BB fill:#b0b8c4,stroke:#8a929e,color:#333
style WT fill:#b0b8c4,stroke:#8a929e,color:#333
style PL fill:#b0b8c4,stroke:#8a929e,color:#333
Roles aren't composable, Skills are: agents aren't constrained in their capabilities by a rigid "Act as a..." prompt
and may use any skill they consider relevant to adapt to the situation.
Liza has the built-in capability to do things right on the first pass.
Liza has 13 roles organized in three pipeline phases:
┌─────────────────────────────────────────────────────────────┐
│ Human │
│ (leads specs, observes terminals, reads blackboard, │
│ kills agents, pauses system) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────── Specification Phase ──────────┐
│ │
│ Orchestrator (decomposes & rescopes) │
│ Epic Planner ←→ Epic Plan Reviewer │
│ US Writer ←→ US Reviewer │
│ │
└──────────────────┬───────────────────────┘
│ liza proceed (us-to-coding, many-to-one)
┌──────────── Coding Phase ────────────────┐
│ │
│ Orchestrator (decomposes & rescopes) │
│ Architect ←→ Architecture Reviewer │
│ Code Planner ←→ Code Plan Reviewer │
│ Coder ←→ Code Reviewer │
│ │
└──────────────────┬───────────────────────┘
│ all coding tasks merged
┌──────────── Integration Phase ───────────┐
│ │
│ Integration Analyst ←→ Integration Rev. │
│ (findings → fix tasks in coding-pair) │
│ │
└──────────────────┬───────────────────────┘
│
▼
┌─────────────────┐
│ .liza/ │
│ state.yaml │ ← blackboard
│ log.yaml │ ← activity history
│ alerts.log │ ← watch daemon output
│ archive/ │ ← terminal-state tasks
└─────────────────┘
│
▼
┌─────────────────┐
│ .worktrees/ │
│ task-1/ │ ← isolated workspaces
│ task-2/ │
└─────────────────┘
See Architecture and C4 Diagrams.
Each role pair follows the same intra-pair flow (concrete state names are role-pair-specific, e.g. DRAFT_CODE, IMPLEMENTING_CODE):
initial → executing → submitted → reviewing → approved → MERGED
│ ↑ ↓ │
│ └────── rejected ──────┘ │
│ ↓
├──> BLOCKED INTEGRATION_FAILED
│ ├──> SUPERSEDED
│ └──> ABANDONED
│
└──> initial (release claim)
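The happy path of that intra-pair flow can be read as a simple lookup table. A minimal sketch in shell (illustrative only; the real enforcement lives in the Go supervisor, and these are the generic state names, not the role-pair-specific variants):

```shell
# Walk the happy path of the intra-pair state machine.
# Sketch only: Liza enforces these transitions in Go, not in shell.
declare -A next=(
  [initial]=executing
  [executing]=submitted
  [submitted]=reviewing
  [reviewing]=approved    # a rejected verdict would loop back to executing
  [approved]=MERGED
)

state=initial
while [ "$state" != MERGED ]; do
  echo "$state -> ${next[$state]}"
  state=${next[$state]}
done
```

Terminal states such as BLOCKED, SUPERSEDED, and ABANDONED branch off this path and are omitted here for brevity.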
Inter-pair transitions (liza proceed) create downstream tasks between sprints:
Spec phase Coding phase
Epic Planner ─approved─► MERGED Architect ─approved─► MERGED
│ epic-to-us (per-subtask) │ arch-to-code-plan (per-subtask)
▼ ▼
US Writer ─approved─► MERGED Code Planner ─approved─► MERGED
│ us-to-coding (many-to-one) │ code-plan-to-coding (per-subtask)
▼ ▼
Architect (coding phase) Coder ─approved─► MERGED
│ all tasks merged
▼
Integration Analyst (auto)
Example of a task on the blackboard:
- id: code-planning-1-code-3
type: coding
role_pair: coding-pair
description: Role infrastructure recognizes the 4 new roles with correct runtime/workflow mapping.
status: MERGED
priority: 1
assigned_to: coder-2
base_commit: e7625ed69318836dd495b22855df3a8b91fe32b5
iteration: 1
review_commit: 9d9254b893af477fc34f48063169634d200fa332
approved_by: code-reviewer-1
merge_commit: 2fa6399223262df6a87c6b1354dfc882b73114c5
lease_expires: 2026-03-06T01:47:22.075108537Z
spec_ref: specs/plans/sub-pipelines-phase2.md
done_when: ToWorkflow("epic-planner") returns "epic_planner" (and all 4 pairs); IsValidRuntime("us-writer") returns true; AllRuntime() returns 9 roles; Tests pass
scope: internal/roles/roles.go, internal/roles/roles_test.go, internal/models/state.go
created: 2026-03-06T01:17:00.99638669Z
history:
- time: 2026-03-06T01:17:22.075108537Z
event: claimed
agent: coder-2
- time: 2026-03-06T01:19:30.131578505Z
event: pre_execution_checkpoint
agent: coder-2
files_to_modify:
- internal/roles/roles.go
- internal/roles/roles_test.go
- internal/models/state.go
intent: Add 4 new role constants (epic-planner, epic-plan-reviewer, us-writer, us-reviewer) with runtime↔workflow mapping, update AllRuntime()/AllWorkflow() to return 9 roles, and add Role* aliases in models/state.go.
validation_plan: 'Run `go test ./internal/roles/ ./internal/models/` in worktree. Verify: ToWorkflow("epic-planner")→"epic_planner" for all 4 new roles, IsValidRuntime("us-writer")→true, AllRuntime() returns 9 roles.'
- time: 2026-03-06T01:22:05.371651393Z
event: submitted_for_review
agent: coder-2
- time: 2026-03-06T01:24:30.366073081Z
event: approved
agent: code-reviewer-1
- time: 2026-03-06T03:06:35.560908548+01:00
event: merged
agent: code-reviewer-1
commit: 2fa6399223262df6a87c6b1354dfc882b73114c5
tests_ran: false
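Because the blackboard is plain YAML on disk, this history stays auditable with standard tools. A sketch on fabricated sample data (liza get tasks is the supported query interface; the path below is a throwaway demo directory):

```shell
# Recreate a minimal slice of a blackboard task, then audit it with grep.
mkdir -p /tmp/liza-demo/.liza
cat > /tmp/liza-demo/.liza/state.yaml <<'EOF'
- id: code-planning-1-code-3
  status: MERGED
  approved_by: code-reviewer-1
EOF
grep -n 'status:' /tmp/liza-demo/.liza/state.yaml
```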
See Release Notes for version history and RELEASE.md for maintainer release workflow.
Where Liza works today:
Liza is a collaborative agent network (L4 AI maturity) but its architecture has been designed to support a software factory (L5) where humans focus on strategy and product vision. Still a long way to go.
Implemented roles:
Planned role pairs:
Roadmap:
The contract is a capability test. It requires meta-cognitive machinery—the ability to parse instructions as executable specifications, observe state, and pause at gates.
| Provider | Classification | Notes |
|---|---|---|
| Claude Opus 4.x | Fully compatible | Reference provider |
| GPT-5.x-Codex | Fully compatible | Equally capable |
| Kimi 2.5 | Compatible but poor on real-world tasks | Responsive to tooling feedback |
| Mistral Devstral-2 | Partial | Requires explicit activation and supervision |
| Gemini 2.5 Flash | Incompatible | Architectural limitation—no prompt-level fix |
See Model Capability Assessment for detailed analysis.
Liza combines two references:
Lisa Simpson—the disciplined, systematic counterpoint to Ralph Wiggum. The Ralph Wiggum technique loops agents until they converge through persistence. Lisa makes sure the work is actually right.
ELIZA—the 1966 chatbot that demonstrated structured dialogue patterns. Liza is about structured collaboration patterns: explicit states, binding verdicts, auditable transitions.
Liza doesn't make agents smarter. It makes them accountable.
Apache 2.0
The behavioral contract draws on research into LLM failure modes, sycophancy patterns, and code generation failures. The multi-agent design incorporates ideas from: