pywinauto

Install (CLI)

```bash
npx skills add https://github.com/malue-ai/dazee-small --skill pywinauto
```

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.

Last updated: 4/24/2026


xiaodazi

Open-source AI agent that lives on your desktop.
Local-first storage · 200+ plug-and-play skills · 7+ LLM providers · macOS & Windows




What is xiaodazi?

xiaodazi ("little buddy") is an open-source AI agent that runs as a native desktop app (Tauri). It keeps all data on your machine, operates your computer directly — managing files, automating apps, generating documents — and remembers your preferences across sessions.

Demo

Why xiaodazi?

5 core advantages

|  | Cloud AI assistants | xiaodazi |
| --- | --- | --- |
| Data | Stored on the provider's servers | 100% local (SQLite, plain files) |
| Memory | Forgets between sessions | Remembers preferences via an editable MEMORY.md + semantic search |
| Skills | Fixed capabilities | 200+ plug-and-play skills; add new ones by writing Markdown |
| Models | Locked to one provider | Switch between Claude, GPT, Qwen, DeepSeek, Gemini, GLM, or Ollama |
| Errors | Fails silently or retries | Classifies errors, backtracks from bad approaches, degrades gracefully |

Quick Start

Option A: One-click install (end users)

Windows — Download the installer from Releases, double-click to run.

macOS — Open Terminal and run:

```bash
bash <(curl -fsSL https://raw.githubusercontent.com/malue-ai/dazee-small/main/scripts/auto_build_app.sh)
```

Then configure an API key in the Settings page (DeepSeek or Gemini free tier recommended for beginners).

Option B: From source (developers)

```bash
git clone https://github.com/malue-ai/dazee-small.git
cd dazee-small

python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Create config.yaml in the project root (or use the Settings page after starting):

```yaml
api_keys:
  ANTHROPIC_API_KEY: sk-ant-api03-your-key-here   # Recommended
  # OPENAI_API_KEY: sk-xxx
  # DASHSCOPE_API_KEY: sk-xxx                     # Qwen
  # GEMINI_API_KEY: xxx                           # Gemini (free: 1500 req/day)
```

Start the backend:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Start the frontend:

```bash
cd frontend
npm install && npm run dev
# Open http://localhost:5174
```

Desktop app (Tauri) — requires the Rust toolchain (https://rustup.rs/) and Node.js >= 18 (a Vite 5 requirement):

```bash
# 1. Build the backend sidecar binary (requires PyInstaller)
pip install pyinstaller
python scripts/build_backend.py

# 2. Start Tauri in dev mode, or build for production
cd frontend
npm run tauri:dev     # Development
npm run tauri:build   # Production build
```
Fully offline with Ollama

Install Ollama, then set in config.yaml:

```yaml
llm:
  COT_AGENT_MODEL: ollama/llama3.1
```

No API key needed. All inference runs locally.
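
If the model isn't already downloaded, pull it first (the tag here matches the config above):

```bash
ollama pull llama3.1
```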


Key Design Decisions

LLM-First — No keyword matching, ever

All semantic tasks — intent classification, skill selection, complexity inference, backtrack decisions — are performed by the LLM. Hard-coded rules exist only for format validation, numeric calculations, and security boundaries.

Why it matters: When a user says "Don't make a PPT, just give me the key points", a keyword system matches "PPT" and loads the wrong tools. xiaodazi's LLM-driven intent analysis correctly loads zero PPT skills.
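
To make the idea concrete, here is a minimal sketch of LLM-driven intent classification. The prompt, the `llm.complete` client, and the return shape are hypothetical illustrations, not xiaodazi's actual routing API:

```python
import json

def analyze_intent(llm, message: str) -> dict:
    """One LLM call decides which skill groups to activate -- no keyword rules."""
    prompt = (
        "Classify the user's intent. Respond with JSON only, e.g. "
        '{"skill_groups": ["research"], "complexity": "medium"}. '
        "Activate a skill group only if the user actually wants that capability.\n\n"
        f"User message: {message}"
    )
    return json.loads(llm.complete(prompt))  # hypothetical LLM client

# "Don't make a PPT, just give me the key points"
# -> {"skill_groups": [], "complexity": "low"}  (no PPT skills loaded)
```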

Skills as Markdown — 200+ and growing

Each skill is a directory with a SKILL.md file. No Python code required for most skills — the LLM reads the instructions and uses built-in tools to execute them.

Despite 200+ skills, zero are loaded by default. Each request activates only the skill groups matching the user's intent (typically 0–15 out of 200+). A simple "hi" costs 0 skill tokens; a complex research task costs ~1,200.

Command Execution Safety — Transparent rules for what AI can and cannot do

xiaodazi can execute commands directly on your computer (file operations, network diagnostics, script execution, etc.), but every command passes through a built-in security policy engine. Rules are evaluated top-to-bottom; the first match wins. Unmatched commands default to allow.

✅ Allowed commands

| Pattern | Description | Shell |
| --- | --- | --- |
| `echo *` | Echo output | All |
| `Get-*` | PowerShell read-only query cmdlets | powershell / pwsh |
| `dir` / `dir *` | Directory listing | All |
| `hostname` | Hostname query | All |
| `whoami` | Current user | All |
| `systeminfo` | System information | All |
| `ipconfig` / `ipconfig *` | Network configuration | All |
| `ping *` | Network connectivity test | All |
| `type *` | Read file contents | cmd |
| `cat *` | Read file contents | All |
| `tasklist*` | Process list | All |
| `netstat*` | Network connections | All |
| `python *` / `python3 *` | Python script execution | All |
| `pip *` / `pip3 *` | Python package management | All |
| `git *` | Git version control | All |
| `node *` / `npm *` | Node.js runtime & package management | All |
🚫 Explicitly blocked dangerous commands

| Pattern | Reason |
| --- | --- |
| `Remove-Item *` / `rm *` / `del *` | Prevent accidental file deletion |
| `Format-*` | Prevent disk formatting |
| `Stop-Computer*` / `shutdown*` | Prevent unexpected shutdown |
| `Restart-Computer*` | Prevent unexpected restart |
| `*Invoke-WebRequest*` | Prevent downloading and executing unknown programs |
| `*Start-Process*` | Prevent bypassing the security policy to launch processes |
| `*reg *` | Prevent registry modification |
| `net user*` / `net localgroup*` | Prevent account and permission tampering |
| `schtasks *` | Prevent scheduled-task creation |

Customizable: Policy rules are stored in a local exec-policy.json file. You can modify them via remote management commands (system.execApprovals.get/set) or by editing the JSON file directly. Adjust the allowlist and blocklist to fit your use case.
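
To illustrate the first-match-wins evaluation described above, here is a minimal sketch. The in-code rule list is an assumption for demonstration, not the actual exec-policy.json schema:

```python
from fnmatch import fnmatch

# Hypothetical rules mirroring the documented policy: deny rules listed before
# allow rules; evaluation is top-to-bottom, first match wins.
RULES = [
    ("deny",  "rm *"),
    ("deny",  "shutdown*"),
    ("allow", "git *"),
    ("allow", "ping *"),
]

def is_allowed(command: str) -> bool:
    for action, pattern in RULES:
        if fnmatch(command, pattern):
            return action == "allow"
    return True  # unmatched commands default to allow, per the policy above

assert is_allowed("git status")
assert not is_allowed("rm -rf /tmp/x")
assert is_allowed("hostname")  # no rule matches -> default allow
```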

Local-First — Your data stays on your machine

| Storage | Technology | Purpose |
| --- | --- | --- |
| Messages & conversations | SQLite (WAL mode) | Async read/write, concurrent access |
| Full-text search | SQLite FTS5 | BM25 ranking, zero-config |
| Semantic vectors | sqlite-vec (optional) | Vector similarity, single file |
| User memory | MEMORY.md | Plain text, user-editable |
| File attachments | Local filesystem | Instance-isolated |

No cloud database, no external vector store, no third-party analytics. LLM inference uses cloud APIs by default, with full local model support via Ollama for completely offline operation.
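
For reference, getting WAL mode plus BM25-ranked full-text search out of stock SQLite takes only a few lines. This is a generic sketch, not xiaodazi's actual storage layer:

```python
import sqlite3

db = sqlite3.connect("messages.db")
db.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer

# FTS5 virtual table with built-in BM25 ranking -- no external search service
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS messages USING fts5(role, content)")
db.execute("INSERT INTO messages VALUES ('user', 'schedule a meeting for Friday')")
db.commit()

hits = db.execute(
    "SELECT role, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("meeting",),
).fetchall()
print(hits)  # [('user', 'schedule a meeting for Friday')]
```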


Architecture Overview

```text
┌─────────────────────────────────────────────────────────────────────────────┐
│  Layer 1 — User Interface                                                   │
│    Tauri 2.10 (Rust) · Vue 3.4 + TypeScript · Apple Liquid Design           │
├─────────────────────────────────────────────────────────────────────────────┤
│  Layer 2 — API & Services                                                   │
│    FastAPI (REST + SSE + WebSocket) · Multi-Channel Gateway                 │
├─────────────────────────────────────────────────────────────────────────────┤
│  Layer 3 — Agent Engine                                                     │
│    Intent Analyzer (LLM, 4-layer cache, <200ms)                             │
│    RVR-B Executor (React → Validate → Reflect → Backtrack)                  │
│    Context Engineering (3-phase inject, KV-Cache 90%+ hit, scratchpad)      │
│    Plan Manager (DAG tasks, real-time progress UI)                          │
├──────────────────────────────┬──────────────────────────────────────────────┤
│  Layer 4 — Capability        │  Layer 5 — Infrastructure                    │
│    200+ Skills (20 groups)   │    7 LLM Providers + Ollama                  │
│    Tool System (intent-      │    SQLite + FTS5 + sqlite-vec                │
│      pruned)                 │    Instance Isolation                        │
│    3-Layer Memory            │    3-Layer Evaluation                        │
│    Playbook Learning         │                                              │
└──────────────────────────────┴──────────────────────────────────────────────┘
```

Lifecycle of a request: User message → Intent analysis (<200ms, cached) → Skill & tool selection → RVR-B execution loop (stream tokens, call tools, validate, backtrack if needed) → Memory extraction → Response complete.
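
In pseudocode, the execution loop looks roughly like this (a simplified sketch of the React → Validate → Reflect → Backtrack cycle; method names are illustrative, not the actual core/agent API):

```python
def rvr_b_loop(agent, task, max_steps: int = 20):
    checkpoint = agent.save_state()              # known-good state for backtracking
    for _ in range(max_steps):
        action = agent.react(task)               # React: stream tokens, pick a tool
        result = agent.execute(action)
        if agent.validate(result):               # Validate: did the step actually succeed?
            if agent.is_done(task):
                return agent.finalize(task)
            checkpoint = agent.save_state()      # advance the checkpoint
            continue
        failure = agent.classify_error(result)   # Reflect: infra failure vs. bad strategy
        if failure.is_strategy_error:
            agent.restore_state(checkpoint)      # Backtrack: discard the polluted context
            agent.note_failed_approach(action)   # avoid retrying the same dead end
    return agent.partial_results(task)           # degrade gracefully with what we have
```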

What makes xiaodazi different from typical agent frameworks?
| Capability | xiaodazi | Typical frameworks |
| --- | --- | --- |
| Intent analysis | LLM semantic analysis per request (4-layer cache, <200ms); adjusts skill loading, planning depth, and token budget per request | Route by session or fixed config; same resource allocation for every request |
| Error recovery | RVR-B loop: classify the error → backtrack from wrong approaches → clean context pollution → degrade gracefully with partial results | Retry + model failover; solves infra failures, not strategy failures |
| Context management | Proactive: 3-phase injection, progressive history decay, scratchpad file exchange (100x compression), KV-Cache optimization (90%+ hit) | Reactive: truncate or summarize when the context overflows |
| Skill loading | 0 skills loaded by default; intent-driven lazy allocation; token cost scales with task complexity, not library size | Load all capabilities upfront, or manual tool selection |
| Planning | Explicit DAG plan with a UI progress widget and re-planning on failure | Implicit chain-of-thought; no visibility, no recovery |
| Evaluation | 3-layer grading (code + LLM-as-Judge + human), 12-type failure classification, auto-regression | External eval tools or manual testing |
| Learning | Playbook system: extract strategy → user confirms → apply to future tasks | No built-in learning loop |

Tech Stack

| Layer | Technology |
| --- | --- |
| Desktop shell | Tauri 2.10 (Rust) |
| Frontend | Vue 3.4 + TypeScript + Tailwind CSS 4.1 + Pinia |
| Backend | Python 3.12 + FastAPI + asyncio |
| Communication | SSE + WebSocket + REST |
| Storage | SQLite (WAL) + FTS5 + sqlite-vec |
| LLM providers | Claude, OpenAI, Qwen, DeepSeek, Gemini, GLM, Ollama |
| Memory | MEMORY.md + FTS5 + Mem0 |
| Evaluation | Code graders + LLM-as-Judge + human review |

Project Structure

```text
xiaodazi/
├── frontend/            # Vue 3 + Tauri desktop app
├── core/
│   ├── agent/           # RVR-B execution, backtracking
│   ├── routing/         # LLM-First intent analysis
│   ├── context/         # Context engineering (inject, compress, cache)
│   ├── tool/            # Tool registry, selector, executor
│   ├── skill/           # Skill loader, group registry
│   ├── memory/          # 3-layer memory (Markdown + FTS5 + Mem0)
│   ├── playbook/        # Online learning (strategy extraction)
│   ├── llm/             # 7 LLM providers + format adapters
│   ├── planning/        # DAG task planning + progress tracking
│   ├── termination/     # Adaptive termination strategies
│   ├── state/           # Snapshot / rollback
│   └── monitoring/      # Failure detection, token audit
├── routers/             # FastAPI HTTP/WS endpoints
├── services/            # Business logic (protocol-agnostic)
├── tools/               # Built-in tool implementations
├── skills/              # Shared skill library
├── instances/           # Agent instance configs
├── evaluation/          # E2E test suites + graders
├── models/              # Pydantic data models
└── infra/               # Storage infrastructure (SQLite, cache)
```

Extending xiaodazi

Add a Skill (no code required)

Create a directory under skills/ or instances/xiaodazi/skills/ with a SKILL.md:

```markdown
# My Custom Skill

## When to Use
When the user asks to [describe the trigger scenario].

## Instructions
1. First, [step one]
2. Then, [step two]
3. Finally, [step three]

## metadata
os_compatibility: common
dependency_level: builtin
```

The skill is automatically discovered, classified, and available on next request.

Add an LLM Provider

Implement a provider class in core/llm/ following the existing adapters (Claude, OpenAI, Qwen, etc.). Register it in the LLMRegistry.
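
In rough shape, an adapter only needs to translate between the agent's message format and the vendor's API. A hypothetical skeleton (check the existing adapters in core/llm/ for the real interface):

```python
class MyProvider:
    """Hypothetical provider adapter -- mirror an existing one in core/llm/."""

    name = "myprovider"

    def __init__(self, api_key: str):
        self.api_key = api_key

    async def complete(self, messages: list[dict], model: str, **options) -> str:
        # 1. Convert the agent's messages into the vendor's request format.
        # 2. Call the vendor's API asynchronously.
        # 3. Map the response (and error codes) back to plain text / tool calls.
        raise NotImplementedError
```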

Add a Messaging Channel

Implement a gateway adapter in core/gateway/. The ChatService is protocol-agnostic — your adapter only handles message format conversion.
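
Sketched the same way (class and method names are hypothetical):

```python
class DiscordGateway:
    """Hypothetical channel adapter -- ChatService never sees Discord specifics."""

    async def on_inbound(self, event: dict, chat_service) -> None:
        text = event["content"]                          # channel format -> plain text
        reply = await chat_service.handle_message(text)  # protocol-agnostic core
        await self.send(event["channel_id"], reply)      # plain text -> channel format

    async def send(self, channel_id: str, text: str) -> None:
        raise NotImplementedError  # call the channel's own API here
```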


Known Issues

We are honest about what doesn't work well yet.

Stability
  • Long session memory pressure — In 80+ turn conversations with heavy tool usage, context compression occasionally discards information the agent needs later.
  • Process crashes — The Python backend can exit unexpectedly under concurrent file writes. The Tauri shell does not yet auto-restart the sidecar.
  • SQLite write contention — When memory extraction, conversation saves, and playbook extraction fire simultaneously, occasional `database is locked` errors appear on slower disks.
Agent Quality
  • Backtracking timing — The RVR-B loop sometimes backtracks too late or too eagerly. Error classification thresholds are still being calibrated.
  • Planning granularity — Plans are sometimes too coarse or too fine. The complexity-to-depth mapping needs more real-world data.
Platform
  • macOS is the primary test platform. Windows support exists but has received less testing.
  • Single-machine only. No remote access, no mobile app, no multi-device sync.
  • Text only. No voice input/output yet.

Roadmap

  • [ ] Windows platform hardening
  • [ ] Sidecar auto-restart and health monitoring
  • [ ] Importance-aware context compression
  • [ ] Skill marketplace / community registry
  • [ ] Parallel tool execution
  • [ ] Voice input/output
  • [ ] Additional messaging channels (Discord, Slack, WhatsApp)

Documentation

| Document | Description |
| --- | --- |
| Architecture Overview | Full 5-layer architecture with 12 deep-dive modules |
| Frontend & Desktop | Tauri + Vue 3, Apple Liquid design |
| API & Services | Three-layer architecture, preprocessing pipeline |
| Intent Analysis | LLM-First semantic analysis, 4-layer caching |
| Agent Execution | RVR-B loop, backtracking, adaptive termination |
| Context Engineering | 3-phase injection, compression, KV-Cache |
| Tool System | 2-layer registry, intent-driven pruning |
| Skill Ecosystem | 200+ skills, 2D classification, lazy allocation |
| Memory System | 3-layer memory, dual-write, fusion search |
| LLM Multi-Model | 7 providers, format adapters, failover |
| Instance & Config | Prompt-driven schema, instance isolation |
| Evaluation | 3-layer grading, E2E pipeline, failure detection |
| Playbook Learning | Closed-loop strategy learning |

Contributing

We welcome contributions of all kinds:

  • Skill authoring — The lowest-barrier way to contribute. Write a SKILL.md, open a PR.
  • Bug reports — Especially on Windows. Every crash report improves the project.
  • Prompt tuning — Help improve intent analysis accuracy or agent response quality.
  • Documentation — Tutorials, examples, translations.
  • Code — See the Architecture docs to understand the codebase before diving in.

Star History


Contributors

Authors

License

MIT — Copyright (c) 2025-2026 ZenFlux