pywinauto

Install (CLI)

```bash
npx skills add https://github.com/malue-ai/dazee-small --skill pywinauto
```

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.

Last updated: 4/24/2026


xiaodazi

Open-source AI agent that lives on your desktop.
Local-first storage · 200+ plug-and-play skills · 7+ LLM providers · macOS & Windows




What is xiaodazi?

xiaodazi ("little buddy") is an open-source AI agent that runs as a native desktop app (Tauri). It keeps all data on your machine, operates your computer directly — managing files, automating apps, generating documents — and remembers your preferences across sessions.

Demo

Why xiaodazi?

5 core advantages

|  | Cloud AI assistants | xiaodazi |
| --- | --- | --- |
| Data | Stored on the provider's servers | 100% local (SQLite, plain files) |
| Memory | Forgets between sessions | Remembers preferences via an editable MEMORY.md + semantic search |
| Skills | Fixed capabilities | 200+ plug-and-play skills; add new ones by writing Markdown |
| Models | Locked to one provider | Switch between Claude, GPT, Qwen, DeepSeek, Gemini, GLM, or Ollama |
| Errors | Fails silently or retries | Classifies errors, backtracks from bad approaches, degrades gracefully |

Quick Start

Option A: One-click install (end users)

Windows — Download the installer from Releases, double-click to run.

macOS — Open Terminal and run:

```bash
bash <(curl -fsSL https://raw.githubusercontent.com/malue-ai/dazee-small/main/scripts/auto_build_app.sh)
```

Then configure an API key in the Settings page (DeepSeek or Gemini free tier recommended for beginners).

Option B: From source (developers)

```bash
git clone https://github.com/malue-ai/dazee-small.git
cd dazee-small

python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Create config.yaml in the project root (or use the Settings page after starting):

```yaml
api_keys:
  ANTHROPIC_API_KEY: sk-ant-api03-your-key-here   # Recommended
  # OPENAI_API_KEY: sk-xxx
  # DASHSCOPE_API_KEY: sk-xxx                     # Qwen
  # GEMINI_API_KEY: xxx                           # Gemini (free: 1500 req/day)
```

Start the backend:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Start the frontend:

```bash
cd frontend
npm install && npm run dev
# Open http://localhost:5174
```

Desktop app (Tauri) — requires the Rust toolchain (https://rustup.rs/) and Node.js >= 18 (a Vite 5 requirement):

```bash
# 1. Build the backend sidecar binary (requires PyInstaller)
pip install pyinstaller
python scripts/build_backend.py

# 2. Start Tauri in dev mode, or build for production
cd frontend
npm run tauri:dev     # Development
npm run tauri:build   # Production build
```
Fully offline with Ollama

Install Ollama, then set in config.yaml:

```yaml
llm:
  COT_AGENT_MODEL: ollama/llama3.1
```

No API key needed. All inference runs locally.
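
If the model isn't already downloaded, pull it first (the tag here matches the config above):

```bash
ollama pull llama3.1
```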


Key Design Decisions

LLM-First — No keyword matching, ever

All semantic tasks — intent classification, skill selection, complexity inference, backtrack decisions — are performed by the LLM. Hard-coded rules exist only for format validation, numeric calculations, and security boundaries.

Why it matters: When a user says "Don't make a PPT, just give me the key points", a keyword system matches "PPT" and loads the wrong tools. xiaodazi's LLM-driven intent analysis correctly loads zero PPT skills.
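
To make the idea concrete, here is a minimal sketch of LLM-driven intent classification. The prompt, the `llm.complete` client, and the return shape are hypothetical illustrations, not xiaodazi's actual routing API:

```python
import json

def analyze_intent(llm, message: str) -> dict:
    """One LLM call decides which skill groups to activate -- no keyword rules."""
    prompt = (
        "Classify the user's intent. Respond with JSON only, e.g. "
        '{"skill_groups": ["research"], "complexity": "medium"}. '
        "Activate a skill group only if the user actually wants that capability.\n\n"
        f"User message: {message}"
    )
    return json.loads(llm.complete(prompt))  # hypothetical LLM client

# "Don't make a PPT, just give me the key points"
# -> {"skill_groups": [], "complexity": "low"}  (no PPT skills loaded)
```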

Skills as Markdown — 200+ and growing

Each skill is a directory with a SKILL.md file. No Python code required for most skills — the LLM reads the instructions and uses built-in tools to execute them.

Despite 200+ skills, zero are loaded by default. Each request activates only the skill groups matching the user's intent (typically 0–15 out of 200+). A simple "hi" costs 0 skill tokens; a complex research task costs ~1,200.

Command Execution Safety — Transparent rules for what AI can and cannot do

xiaodazi can execute commands directly on your computer (file operations, network diagnostics, script execution, etc.), but every command passes through a built-in security policy engine. Rules are evaluated top-to-bottom; the first match wins. Unmatched commands default to allow.

✅ Allowed commands

| Pattern | Description | Shell |
| --- | --- | --- |
| `echo *` | Echo output | All |
| `Get-*` | PowerShell read-only query cmdlets | powershell / pwsh |
| `dir` / `dir *` | Directory listing | All |
| `hostname` | Hostname query | All |
| `whoami` | Current user | All |
| `systeminfo` | System information | All |
| `ipconfig` / `ipconfig *` | Network configuration | All |
| `ping *` | Network connectivity test | All |
| `type *` | Read file contents | cmd |
| `cat *` | Read file contents | All |
| `tasklist*` | Process list | All |
| `netstat*` | Network connections | All |
| `python *` / `python3 *` | Python script execution | All |
| `pip *` / `pip3 *` | Python package management | All |
| `git *` | Git version control | All |
| `node *` / `npm *` | Node.js runtime & package management | All |
🚫 Explicitly blocked dangerous commands

| Pattern | Reason |
| --- | --- |
| `Remove-Item *` / `rm *` / `del *` | Prevent accidental file deletion |
| `Format-*` | Prevent disk formatting |
| `Stop-Computer*` / `shutdown*` | Prevent unexpected shutdown |
| `Restart-Computer*` | Prevent unexpected restart |
| `*Invoke-WebRequest*` | Prevent downloading and executing unknown programs |
| `*Start-Process*` | Prevent bypassing the security policy to launch processes |
| `*reg *` | Prevent registry modification |
| `net user*` / `net localgroup*` | Prevent account and permission tampering |
| `schtasks *` | Prevent scheduled-task creation |

Customizable: Policy rules are stored in a local exec-policy.json file. You can modify them via remote management commands (system.execApprovals.get/set) or by editing the JSON file directly. Adjust the allowlist and blocklist to fit your use case.
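
To illustrate the first-match-wins evaluation described above, here is a minimal sketch. The in-code rule list is an assumption for demonstration, not the actual exec-policy.json schema:

```python
from fnmatch import fnmatch

# Hypothetical rules mirroring the documented policy: deny rules listed before
# allow rules; evaluation is top-to-bottom, first match wins.
RULES = [
    ("deny",  "rm *"),
    ("deny",  "shutdown*"),
    ("allow", "git *"),
    ("allow", "ping *"),
]

def is_allowed(command: str) -> bool:
    for action, pattern in RULES:
        if fnmatch(command, pattern):
            return action == "allow"
    return True  # unmatched commands default to allow, per the policy above

assert is_allowed("git status")
assert not is_allowed("rm -rf /tmp/x")
assert is_allowed("hostname")  # no rule matches -> default allow
```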

Local-First — Your data stays on your machine

| Storage | Technology | Purpose |
| --- | --- | --- |
| Messages & conversations | SQLite (WAL mode) | Async read/write, concurrent access |
| Full-text search | SQLite FTS5 | BM25 ranking, zero-config |
| Semantic vectors | sqlite-vec (optional) | Vector similarity, single file |
| User memory | MEMORY.md | Plain text, user-editable |
| File attachments | Local filesystem | Instance-isolated |

No cloud database, no external vector store, no third-party analytics. LLM inference uses cloud APIs by default, with full local model support via Ollama for completely offline operation.
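
For reference, getting WAL mode plus BM25-ranked full-text search out of stock SQLite takes only a few lines. This is a generic sketch, not xiaodazi's actual storage layer:

```python
import sqlite3

db = sqlite3.connect("messages.db")
db.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer

# FTS5 virtual table with built-in BM25 ranking -- no external search service
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS messages USING fts5(role, content)")
db.execute("INSERT INTO messages VALUES ('user', 'schedule a meeting for Friday')")
db.commit()

hits = db.execute(
    "SELECT role, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("meeting",),
).fetchall()
print(hits)  # [('user', 'schedule a meeting for Friday')]
```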


Architecture Overview

```text
┌─────────────────────────────────────────────────────────────────────────────┐
│  Layer 1 — User Interface                                                   │
│    Tauri 2.10 (Rust) · Vue 3.4 + TypeScript · Apple Liquid Design           │
├─────────────────────────────────────────────────────────────────────────────┤
│  Layer 2 — API & Services                                                   │
│    FastAPI (REST + SSE + WebSocket) · Multi-Channel Gateway                 │
├─────────────────────────────────────────────────────────────────────────────┤
│  Layer 3 — Agent Engine                                                     │
│    Intent Analyzer (LLM, 4-layer cache, <200ms)                             │
│    RVR-B Executor (React → Validate → Reflect → Backtrack)                  │
│    Context Engineering (3-phase inject, KV-Cache 90%+ hit, scratchpad)      │
│    Plan Manager (DAG tasks, real-time progress UI)                          │
├──────────────────────────────┬──────────────────────────────────────────────┤
│  Layer 4 — Capability        │  Layer 5 — Infrastructure                    │
│    200+ Skills (20 groups)   │    7 LLM Providers + Ollama                  │
│    Tool System (intent-      │    SQLite + FTS5 + sqlite-vec                │
│      pruned)                 │    Instance Isolation                        │
│    3-Layer Memory            │    3-Layer Evaluation                        │
│    Playbook Learning         │                                              │
└──────────────────────────────┴──────────────────────────────────────────────┘
```

Lifecycle of a request: User message → Intent analysis (<200ms, cached) → Skill & tool selection → RVR-B execution loop (stream tokens, call tools, validate, backtrack if needed) → Memory extraction → Response complete.
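
In pseudocode, the execution loop looks roughly like this (a simplified sketch of the React → Validate → Reflect → Backtrack cycle; method names are illustrative, not the actual core/agent API):

```python
def rvr_b_loop(agent, task, max_steps: int = 20):
    checkpoint = agent.save_state()              # known-good state for backtracking
    for _ in range(max_steps):
        action = agent.react(task)               # React: stream tokens, pick a tool
        result = agent.execute(action)
        if agent.validate(result):               # Validate: did the step actually succeed?
            if agent.is_done(task):
                return agent.finalize(task)
            checkpoint = agent.save_state()      # advance the checkpoint
            continue
        failure = agent.classify_error(result)   # Reflect: infra failure vs. bad strategy
        if failure.is_strategy_error:
            agent.restore_state(checkpoint)      # Backtrack: discard the polluted context
            agent.note_failed_approach(action)   # avoid retrying the same dead end
    return agent.partial_results(task)           # degrade gracefully with what we have
```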

What makes xiaodazi different from typical agent frameworks?
| Capability | xiaodazi | Typical frameworks |
| --- | --- | --- |
| Intent analysis | LLM semantic analysis per request (4-layer cache, <200ms); adjusts skill loading, planning depth, and token budget per request | Route by session or fixed config; same resource allocation for every request |
| Error recovery | RVR-B loop: classify the error → backtrack from wrong approaches → clean context pollution → degrade gracefully with partial results | Retry + model failover; solves infra failures, not strategy failures |
| Context management | Proactive: 3-phase injection, progressive history decay, scratchpad file exchange (100x compression), KV-Cache optimization (90%+ hit) | Reactive: truncate or summarize when the context overflows |
| Skill loading | 0 skills loaded by default; intent-driven lazy allocation; token cost scales with task complexity, not library size | Load all capabilities upfront, or manual tool selection |
| Planning | Explicit DAG plan with a UI progress widget and re-planning on failure | Implicit chain-of-thought; no visibility, no recovery |
| Evaluation | 3-layer grading (code + LLM-as-Judge + human), 12-type failure classification, auto-regression | External eval tools or manual testing |
| Learning | Playbook system: extract strategy → user confirms → apply to future tasks | No built-in learning loop |

Tech Stack

| Layer | Technology |
| --- | --- |
| Desktop shell | Tauri 2.10 (Rust) |
| Frontend | Vue 3.4 + TypeScript + Tailwind CSS 4.1 + Pinia |
| Backend | Python 3.12 + FastAPI + asyncio |
| Communication | SSE + WebSocket + REST |
| Storage | SQLite (WAL) + FTS5 + sqlite-vec |
| LLM providers | Claude, OpenAI, Qwen, DeepSeek, Gemini, GLM, Ollama |
| Memory | MEMORY.md + FTS5 + Mem0 |
| Evaluation | Code graders + LLM-as-Judge + human review |

Project Structure

```text
xiaodazi/
├── frontend/            # Vue 3 + Tauri desktop app
├── core/
│   ├── agent/           # RVR-B execution, backtracking
│   ├── routing/         # LLM-First intent analysis
│   ├── context/         # Context engineering (inject, compress, cache)
│   ├── tool/            # Tool registry, selector, executor
│   ├── skill/           # Skill loader, group registry
│   ├── memory/          # 3-layer memory (Markdown + FTS5 + Mem0)
│   ├── playbook/        # Online learning (strategy extraction)
│   ├── llm/             # 7 LLM providers + format adapters
│   ├── planning/        # DAG task planning + progress tracking
│   ├── termination/     # Adaptive termination strategies
│   ├── state/           # Snapshot / rollback
│   └── monitoring/      # Failure detection, token audit
├── routers/             # FastAPI HTTP/WS endpoints
├── services/            # Business logic (protocol-agnostic)
├── tools/               # Built-in tool implementations
├── skills/              # Shared skill library
├── instances/           # Agent instance configs
├── evaluation/          # E2E test suites + graders
├── models/              # Pydantic data models
└── infra/               # Storage infrastructure (SQLite, cache)
```

Extending xiaodazi

Add a Skill (no code required)

Create a directory under skills/ or instances/xiaodazi/skills/ with a SKILL.md:

```markdown
# My Custom Skill

## When to Use
When the user asks to [describe the trigger scenario].

## Instructions
1. First, [step one]
2. Then, [step two]
3. Finally, [step three]

## metadata
os_compatibility: common
dependency_level: builtin
```

The skill is automatically discovered, classified, and available on next request.

Add an LLM Provider

Implement a provider class in core/llm/ following the existing adapters (Claude, OpenAI, Qwen, etc.). Register it in the LLMRegistry.
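
In rough shape, an adapter only needs to translate between the agent's message format and the vendor's API. A hypothetical skeleton (check the existing adapters in core/llm/ for the real interface):

```python
class MyProvider:
    """Hypothetical provider adapter -- mirror an existing one in core/llm/."""

    name = "myprovider"

    def __init__(self, api_key: str):
        self.api_key = api_key

    async def complete(self, messages: list[dict], model: str, **options) -> str:
        # 1. Convert the agent's messages into the vendor's request format.
        # 2. Call the vendor's API asynchronously.
        # 3. Map the response (and error codes) back to plain text / tool calls.
        raise NotImplementedError
```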

Add a Messaging Channel

Implement a gateway adapter in core/gateway/. The ChatService is protocol-agnostic — your adapter only handles message format conversion.
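
Sketched the same way (class and method names are hypothetical):

```python
class DiscordGateway:
    """Hypothetical channel adapter -- ChatService never sees Discord specifics."""

    async def on_inbound(self, event: dict, chat_service) -> None:
        text = event["content"]                          # channel format -> plain text
        reply = await chat_service.handle_message(text)  # protocol-agnostic core
        await self.send(event["channel_id"], reply)      # plain text -> channel format

    async def send(self, channel_id: str, text: str) -> None:
        raise NotImplementedError  # call the channel's own API here
```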


Known Issues

We are honest about what doesn't work well yet.

Stability
  • Long session memory pressure — In 80+ turn conversations with heavy tool usage, context compression occasionally discards information the agent needs later.
  • Process crashes — The Python backend can exit unexpectedly under concurrent file writes. The Tauri shell does not yet auto-restart the sidecar.
  • SQLite write contention — When memory extraction, conversation saves, and playbook extraction fire simultaneously, occasional `database is locked` errors appear on slower disks.
Agent Quality
  • Backtracking timing — The RVR-B loop sometimes backtracks too late or too eagerly. Error classification thresholds are still being calibrated.
  • Planning granularity — Plans are sometimes too coarse or too fine. The complexity-to-depth mapping needs more real-world data.
Platform
  • macOS is the primary test platform. Windows support exists but has received less testing.
  • Single-machine only. No remote access, no mobile app, no multi-device sync.
  • Text only. No voice input/output yet.

Roadmap

  • [ ] Windows platform hardening
  • [ ] Sidecar auto-restart and health monitoring
  • [ ] Importance-aware context compression
  • [ ] Skill marketplace / community registry
  • [ ] Parallel tool execution
  • [ ] Voice input/output
  • [ ] Additional messaging channels (Discord, Slack, WhatsApp)

Documentation

| Document | Description |
| --- | --- |
| Architecture Overview | Full 5-layer architecture with 12 deep-dive modules |
| Frontend & Desktop | Tauri + Vue 3, Apple Liquid design |
| API & Services | Three-layer architecture, preprocessing pipeline |
| Intent Analysis | LLM-First semantic analysis, 4-layer caching |
| Agent Execution | RVR-B loop, backtracking, adaptive termination |
| Context Engineering | 3-phase injection, compression, KV-Cache |
| Tool System | 2-layer registry, intent-driven pruning |
| Skill Ecosystem | 200+ skills, 2D classification, lazy allocation |
| Memory System | 3-layer memory, dual-write, fusion search |
| LLM Multi-Model | 7 providers, format adapters, failover |
| Instance & Config | Prompt-driven schema, instance isolation |
| Evaluation | 3-layer grading, E2E pipeline, failure detection |
| Playbook Learning | Closed-loop strategy learning |

Contributing

We welcome contributions of all kinds:

  • Skill authoring — The lowest-barrier way to contribute. Write a SKILL.md, open a PR.
  • Bug reports — Especially on Windows. Every crash report improves the project.
  • Prompt tuning — Help improve intent analysis accuracy or agent response quality.
  • Documentation — Tutorials, examples, translations.
  • Code — See the Architecture docs to understand the codebase before diving in.

Star History


Contributors

Authors

License

MIT — Copyright (c) 2025-2026 ZenFlux