joke-engineering

Build your agent from 200,000+ skills via skill RETRIEVAL & ORCHESTRATION

Installation
CLI
npx skills add https://github.com/ynulihao/AgentSkillOS --skill joke-engineering

Install this skill with the command-line interface (CLI) and start using the SKILL.md workflow in your workspace.

Last updated 4/29/2026

AgentSkillOS

Main Page · Python 3.10+ · License: MIT · arXiv · Hugging Face Dataset

News

  • [2026/03] Our new project homepage is now live!
  • [2026/03] Benchmark released — 30 multi-format creative tasks across 5 categories with pairwise Bradley-Terry evaluation.
  • [2026/03] Modular Architecture released — pluggable retrieval/orchestration modules. See ARCHITECTURE.md for details.
  • [2026/03] Batch CLI released — headless parallel execution with YAML configs, resume support, and Rich progress UI.

🌐 Overview

🔥 The agent skill ecosystem is exploding: more than 200,000 skills are now publicly available.

But with so many options, how do you find the right skills for your task? And when one skill isn’t enough, how do you compose and orchestrate multiple skills into a working pipeline?

AgentSkillOS is the operating system for agent skills—helping you discover, compose, and run skill pipelines end-to-end.

Skill Workflow Overview

WEB UI · Visual workflow overview in the browser

CLI Workflow Run

CLI · Headless execution with terminal progress and logs

🌟 Highlights

  • 🔍 Skill Search & Discovery — Creatively discover task-relevant skills with a skill tree that organizes skills into a hierarchy based on their capabilities.
  • 🔗 Skill Orchestration — Compose and orchestrate multiple skills into a single workflow with a directed acyclic graph, automatically managing execution order, dependencies, and data flow across steps.
  • 🖥️ GUI (Human-in-the-Loop) — A built-in GUI enables human intervention at every step, making workflows controllable, auditable, and easy to steer.
  • High-Quality Skill Pool — A curated collection of high-quality skills, selected based on Claude's implementation, GitHub stars, and download volume.
  • 📊 Observability & Debugging — Trace each step with logs and metadata to debug faster and iterate on workflows with confidence.
  • 🧩 Extensible Skill Registry — Easily plug in new skills, bring your own skills via a flexible registry.
  • 📈 Benchmark — 30 multi-format creative tasks across 5 categories, evaluated with pairwise comparison and Bradley-Terry aggregation.

💡 Examples

👉 View detailed workflows on Landing Page →

📊 Check out the comparison report: AgentSkillOS vs. without skills →

Case Study

Qualitative comparison between the vanilla baseline and AgentSkillOS Quality-First outputs.

Example 01 · Bug Diagnosis Report
Mobile bug localization, fix validation, and visual bug report generation with before/after evidence.

Example 02 · UI Design Research
Design-language research, report generation, and multi-direction concept mockups for knowledge software.

Example 03 · Paper Promotion
Transforms academic papers into social slides, scientific pages, and platform-specific promotion content.

Example 04 · Meme Video
Green-screen compositing, subtitle timing, and viral short-video production with multi-version outputs.

🏗️ Method

  • Skill tree construction: Organizes more than 200,000 skills into a capability tree, providing structured, coarse-to-fine access for efficient and creative skill discovery.
  • Skill retrieval: Automatically selects a task-relevant subset of usable skills given a user’s request.
  • Skill orchestration: Composes the selected skills into a coordinated plan (e.g., a DAG-based workflow) to solve tasks beyond the reach of any single skill. Note that we also support a freestyle mode (i.e., Claude Code).
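The orchestration step can be pictured with a small sketch. It is not the AgentSkillOS implementation: the step names are hypothetical, and it only shows how a DAG fixes execution order and exposes steps that could run in parallel, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
# The step names are illustrative, not skills from the AgentSkillOS pool.
workflow = {
    "collect_sources": set(),
    "write_report": {"collect_sources"},
    "make_figures": {"collect_sources"},
    "build_slides": {"write_report", "make_figures"},
}


def execution_waves(dag):
    """Group steps into waves; every step in a wave has all dependencies met."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps that can run right now
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents unlock
    return waves


waves = execution_waves(workflow)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {wave}")
```

Steps that share a wave have no dependency on each other, so a scheduler could run them concurrently. A deeper graph yields more waves, a wider one yields larger waves.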

AgentSkillOS Framework

🌲 Why Skill Tree?

Skill Retrieval Comparison

Left: Pure semantic retrieval prioritizes textual similarity and often misses skills that look unrelated in embedding space but are crucial for actually solving the task, which leads to narrow, myopic skill usage.

Right: Our LLM + Skill Tree navigates the capability hierarchy to surface non-obvious but functionally relevant skills, enabling broader, more creative, and more effective skill composition.
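As a rough illustration of coarse-to-fine navigation, the sketch below walks a toy capability tree and only descends into branches judged relevant. Everything in it is hypothetical: the tree, the skill names, and especially `pick_branches`, which uses simple word overlap as a stand-in for the LLM decision the project describes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    description: str
    children: list["Node"] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


def pick_branches(task, nodes):
    """Stand-in for the LLM call: keep branches whose description shares a word with the task."""
    words = set(task.lower().split())
    return [node for node in nodes if words & set(node.description.lower().split())]


def navigate(task, root):
    """Descend from the root, collecting skills only along the selected branches."""
    found = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        found.extend(node.skills)
        frontier.extend(pick_branches(task, node.children))
    return sorted(found)


# Hypothetical three-branch tree with made-up skill names.
tree = Node("root", "all skills", children=[
    Node("media", "video image audio production", children=[
        Node("video", "video editing subtitle", skills=["green-screen", "subtitle-timing"]),
    ]),
    Node("documents", "report pdf slides writing", skills=["pdf-report"]),
    Node("code", "debugging bug testing", skills=["bug-localizer"]),
])

selected = navigate("make a meme video with subtitle", tree)
print(selected)
```

Because whole branches are pruned at each level, the number of nodes inspected grows with the depth of the selected paths rather than with the total number of skills.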

📈 Benchmark

We propose a benchmark of 30 multi-format creative tasks spanning 5 categories, evaluated via pairwise comparison with Bradley-Terry aggregation.

Three key properties:

  • Multi-format creative tasks — Tasks require end-user artifacts in formats such as PDF, PPTX, DOCX, HTML, video, and generated images.
  • Pairwise evaluation — Outputs are compared in both orders to reduce position bias and capture reliable preference signals.
  • Bradley-Terry scores — Pairwise preferences are aggregated into continuous ranking scores for fine-grained system comparisons.

Benchmark Framework · Task Overview
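To make the aggregation step concrete, here is a minimal sketch of fitting Bradley-Terry strengths from pairwise win counts with the classic iterative (minorization-maximization) update. The system names and counts are invented for illustration, and the benchmark's own estimator and settings may differ.

```python
def bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths; wins[(a, b)] is how often a was preferred over b."""
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 for p in players}
    for _ in range(n_iter):
        updated = {}
        for i in players:
            total_wins = sum(count for (winner, _), count in wins.items() if winner == i)
            denom = 0.0
            for j in players:
                if j == i:
                    continue
                # Number of comparisons between i and j, in either direction.
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {p: s / norm for p, s in updated.items()}  # normalize to sum to 1
    return strength


# Invented counts. Each pair of systems is assumed to have been judged in both
# presentation orders, with the two orders pooled into one win count.
wins = {
    ("A", "B"): 7, ("B", "A"): 3,
    ("A", "C"): 8, ("C", "A"): 2,
    ("B", "C"): 6, ("C", "B"): 4,
}

scores = bradley_terry(wins)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Under this model the probability that system i is preferred over system j is strength[i] / (strength[i] + strength[j]), which is what turns discrete preferences into a continuous score.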

🧪 Experiments

Evaluated across 200 / 1K / 200K skill ecosystems, AgentSkillOS demonstrates consistent superiority over baselines, with ablation confirming that both retrieval and orchestration are indispensable, and strategy selection producing structurally distinct execution graphs.

Key findings:

  • Substantial Gains over Baselines at Every Scale — All three AgentSkillOS variants achieve the highest Bradley-Terry scores across 200 / 1K / 200K ecosystems. The w/ Full Pool baseline scores poorly because a growing fraction of skills becomes invisible — structured retrieval and orchestration overcome this scalability bottleneck.
  • Ablation: Both Retrieval and Orchestration Are Essential — Removing components reveals a clear degradation gradient: without DAG orchestration, retrieval alone is insufficient; without retrieval, even oracle skills cannot close the gap. Quality-First shows only a modest deficit versus the oracle upper bound, and the gap narrows as the ecosystem grows.
  • Strategy Choice Shapes Execution Structure — Each orchestration strategy faithfully translates its design intent into a distinct DAG topology. Quality-First builds deep, multi-stage pipelines; Efficiency-First trades depth for width to maximize parallelism; Simplicity-First retains only essential steps.
Category Radar — Per-category Bradley-Terry performance across ecosystem scales.
Ablation — Separates retrieval and orchestration effects; confirms both are required.
DAG Structure Metrics — Different orchestration strategies induce distinct topology profiles.

🚀 How to Use

Installation & Configuration

Prerequisites

  • Python 3.10+
  • Claude Code (must be installed and available in PATH)
  • Optional: cc-switch, for switching to other LLM providers

Install & Run

git clone https://github.com/ynulihao/AgentSkillOS.git
cd AgentSkillOS
pip install -e .
cp .env.example .env  # Edit with your API keys
python run.py --port 8765

Download Pre-built Trees

| Tree | Skills | Description |
|---|---|---|
| 🌱 skill_seeds | ~50 | Curated skill set (default) |
| 📦 skill_200 | 200 | 200 skills |
| 🗃️ skill_1000 | ~1,000 | 1,000 skills |
| 🏗️ skill_10000 | ~10,000 | 10,000 active + layered dormant skills |

Configuration

# .env
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key

EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=your-key
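
Before launching, it can help to confirm that the placeholders in .env were replaced. The sketch below is a hypothetical standalone check, not part of AgentSkillOS; it assumes all six variables shown above are required and that `your-key` is the only placeholder value.

```python
# Assumption: every variable from the sample .env above must be set.
REQUIRED = [
    "LLM_MODEL", "LLM_BASE_URL", "LLM_API_KEY",
    "EMBEDDING_MODEL", "EMBEDDING_BASE_URL", "EMBEDDING_API_KEY",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_keys(values):
    """Return required keys that are missing, empty, or still the placeholder."""
    return [key for key in REQUIRED if values.get(key, "") in ("", "your-key")]


sample = """\
# .env
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key

EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=your-key
"""

missing = unset_keys(parse_env(sample))
print(missing)
```

To check a real file, pass `open(".env", encoding="utf-8").read()` to `parse_env` instead of the sample string. The parser does not handle quoted values or inline comments.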

Custom Skill Groups

  1. Create data/my_skills/skill-name/SKILL.md
  2. Register the group under SKILL_GROUPS in src/config.py
  3. Build: python run.py build -g my_skills -v
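
Step 1 can be scripted. The sketch below creates the directory layout and a starter SKILL.md; the front-matter fields (`name`, `description`) follow the common Agent Skills convention and are an assumption here, as is the helper name `scaffold_skill`. Registration in src/config.py is left out because its structure is not shown in this README.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root, group, skill_name, description):
    """Create <root>/<group>/<skill_name>/SKILL.md and return its path."""
    skill_dir = Path(root) / group / skill_name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        "---\n"
        f"name: {skill_name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {skill_name}\n\n"
        "Describe when and how the agent should use this skill.\n",
        encoding="utf-8",
    )
    return skill_file


# Demo in a temporary directory; in a real checkout, root would be "data".
demo_root = tempfile.mkdtemp()
created = scaffold_skill(demo_root, "my_skills", "skill-name", "One-line summary of the skill.")
print(created.relative_to(demo_root))
```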

Batch Execution (Headless CLI)

Run a Batch

Run multiple tasks in parallel without the Web UI:

python run.py cli --task config/batch.yaml

See config/eval/ for ready-made batch configs covering different skill managers (tree, vector), orchestrators (dag, free-style), and skill pool sizes.

Batch Config (YAML)

batch_id: my_batch

defaults:
  skill_mode: auto          # "auto" (discover) or "specified"
  skill_group: skill_200    # Which skill pool to use
  output_dir: ./runs
  continue_on_error: true

execution:
  parallel: 2               # Max concurrent tasks
  retry_failed: 0

tasks:
  - file: path/to/task1.json
  - file: path/to/task2.json
  - dir: path/to/tasks/     # Scan directory
    pattern: "*.json"
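
The `tasks` list mixes single files with directory scans. How AgentSkillOS resolves these entries is not documented here, so the following is only a plausible sketch: `expand_tasks` is a hypothetical helper, and the `*.json` default pattern and sorted ordering are assumptions.

```python
import tempfile
from pathlib import Path


def expand_tasks(entries, base="."):
    """Turn `file` and `dir`/`pattern` entries into a flat list of task paths."""
    paths = []
    for entry in entries:
        if "file" in entry:
            paths.append(Path(base) / entry["file"])
        elif "dir" in entry:
            pattern = entry.get("pattern", "*.json")  # assumed default
            paths.extend(sorted((Path(base) / entry["dir"]).glob(pattern)))
    return paths


# Build a throwaway layout to demonstrate the expansion.
base = Path(tempfile.mkdtemp())
(base / "tasks").mkdir()
for name in ("b.json", "a.json", "notes.txt"):
    (base / "tasks" / name).write_text("{}", encoding="utf-8")
(base / "task1.json").write_text("{}", encoding="utf-8")

entries = [{"file": "task1.json"}, {"dir": "tasks", "pattern": "*.json"}]
resolved = expand_tasks(entries, base)
print([path.name for path in resolved])
```

Running the real CLI with --dry-run is the authoritative way to preview which tasks a config selects.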

CLI Flags

| Flag | Description |
|---|---|
| --task PATH, -T | Path to batch YAML config (required) |
| --parallel N, -p | Override parallel task count |
| --resume PATH, -R | Resume an interrupted batch run |
| --output-dir PATH, -o | Override output directory |
| --dry-run | Preview tasks without execution |
| --verbose, -v | Show detailed logs |
| --manager PLUGIN, -m | Override skill manager (e.g., tree, vector) |
| --orchestrator PLUGIN | Override orchestrator (e.g., dag, free-style) |

Resume Interrupted Runs

python run.py cli -T config/batch.yaml --resume ./runs/my_batch_20260306_120000

Completed tasks are skipped; only remaining tasks are re-executed.

Output Structure

./runs/{batch_id}/
├── batch_result.json          # Batch summary (metrics, costs, eval scores)
└── {task_id}__{run_id}/       # Per-task directory
    ├── meta.json
    ├── result.json
    ├── evaluation.json
    └── artifacts/             # Task outputs (PDF, HTML, video, etc.)
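
Given this layout, a run directory can be inspected with a few lines of Python. The sketch below assumes per-task directory names split on the first `__` into task and run IDs and treats the presence of result.json as "has a result"; the `status` field in the demo data is invented, since the JSON schemas are not described here.

```python
import json
import tempfile
from pathlib import Path


def summarize_batch(batch_dir):
    """List per-task directories with their IDs and parsed result.json, if present."""
    rows = []
    for task_dir in sorted(Path(batch_dir).iterdir()):
        if not task_dir.is_dir() or "__" not in task_dir.name:
            continue  # skips batch_result.json and unrelated entries
        task_id, _, run_id = task_dir.name.partition("__")
        result_file = task_dir / "result.json"
        result = None
        if result_file.exists():
            result = json.loads(result_file.read_text(encoding="utf-8"))
        rows.append({"task_id": task_id, "run_id": run_id, "result": result})
    return rows


# Throwaway demo layout with one finished and one unfinished task.
batch = Path(tempfile.mkdtemp())
(batch / "batch_result.json").write_text("{}", encoding="utf-8")
(batch / "t1__r1").mkdir()
(batch / "t1__r1" / "result.json").write_text('{"status": "ok"}', encoding="utf-8")
(batch / "t2__r1").mkdir()

rows = summarize_batch(batch)
for row in rows:
    print(row["task_id"], row["run_id"], "done" if row["result"] is not None else "pending")
```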

🔮 Future Work

  • [x] Recipe Generation & Storage
  • [ ] Interactive Agent Execution
  • [ ] Plan Refinement
  • [ ] Auto Skill Import
  • [ ] Dependency Detection
  • [ ] History Management
  • [ ] Multi-CLI Support (Codex, Gemini CLI, Cursor)

Citation

If you find AgentSkillOS useful, consider citing our paper:

@article{li2026organizing,
  title={Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale},
  author={Li, Hao and Mu, Chunjiang and Chen, Jianhao and Ren, Siyue and Cui, Zhiyao and Zhang, Yiqun and Bai, Lei and Hu, Shuyue},
  journal={arXiv preprint arXiv:2603.02176},
  year={2026}
}