Build your agent from 200,000+ skills via skill RETRIEVAL & ORCHESTRATION
```shell
npx skills add https://github.com/ynulihao/AgentSkillOS --skill joke-engineering
```

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.
English | 简体中文
News
- [2026/03] Our new project homepage is now live!
- [2026/03] Benchmark released — 30 multi-format creative tasks across 5 categories with pairwise Bradley-Terry evaluation.
- [2026/03] Modular Architecture released — pluggable retrieval/orchestration modules. See ARCHITECTURE.md for details.
- [2026/03] Batch CLI released — headless parallel execution with YAML configs, resume support, and Rich progress UI.
🔥 The agent skill ecosystem is exploding: over 200,000 skills are now publicly available.
But with so many options, how do you find the right skills for your task? And when one skill isn’t enough, how do you compose and orchestrate multiple skills into a working pipeline?
AgentSkillOS is the operating system for agent skills—helping you discover, compose, and run skill pipelines end-to-end.
WEB UI · Visual workflow overview in the browser
CLI · Headless execution with terminal progress and logs
👉 View detailed workflows on Landing Page →
📊 Check out the comparison report: AgentSkillOS vs. without skills →

Qualitative comparison between the vanilla baseline and AgentSkillOS Quality-First outputs.


Left: Pure semantic retrieval prioritizes textual similarity, often missing skills that look unrelated in embedding space but are crucial for actually solving the task—leading to narrow, myopic skill usage.
Right: Our LLM + Skill Tree navigates the capability hierarchy to surface non-obvious but functionally relevant skills, enabling broader, more creative, and more effective skill composition.
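For intuition, the baseline's behavior is easy to sketch: pure semantic retrieval simply ranks skill embeddings by cosine similarity to the query embedding. The snippet below is an illustrative toy, not AgentSkillOS's actual retrieval code; the skill names and vectors are made up.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_semantic(query_vec, skill_vecs, k=5):
    # Baseline: rank skills purely by embedding similarity to the query.
    # This is the "pure semantic retrieval" critiqued above: a skill whose
    # description reads differently from the task never surfaces, even if
    # it is functionally essential.
    ranked = sorted(skill_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

The LLM + Skill Tree approach replaces this flat ranking with a guided walk over the capability hierarchy, which is why it can reach skills that sit far away in embedding space.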
| 200 Skills | 1,000 Skills | 10,000 Skills |
|---|---|---|
| ![]() | ![]() | ![]() |
We propose a benchmark of 30 multi-format creative tasks spanning 5 categories, evaluated via pairwise comparison with Bradley-Terry aggregation.
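For reference, Bradley-Terry aggregation turns pairwise comparison outcomes into per-system strength scores. The sketch below uses the standard MM (minorization-maximization) update; the paper's exact fitting procedure may differ, so treat this as a minimal illustration.

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of times system i beat system j.
    Returns strengths normalized to sum to 1; the probability that i
    beats j under the model is p[i] / (p[i] + p[j]).
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            num = sum(wins[i])  # total wins of system i
            # Comparisons with every opponent, weighted by current strengths.
            den = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                      for j in range(n) if j != i)
            new_p.append(num / den)
        s = sum(new_p)
        p = [x / s for x in new_p]  # normalize for identifiability
    return p
```

With two systems and an 8-to-2 win record, the fitted strengths converge to 0.8 and 0.2, matching the observed win rate.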
Three key properties:
Evaluated across 200 / 1K / 10K skill ecosystems, AgentSkillOS consistently outperforms the baselines. Ablations confirm that both retrieval and orchestration are indispensable, and strategy selection produces structurally distinct execution graphs.
Key findings:
- **Category Radar**: Per-category Bradley-Terry performance across ecosystem scales.
- **Ablation**: Separates retrieval and orchestration effects; confirms both are required.
- **DAG Structure Metrics**: Different orchestration strategies induce distinct topology profiles.
```shell
git clone https://github.com/ynulihao/AgentSkillOS.git
cd AgentSkillOS
pip install -e .
cp .env.example .env   # Edit with your API keys
python run.py --port 8765
```
| Tree | Skills | Description |
|---|---|---|
| 🌱 `skill_seeds` | ~50 | Curated skill set (default) |
| 📦 `skill_200` | 200 | 200 skills |
| 🗃️ `skill_1000` | ~1,000 | 1,000 skills |
| 🏗️ `skill_10000` | ~10,000 | 10,000 active + layered dormant skills |
```shell
# .env
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key
EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=your-key
```
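How these variables get loaded is an implementation detail, but a minimal `.env` parser looks like the sketch below. This is illustrative only; a real loader such as python-dotenv also handles quoting, export prefixes, and multi-line values.

```python
def load_llm_settings(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file (minimal sketch)."""
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; split on the first '=' only,
            # so values like URLs are preserved intact.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                settings[key] = value
    return settings
```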
To add your own skills:

1. Place each skill at `data/my_skills/skill-name/SKILL.md`
2. Register the group in `src/config.py` → `SKILL_GROUPS`
3. Run `python run.py build -g my_skills -v`

Run multiple tasks in parallel without the Web UI:
```shell
python run.py cli --task config/batch.yaml
```
See config/eval/ for ready-made batch configs covering different skill managers (tree, vector), orchestrators (dag, free-style), and skill pool sizes.
```yaml
batch_id: my_batch

defaults:
  skill_mode: auto        # "auto" (discover) or "specified"
  skill_group: skill_200  # Which skill pool to use
  output_dir: ./runs
  continue_on_error: true

execution:
  parallel: 2             # Max concurrent tasks
  retry_failed: 0

tasks:
  - file: path/to/task1.json
  - file: path/to/task2.json
  - dir: path/to/tasks/   # Scan directory
    pattern: "*.json"
```
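The `tasks` section mixes explicit files with directory scans. A loader might expand these entries as sketched below; this is illustrative, and the actual loader in AgentSkillOS may differ.

```python
from pathlib import Path

def expand_tasks(task_entries):
    """Expand batch-config task entries into concrete file paths.

    Entries mirror the YAML above: {"file": ...} for a single task file,
    or {"dir": ..., "pattern": ...} for a directory scan.
    """
    paths = []
    for entry in task_entries:
        if "file" in entry:
            paths.append(Path(entry["file"]))
        elif "dir" in entry:
            pattern = entry.get("pattern", "*.json")  # assumed default
            # Sort for deterministic ordering across runs.
            paths.extend(sorted(Path(entry["dir"]).glob(pattern)))
    return paths
```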
| Flag | Description |
|---|---|
| `--task PATH, -T` | Path to batch YAML config (required) |
| `--parallel N, -p` | Override parallel task count |
| `--resume PATH, -R` | Resume an interrupted batch run |
| `--output-dir PATH, -o` | Override output directory |
| `--dry-run` | Preview tasks without execution |
| `--verbose, -v` | Show detailed logs |
| `--manager PLUGIN, -m` | Override skill manager (e.g., tree, vector) |
| `--orchestrator PLUGIN` | Override orchestrator (e.g., dag, free-style) |
```shell
python run.py cli -T config/batch.yaml --resume ./runs/my_batch_20260306_120000
```
Completed tasks are skipped; only remaining tasks are re-executed.
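The skip-completed check can be sketched as follows, assuming a per-task directory named `{task_id}__{run_id}` whose `result.json` marks completion. This is a hypothetical reconstruction for illustration, not the project's actual resume logic.

```python
from pathlib import Path

def remaining_tasks(batch_dir, task_ids):
    """Return task ids that still need to run in an interrupted batch.

    A task counts as completed if some `{task_id}__{run_id}` subdirectory
    contains a result.json (assumed completion marker).
    """
    done = {
        d.name.split("__")[0]
        for d in Path(batch_dir).iterdir()
        if d.is_dir() and (d / "result.json").exists()
    }
    return [t for t in task_ids if t not in done]
```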
```
./runs/{batch_id}/
├── batch_result.json        # Batch summary (metrics, costs, eval scores)
└── {task_id}__{run_id}/     # Per-task directory
    ├── meta.json
    ├── result.json
    ├── evaluation.json
    └── artifacts/           # Task outputs (PDF, HTML, video, etc.)
```
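This layout makes post-hoc analysis straightforward: per-task scores can be collected by globbing for `evaluation.json` files, as in the sketch below. The JSON field names inside are whatever the evaluator writes; this helper just gathers the files.

```python
import json
from pathlib import Path

def collect_evaluations(batch_dir):
    """Map each per-task directory name to its parsed evaluation.json.

    Illustrative helper for the batch output layout above; adapt the
    downstream processing to the real evaluation schema.
    """
    summary = {}
    for eval_path in Path(batch_dir).glob("*/evaluation.json"):
        task = eval_path.parent.name  # "{task_id}__{run_id}"
        summary[task] = json.loads(eval_path.read_text())
    return summary
```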
If you find AgentSkillOS useful, consider citing our paper:
```bibtex
@article{li2026organizing,
  title={Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale},
  author={Li, Hao and Mu, Chunjiang and Chen, Jianhao and Ren, Siyue and Cui, Zhiyao and Zhang, Yiqun and Bai, Lei and Hu, Shuyue},
  journal={arXiv preprint arXiv:2603.02176},
  year={2026}
}
```