We are building a set of open agent skills that deliver better performance, higher determinism, and greater consistency on targeted tasks, while costing less and using less context.
npx skills add https://github.com/MassLab-SII/open-agent-skills --skill shopping-admin-browser-automation
Install this skill via the CLI and start using its SKILL.md workflow in your workspace.
Enhancing AI Agent Efficiency Through Domain-Specific Skills

"Model Context Protocol (MCP) connects Claude to third-party tools, and skills teach Claude how to use them well."
(Anthropic, "Extending Claude's capabilities with skills and MCP servers")
As AI agents evolve, there is a growing need for modular, reusable ways to give them domain-specific expertise while avoiding problems such as excessive context consumption from MCP. To address this, Anthropic introduced Agent Skills as an open standard on December 18, 2025, allowing agents to dynamically load structured instructions and resources for more effective task execution. Although platforms such as OpenAI Codex have adopted this standard, native support remains limited to specific ecosystems.
Many LLM providers have not yet adopted the Agent Skills standard, which keeps this efficient approach out of reach for a broader audience. We, the Project Q team at SII, bridge this gap with a lightweight open-source framework that is fully compatible with the Agent Skills standard and works with any LLM provider. Our implementation focuses on the synergy between the Model Context Protocol (MCP) and Skills: MCP provides access to external tools and systems, while Skills provide the procedural knowledge needed to use those tools (including MCP tools) effectively. In our experiments, the skill-based implementation reduced context usage by up to ~20x on individual tasks compared with a pure MCP approach.
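
One way such savings can arise is progressive loading: only a short summary of each skill stays in the prompt, and the full instructions are read when the model actually picks that skill. The sketch below assumes the Agent Skills layout of one directory per skill, each containing a SKILL.md whose frontmatter declares `name` and `description`. All class and function names are hypothetical and do not come from this repository, and the frontmatter parser handles only flat `key: value` lines, not full YAML.

```python
"""Minimal sketch of progressive skill disclosure (hypothetical, not this repo's code).

Assumption: each skill is a directory with a SKILL.md that begins with a
'---'-delimited frontmatter block containing `name` and `description`.
"""
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillStub:
    name: str
    description: str
    path: Path  # location of SKILL.md; the body is read lazily


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> list[SkillStub]:
    """Collect name/description stubs for every <root>/*/SKILL.md."""
    stubs = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            stubs.append(SkillStub(meta["name"], meta["description"], skill_md))
    return stubs


def system_prompt_section(stubs: list[SkillStub]) -> str:
    """Render the compact skill list that is always kept in context."""
    return "\n".join(f"- {s.name}: {s.description}" for s in stubs)


def load_skill_body(stub: SkillStub) -> str:
    """Read the full instructions only once the model selects this skill."""
    return stub.path.read_text(encoding="utf-8")
```

With this split, the per-turn cost is one line per skill; the much longer SKILL.md body is paid for only on the turns that need it.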
This project builds on MCPMark, a comprehensive evaluation suite for assessing the agentic capabilities of frontier models, and extends its benchmarks with a skill-based implementation.
# Clone the repository
git clone https://github.com/zjtco-yr/open-agent-skills.git
cd open-agent-skills
# install with uv (faster)
uv pip install -e .
Create a .mcp_env file with your API credentials:
# Example: OpenAI
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_API_KEY="sk-..."
# Optional: Notion (only for Notion tasks)
# SOURCE_NOTION_API_KEY="your-source-notion-api-key"
# EVAL_NOTION_API_KEY="your-eval-notion-api-key"
# EVAL_PARENT_PAGE_TITLE="MCPMark Eval Hub"
# Optional: Playwright (only for Playwright tasks)
PLAYWRIGHT_BROWSER="chromium" # chromium | firefox
PLAYWRIGHT_HEADLESS="True"
# Optional: GitHub (only for GitHub tasks)
GITHUB_TOKENS="token1,token2" # token pooling for rate limits
GITHUB_EVAL_ORG="your-eval-org"
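
Before launching a run, it can help to confirm that the file defines the keys each service needs. The following is a hypothetical pre-flight check, not part of this repository. It assumes simple KEY="value" lines as in the example above, treats lines starting with # as commented out, and strips a trailing ` # ...` comment; it is not a full dotenv parser (for instance, a ` #` inside a quoted value would be cut off).

```python
from pathlib import Path

# Keys per service, copied from the example .mcp_env above (hypothetical grouping).
REQUIRED_KEYS = {
    "openai": ["OPENAI_BASE_URL", "OPENAI_API_KEY"],
    "notion": ["SOURCE_NOTION_API_KEY", "EVAL_NOTION_API_KEY", "EVAL_PARENT_PAGE_TITLE"],
    "playwright": ["PLAYWRIGHT_BROWSER", "PLAYWRIGHT_HEADLESS"],
    "github": ["GITHUB_TOKENS", "GITHUB_EVAL_ORG"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines, skipping blank lines and commented-out keys."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        value = value.split(" #", 1)[0].strip()  # drop a trailing inline comment
        env[key.strip()] = value.strip("\"'")  # remove surrounding quotes
    return env


def missing_keys(env: dict[str, str], services: list[str]) -> list[str]:
    """Return required keys that are absent or empty for the given services."""
    return [key for svc in services for key in REQUIRED_KEYS[svc] if not env.get(key)]


def check_env_file(path: str = ".mcp_env", services: tuple[str, ...] = ("openai",)) -> list[str]:
    """Read the file and report what is missing; an empty list means ready."""
    env = parse_env(Path(path).read_text(encoding="utf-8"))
    return missing_keys(env, list(services))
```

For example, `check_env_file(services=("openai", "playwright"))` lists any keys you still need to set before running Playwright tasks.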
For more detailed environment configuration, service setup, and authentication instructions, please refer to MCPMark.
We ran preliminary benchmark evaluations comparing plain MCP with Open-Agent-Skills (Skill with MCP) using Claude Sonnet 4.5. On all three benchmarks, Skill with MCP achieved better task performance and used fewer tokens.

Task performance:

| Benchmark | MCP | Skills |
|---|---|---|
| GitHub | 29.35% | 43.48% |
| Filesystem | 32.5% | 53.3% |
| Playwright WebArena | 32.14% | 52.38% |

Token usage per benchmark:

| Benchmark | MCP Tokens | Skills Tokens | Reduction |
|---|---|---|---|
| GitHub | 7.36M | 4.47M | 39.25% |
| Filesystem | 2.27M | 1.55M | 31.55% |
| Playwright WebArena | 10.25M | 8.27M | 19.30% |


Per-task token usage:

| Task | MCP Tokens | Skills Tokens | Reduction |
|---|---|---|---|
| english_talent (Filesystem) | 1.05M | 0.053M | 95.00% (~20x) |
| find_commit_date (GitHub) | 1.21M | 0.23M | 80.84% |
| marketing_customer_analysis (Playwright WebArena) | 1.45M | 0.67M | 54.00% |

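
The Reduction column is consistent with the relative saving (MCP - Skills) / MCP, and the "~20x" figure with the ratio MCP / Skills. The small helper below (our own, not part of the repository) recomputes both from the rounded token counts in the tables. Because the inputs are rounded, results can differ from the published percentages by up to about 0.2 percentage points (for example, 94.95% rather than 95.00% for english_talent).

```python
# Token counts in millions, copied from the per-task table (rounded values).
PER_TASK = {
    "english_talent": (1.05, 0.053),
    "find_commit_date": (1.21, 0.23),
    "marketing_customer_analysis": (1.45, 0.67),
}


def reduction_pct(mcp_tokens: float, skills_tokens: float) -> float:
    """Share of the MCP-only token budget saved by the skill-based run, in percent."""
    return (1 - skills_tokens / mcp_tokens) * 100


def reduction_factor(mcp_tokens: float, skills_tokens: float) -> float:
    """How many times fewer tokens the skill-based run used."""
    return mcp_tokens / skills_tokens


for task, (mcp, skills) in PER_TASK.items():
    print(f"{task}: {reduction_pct(mcp, skills):.2f}% saved, "
          f"{reduction_factor(mcp, skills):.1f}x fewer tokens")
```

The two views are related by factor = 1 / (1 - pct/100), so a 95% saving corresponds to a 20x reduction and a 50% saving to 2x.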
Task: reddit/budget_europe_travel involves multiple operations, including account registration, creating posts, and setting up forums. For the complete task description, see tasks/playwright_webarena/standard/reddit/budget_europe_travel.
(Recordings omitted: side-by-side comparisons of Skill with MCP vs. MCP for the Sign Up, Create Forum, Create Page, and Create Submission steps, plus a full demo of each approach.)
Result: with 4 skills, context usage was reduced by about 2x compared with MCP alone.
# Start the browser server
uv run skills/scripts/reddit/browser_server.py &
# Execute a reddit task with skills
python -m pipeline \
--mcp playwright_webarena \
--models claude-sonnet-4.5 \
--tasks reddit/budget_europe_travel \
--exp-name skill-demo \
--k 1
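
The commands above leave the browser server running in the background after the task finishes. A minimal Python sketch (our own, not part of the repository) starts the server, waits, runs the task, and always stops the server afterwards. The fixed `startup_delay` is an assumption: we do not know whether browser_server.py exposes a readiness check, so adjust or replace the wait as needed. `run_with_server`, `SERVER_CMD`, and `TASK_CMD` are hypothetical names; the commands mirror the ones above, with `sys.executable` standing in for `python`.

```python
import subprocess
import sys
import time

# Mirrors the shell commands above; run from the repository root.
SERVER_CMD = ["uv", "run", "skills/scripts/reddit/browser_server.py"]
TASK_CMD = [
    sys.executable, "-m", "pipeline",
    "--mcp", "playwright_webarena",
    "--models", "claude-sonnet-4.5",
    "--tasks", "reddit/budget_europe_travel",
    "--exp-name", "skill-demo",
    "--k", "1",
]


def run_with_server(server_cmd: list[str], task_cmd: list[str],
                    startup_delay: float = 5.0) -> int:
    """Start the server, run the task, and always shut the server down."""
    server = subprocess.Popen(server_cmd)
    try:
        time.sleep(startup_delay)  # crude wait; no readiness check is assumed
        if server.poll() is not None:
            raise RuntimeError(f"server exited early with code {server.returncode}")
        return subprocess.run(task_cmd).returncode
    finally:
        if server.poll() is None:
            server.terminate()  # ask politely first
            try:
                server.wait(timeout=10)
            except subprocess.TimeoutExpired:
                server.kill()  # force-stop if it ignores SIGTERM
                server.wait()
```

Call `run_with_server(SERVER_CMD, TASK_CMD)`; it returns the pipeline's exit code, and the `finally` block guarantees the server is not left running even if the task raises.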
We acknowledge that the current implementation has room for improvement. This project is intended as a starting point to demonstrate the potential of combining MCP with domain-specific skills.
Balancing Specificity and Generalization: While some of our skills are designed for broad applicability (e.g., Playwright-based skills for registration, posting, and forum creation), others remain highly task-specific (e.g., time-based file classification in the filesystem skills). We will focus on improving skill design patterns to strike a better balance between task-specific performance and cross-domain generalization.
Skill-MCP Integration: We plan to:
Extended Domain Coverage:
Security is a primary consideration in our skill execution framework. It inherits and leverages several security features from MCPMark, including:
However, we strongly recommend reviewing skill scripts before running them in production, whether you are creating new skills or using existing ones.
After configuring the relevant APIs and Docker environment (if needed), you can run the following commands:
python -m pipeline --mcp playwright_webarena \
--models claude-sonnet-4.5 \
--tasks reddit/marketing_customer_analysis \
--exp-name webarena-test \
--k 1
python -m pipeline --mcp filesystem \
--models claude-sonnet-4.5 \
--tasks student_database/english_talent \
--exp-name filesystem-test \
--k 1
python -m pipeline --mcp github \
--models claude-sonnet-4.5 \
--tasks build_your_own_x/find_commit_date \
--exp-name github-test \
--k 1
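
To run all three example tasks in one go, the commands above can be generated and executed from Python. This is a hypothetical convenience script that only uses the flags shown above (`--mcp`, `--models`, `--tasks`, `--exp-name`, `--k`); it assumes `python` on your PATH resolves to the environment where the project is installed, and it defaults to a dry run that only prints each command.

```python
import shlex
import subprocess

# (service, task, experiment name) triples taken from the commands above.
RUNS = [
    ("playwright_webarena", "reddit/marketing_customer_analysis", "webarena-test"),
    ("filesystem", "student_database/english_talent", "filesystem-test"),
    ("github", "build_your_own_x/find_commit_date", "github-test"),
]


def pipeline_argv(mcp: str, task: str, exp_name: str,
                  model: str = "claude-sonnet-4.5", k: int = 1) -> list[str]:
    """Build the argument list for one `python -m pipeline` invocation."""
    return [
        "python", "-m", "pipeline",
        "--mcp", mcp,
        "--models", model,
        "--tasks", task,
        "--exp-name", exp_name,
        "--k", str(k),
    ]


def run_all(dry_run: bool = True) -> None:
    """Print every command; execute them in order when dry_run is False."""
    for mcp, task, exp_name in RUNS:
        argv = pipeline_argv(mcp, task, exp_name)
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing run


run_all()  # dry run: only prints the three commands
```

Passing the arguments as a list rather than a single string avoids shell quoting issues; `run_all(dry_run=False)` executes the runs sequentially.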
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Open-Agent-Skills: making AI agents more effective through domain-expertise skills.