shopping-admin-browser-automation

We are dedicated to building a set of open agent skills that deliver superior performance, higher determinism, and greater consistency on targeted tasks, while operating at a lower cost and with reduced context usage.

Install
CLI
npx skills add https://github.com/MassLab-SII/open-agent-skills --skill shopping-admin-browser-automation

Install this skill with the CLI and start using its SKILL.md workflow in your workspace.

Last updated: 5/1/2026
Open Agent Skills

Enhancing AI Agent Efficiency Through Domain-Specific Skills

Skill with MCP


"Model Context Protocol (MCP) connects Claude to third-party tools, and skills teach Claude how to use them well."
Source: Anthropic, "Extending Claude's capabilities with skills and MCP servers"

📖 Introduction

As AI agents evolve, there is a growing need for modular, reusable approaches to equip them with domain-specific expertise while mitigating issues like excessive MCP context consumption. To address this, Anthropic introduced Agent Skills as an open standard on December 18, 2025, allowing agents to dynamically load structured instructions and resources for more effective task execution. Although platforms such as OpenAI Codex have adopted this standard, native support remains limited to specific ecosystems.

Many LLM providers, however, have not yet adopted the Agent Skills standard, so this approach remains out of reach for much of the ecosystem. We, the Project Q team at SII, bridge this gap with a lightweight open-source framework that is fully compatible with the Agent Skills standard and works with any LLM provider. Our implementation focuses on the synergy between the Model Context Protocol (MCP) and Skills: MCP provides access to external tools and systems, while Skills provide the procedural knowledge for using those tools (including MCP) effectively. Compared with a pure MCP approach, our skill-based implementation reduced context usage by up to ~20x.
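As a rough illustration of why loading skills on demand saves context, here is a minimal Python sketch. It assumes the Agent Skills layout in which each skill's SKILL.md opens with YAML frontmatter containing `name` and `description`. Only a one-line index per skill goes into the prompt up front; a skill's full body is read only when the agent picks it. The helper names (`parse_skill_md`, `build_skill_index`) and the sample skill are hypothetical, and the parser handles only flat `key: value` frontmatter rather than full YAML.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions; loaded into context only when the skill is used


def parse_skill_md(text: str) -> Skill:
    """Split a SKILL.md into frontmatter metadata and Markdown body.

    Handles only flat `key: value` frontmatter lines, not full YAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    else:
        raise ValueError("frontmatter block is never closed with '---'")
    return Skill(meta["name"], meta["description"], body)


def build_skill_index(skills: list[Skill]) -> str:
    """Compact, always-loaded listing: one line per available skill."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


# Hypothetical skill used only for illustration.
EXAMPLE = """---
name: reddit-forum-setup
description: Create a forum and its first post on a Reddit-style site.
---
# Steps
1. Log in with the task account.
2. Open the forum creation page and fill in the form.
"""

skill = parse_skill_md(EXAMPLE)
print(build_skill_index([skill]))
# - reddit-forum-setup: Create a forum and its first post on a Reddit-style site.
```

A real loader would use a YAML library and read SKILL.md files from disk; the sketch keeps parsing inline so it runs with the standard library alone.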

This project builds on MCPMark, a comprehensive evaluation suite for assessing the agentic capabilities of frontier models, and extends its benchmarks with a skill-based implementation.

Key Features

  • Lightweight Skill Implementation: Built on a minimal framework, so skill implementations are easy to read and quick to understand
  • 🔗 Skill + MCP Integration: Shows how skills build on MCP's access to external tools and coordinate multiple MCP services to complete tasks together
  • 🔌 Claude Agent Standard Compatible: Skills run directly in Claude and in other agent environments that support the Agent Skills standard
  • 🔀 LLM Provider Agnostic: Not limited to Claude or Codex; any LLM can use skills to work more efficiently

🛠️ Installation

Prerequisites

  • Python 3.11+
  • uv package manager (recommended, for faster installs)

Quick Install

# Clone the repository
git clone https://github.com/zjtco-yr/open-agent-skills.git
cd open-agent-skills

# Install with uv (faster); uv pip installs into a virtual environment
uv venv
source .venv/bin/activate
uv pip install -e .

Environment Configuration

Create a .mcp_env file with your API credentials:

# Example: OpenAI
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_API_KEY="sk-..."

# Optional: Notion (only for Notion tasks)
# SOURCE_NOTION_API_KEY="your-source-notion-api-key"
# EVAL_NOTION_API_KEY="your-eval-notion-api-key"
# EVAL_PARENT_PAGE_TITLE="MCPMark Eval Hub"

# Optional: Playwright (only for Playwright tasks)
PLAYWRIGHT_BROWSER="chromium"   # chromium | firefox
PLAYWRIGHT_HEADLESS="True"

# Optional: GitHub (only for GitHub tasks)
GITHUB_TOKENS="token1,token2"   # token pooling for rate limits
GITHUB_EVAL_ORG="your-eval-org"

For more detailed environment configuration, service setup, and authentication instructions, please refer to MCPMark.
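The loader MCPMark actually uses for .mcp_env is not described here. Purely to show how the file's format and the GITHUB_TOKENS pool can be read, here is a minimal standard-library sketch. `parse_env` and `github_token_pool` are hypothetical names, and round-robin rotation is an assumption about what "token pooling" means.

```python
import itertools
import shlex
from collections.abc import Iterator


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines; skip blank lines and '#' comment lines."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, rest = line.partition("=")
        if not sep:
            continue  # not an assignment
        # shlex removes the quotes and, with comments=True, a trailing "# ..." note
        env[key.strip()] = " ".join(shlex.split(rest, comments=True))
    return env


def github_token_pool(env: dict[str, str]) -> Iterator[str]:
    """Round-robin over the comma-separated GITHUB_TOKENS value (assumed strategy)."""
    tokens = [t.strip() for t in env.get("GITHUB_TOKENS", "").split(",") if t.strip()]
    if not tokens:
        raise ValueError("GITHUB_TOKENS is empty or missing")
    return itertools.cycle(tokens)


SAMPLE = '''
OPENAI_API_KEY="sk-..."
PLAYWRIGHT_BROWSER="chromium"   # chromium | firefox
# EVAL_PARENT_PAGE_TITLE="MCPMark Eval Hub"
GITHUB_TOKENS="token1,token2"   # token pooling for rate limits
'''

env = parse_env(SAMPLE)
pool = github_token_pool(env)
print(env["PLAYWRIGHT_BROWSER"], next(pool), next(pool), next(pool))
# chromium token1 token2 token1
```

In practice, a library such as python-dotenv handles more edge cases (escape sequences, `export` prefixes, multi-line values) than this sketch does.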


📊 Performance Results

We ran preliminary benchmark evaluations comparing plain MCP with Open-Agent-Skills (Skill with MCP) using the Claude-Sonnet-4.5 model. In these runs, Skill with MCP outperformed plain MCP on both task accuracy and token efficiency.

Overall Pass@1 Accuracy

| Benchmark | MCP | Skills |
|---|---|---|
| GitHub | 29.35% | 43.48% |
| Filesystem | 32.5% | 53.3% |
| Playwright WebArena | 32.14% | 52.38% |

Token Efficiency (Successful Tasks Only)

| Benchmark | MCP tokens | Skills tokens | Reduction |
|---|---|---|---|
| GitHub | 7.36M | 4.47M | 39.25% |
| Filesystem | 2.27M | 1.55M | 31.55% |
| Playwright WebArena | 10.25M | 8.27M | 19.30% |

Extreme Cases

| Task (benchmark) | MCP tokens | Skills tokens | Reduction |
|---|---|---|---|
| english_talent (Filesystem) | 1.05M | 0.053M | 95.00% (~20x) |
| find_commit_date (GitHub) | 1.21M | 0.23M | 80.84% |
| marketing_customer_analysis (Playwright WebArena) | 1.45M | 0.67M | 54.00% |
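The Reduction column appears to be (MCP tokens - Skills tokens) / MCP tokens, and the "~20x" figure the ratio MCP / Skills. The sketch below recomputes both from the rounded values shown in the tables. Results land within about 0.2 points of the reported percentages, which is consistent with the reported figures having been computed from unrounded token counts (an inference, not something stated here).

```python
# (benchmark or task, MCP tokens in millions, Skills tokens in millions, reported reduction %)
ROWS = [
    ("GitHub", 7.36, 4.47, 39.25),
    ("Filesystem", 2.27, 1.55, 31.55),
    ("Playwright WebArena", 10.25, 8.27, 19.30),
    ("english_talent", 1.05, 0.053, 95.00),
    ("find_commit_date", 1.21, 0.23, 80.84),
    ("marketing_customer_analysis", 1.45, 0.67, 54.00),
]


def reduction_pct(mcp: float, skills: float) -> float:
    """Relative token saving, (MCP - Skills) / MCP, as a percentage."""
    return (mcp - skills) / mcp * 100


def compression_ratio(mcp: float, skills: float) -> float:
    """How many times fewer tokens the skill run used (MCP / Skills)."""
    return mcp / skills


for name, mcp, skills, reported in ROWS:
    print(f"{name:28s} {reduction_pct(mcp, skills):6.2f}% "
          f"(reported {reported:.2f}%), {compression_ratio(mcp, skills):5.1f}x")
```

For english_talent this gives about 94.95% and 19.8x, matching the reported "95.00% (~20x)".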

Key Insights

  1. Substantial Accuracy Improvement: Pass@1 accuracy rose by about 18 percentage points on average across the three benchmarks (from +14.1 points on GitHub to +20.8 on Filesystem)
  2. Excellent Token Efficiency: Token consumption on successful tasks fell by roughly 20-40% per benchmark, and by 80-95% on the most favorable individual tasks
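The ~18-point figure is an average of absolute (percentage-point) differences, not a relative improvement. A quick check against the Pass@1 table, assuming an unweighted mean over the three benchmarks:

```python
# Pass@1 accuracy in percent, copied from the table: benchmark -> (MCP, Skills)
PASS_AT_1 = {
    "GitHub": (29.35, 43.48),
    "Filesystem": (32.5, 53.3),
    "Playwright WebArena": (32.14, 52.38),
}

# Absolute gain in percentage points (Skills minus MCP), per benchmark
gains = {name: skills - mcp for name, (mcp, skills) in PASS_AT_1.items()}
# Relative gain versus the MCP baseline, for comparison
relative = {name: (skills - mcp) / mcp * 100 for name, (mcp, skills) in PASS_AT_1.items()}
mean_gain = sum(gains.values()) / len(gains)

for name in PASS_AT_1:
    print(f"{name}: +{gains[name]:.2f} pp ({relative[name]:.0f}% relative)")
print(f"Unweighted mean: +{mean_gain:.2f} pp")  # Unweighted mean: +18.39 pp
```

Measured relative to the MCP baseline, the same gains are roughly 48-64%, so the two framings should not be mixed.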

🎬 Demo: How Skills Work

Example: budget_europe_travel (Reddit)

Task: This task involves registering an account, creating posts, and setting up forums, among other operations. For the complete task description, see tasks/playwright_webarena/standard/reddit/budget_europe_travel.

Demo recordings (Skill with MCP vs. MCP, side by side): Sign Up, Create Forum, Create Page, Create Submission, Full Demo

Result: With 4 skills, the run used about half as much context as the pure MCP run (a ~2x reduction).

Running the Demo

# Start the browser server
uv run skills/scripts/reddit/browser_server.py &

# Execute a reddit task with skills
python -m pipeline \
  --mcp playwright_webarena \
  --models claude-sonnet-4.5 \
  --tasks reddit/budget_europe_travel \
  --exp-name skill-demo \
  --k 1
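The `&` in the demo leaves the browser server running after the pipeline exits. If you script the demo, one option is a small Python wrapper that starts the server, runs the same pipeline command, and terminates the server afterwards. This is a sketch, not part of the repository: `run_demo` is a hypothetical name, and the fixed startup delay is an assumption because no readiness check or port is documented for the server.

```python
import subprocess
import sys
import time

# Commands copied from the demo above; run from the repository root.
SERVER_CMD = ["uv", "run", "skills/scripts/reddit/browser_server.py"]
PIPELINE_CMD = [
    sys.executable, "-m", "pipeline",
    "--mcp", "playwright_webarena",
    "--models", "claude-sonnet-4.5",
    "--tasks", "reddit/budget_europe_travel",
    "--exp-name", "skill-demo",
    "--k", "1",
]


def run_demo(startup_wait_s: float = 5.0) -> int:
    """Start the browser server, run the task, and always stop the server.

    The fixed startup wait is an assumption: there is no documented
    signal that the server is ready.
    """
    server = subprocess.Popen(SERVER_CMD)
    try:
        time.sleep(startup_wait_s)
        return subprocess.run(PIPELINE_CMD).returncode
    finally:
        server.terminate()  # unlike a bare `&`, no server is left running afterwards
        server.wait()
```

Call `run_demo()` from the repository root; it returns the pipeline's exit code.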

🚀 Future Plans

We acknowledge that the current implementation has room for improvement. This project is intended as a starting point to demonstrate the potential of combining MCP with domain-specific skills.

Planned Improvements

  1. Balancing Specificity and Generalization: While some of our skills are designed with broad applicability (e.g., Playwright-based skills for registration, posting, forum creation), others remain highly task-specific (e.g., time-based file classification in filesystem skills). We will focus on improving skill design patterns to strike a better balance between task-specific performance and cross-domain generalization.

  2. Skill-MCP Integration: We plan to:

    • Develop better coordination patterns between skills and MCP tools
    • Enable smoother handoffs between skill scripts and MCP operations
  3. Extended Domain Coverage:

    • Advanced database operations and Notion workspace automation
    • Common daily life use cases (productivity, personal automation, etc.)

🔒 Security

Security is a primary consideration in our skill execution framework. It inherits and leverages several security features from MCPMark, including:

  • Directory Isolation: MCP/Skill file operations are confined to specific directories
  • Docker Containerization: Tasks run in isolated containers

However, we strongly recommend reviewing skill scripts before running them in production, whether you are writing new skills or using existing ones.



▶️ Quick Start

After configuring the relevant APIs and Docker environment (if needed), you can run the following commands:

Run a Playwright WebArena Task

python -m pipeline --mcp playwright_webarena \
  --models claude-sonnet-4.5 \
  --tasks reddit/marketing_customer_analysis \
  --exp-name webarena-test \
  --k 1

Run a Filesystem Task

python -m pipeline --mcp filesystem \
  --models claude-sonnet-4.5 \
  --tasks student_database/english_talent \
  --exp-name filesystem-test \
  --k 1

Run a GitHub Task

python -m pipeline --mcp github \
  --models claude-sonnet-4.5 \
  --tasks build_your_own_x/find_commit_date \
  --exp-name github-test \
  --k 1
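To run the three Quick Start tasks in one go, a small wrapper can assemble the same commands. This is a hypothetical convenience script, not part of the repository: `pipeline_cmd` and `run_quick_start` are illustrative names, it uses only the flags shown in the commands above, and it defaults to a dry run that just prints each command.

```python
import shlex
import subprocess
import sys

# (MCP service, task, experiment name) for the three Quick Start examples above
QUICK_START = [
    ("playwright_webarena", "reddit/marketing_customer_analysis", "webarena-test"),
    ("filesystem", "student_database/english_talent", "filesystem-test"),
    ("github", "build_your_own_x/find_commit_date", "github-test"),
]


def pipeline_cmd(mcp: str, task: str, exp_name: str,
                 model: str = "claude-sonnet-4.5", k: int = 1) -> list[str]:
    """Build argv for `python -m pipeline` using only the documented flags."""
    return [
        sys.executable, "-m", "pipeline",
        "--mcp", mcp,
        "--models", model,
        "--tasks", task,
        "--exp-name", exp_name,
        "--k", str(k),
    ]


def run_quick_start(dry_run: bool = True) -> None:
    """Print each command; execute them in turn only when dry_run is False."""
    for mcp, task, exp_name in QUICK_START:
        cmd = pipeline_cmd(mcp, task, exp_name)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # run from the repository root


run_quick_start()  # dry run: only prints the commands
```

Passing `dry_run=False` runs the tasks sequentially and stops at the first failing command, since `check=True` raises on a non-zero exit code.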

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Open-Agent-Skills: making AI agents more effective through domain-specific skills
