AI-powered, vision-driven UI automation for every platform.
npx skills add https://github.com/web-infra-dev/midscene-skills --skill chrome-bridge-automation
Install this skill with the CLI to start using its SKILL.md workflow in your workspace.
Vision-driven cross-platform automation
skills/browser
skills/chrome-bridge
skills/computer-automation
skills/android-automation
skills/ios-automation
skills/harmony-automation
skills/vitest-midscene-e2e
⚠️ AI-driven UI automation may produce unpredictable results since it can control EVERYTHING on the screen. Please evaluate the risks carefully before use.
Make sure you have Node.js installed.
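One way to confirm this from a terminal is sketched below. The `require_cmd` helper is a hypothetical name introduced for illustration and is not part of Midscene or the skills CLI; it only relies on the standard `command -v` shell built-in.

```shell
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found" >&2
    return 1
  fi
}

# npx ships with npm, which is normally installed alongside Node.js.
require_cmd node || echo "Install Node.js before adding the skills." >&2
require_cmd npx || echo "npx is missing; check your Node.js/npm installation." >&2
```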
Then install the skills:
# General installation
npx skills add web-infra-dev/midscene-skills
# Claude Code
npx skills add web-infra-dev/midscene-skills -a claude-code
# OpenClaw
npx skills add web-infra-dev/midscene-skills -a openclaw
Midscene requires models with strong visual grounding capabilities (accurate UI element localization from screenshots).
Because of this, you need to prepare model access and configuration separately from skill installation.
Make sure the following environment variables are set in your environment. You can also define them in a .env file in the current directory, and Midscene will load them automatically:
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
Example: Gemini (Gemini-3-Flash)
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
Example: Qwen3-VL
MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
MIDSCENE_MODEL_NAME="qwen/qwen3-vl-235b-a22b-instruct"
MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
MIDSCENE_MODEL_FAMILY="qwen3-vl"
Example: Doubao Seed 1.6
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-1-6-250615"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-vision"
Commonly used models: Doubao Seed 1.6, Qwen3-VL, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.
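Before invoking a skill, it can help to check that the four variables above are actually set. The following is a minimal sketch, not something Midscene provides: `check_midscene_env` is a hypothetical helper, and it assumes that a .env file, if present, contains only plain KEY="value" lines that a POSIX shell can source directly.

```shell
# Hypothetical pre-flight check; not part of Midscene itself.
check_midscene_env() {
  # Assumption: ./.env holds simple KEY="value" lines, so it can be sourced.
  if [ -f ./.env ]; then
    set -a        # export every variable assigned while sourcing
    . ./.env
    set +a
  fi

  missing=0
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_midscene_env || echo "Set the variables listed above before running a Midscene skill." >&2
```

The function returns a non-zero status when any variable is empty or unset, so it can gate a script with `check_midscene_env && ...`. Note that sourcing .env this way also exports its values into the current shell session.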
Model setup docs:
In your chatbot or coding agent, you can say:
Use Midscene computer skill to open the Keynote app and create a new presentation.
Use Midscene browser skill to open the Google search page and search for "Midscene".
For bug reports, feature requests, and discussions, please visit the main Midscene repository: https://github.com/web-infra-dev/midscene/issues
License: MIT