Use the CLI to install this skill and reuse the corresponding SKILL.md workflow directly in your workspace:
npx skills add https://github.com/patrickporto/desktop-agent --skill desktop-control
🤖 AI Agent Skill for desktop automation using PyAutoGUI.
Control mouse, keyboard, and screen programmatically through a simple CLI interface.
Add this skill to your AI coding agent with a single command:
npx skills add patrickporto/desktop-agent
Install the CLI with pipx (recommended):
pipx install desktop-agent
Or run without installing using uvx:
uvx desktop-agent --help
Or using pip:
pip install desktop-agent
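A script that drives the CLI may want to detect which of these install options is available. The following is a minimal sketch, not part of desktop-agent: `resolve_cli` is a hypothetical helper that prefers an installed `desktop-agent` executable and falls back to `uvx desktop-agent` when only uvx is on the PATH.

```python
import shutil
from typing import Callable, List, Optional


def resolve_cli(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the argv prefix for invoking desktop-agent.

    `which` is injectable so the lookup can be tested without touching PATH.
    """
    if which("desktop-agent"):
        # Installed via pipx or pip: call the executable directly.
        return ["desktop-agent"]
    if which("uvx"):
        # Not installed, but uvx can run it on demand.
        return ["uvx", "desktop-agent"]
    raise FileNotFoundError(
        "desktop-agent not found; install it with pipx or pip, or install uv to use uvx"
    )


if __name__ == "__main__":
    try:
        # Show the command that would be used to print the help text.
        print(resolve_cli() + ["--help"])
    except FileNotFoundError as exc:
        print(exc)
```

Returning an argv prefix rather than a single string keeps the result usable with `subprocess.run` without shell quoting.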
This project is packaged as an AI Agent Skill. To use it:
Install it: pip install desktop-agent or pipx install desktop-agent
Then run: desktop-agent <category> <command>
Quick Reference for Agents:
desktop-agent --help
pipx install desktop-agent
pip install desktop-agent
uvx desktop-agent
The CLI is organized into command categories:
mouse
# Move mouse to coordinates
desktop-agent mouse move 100 200
# Move with duration (animation)
desktop-agent mouse move 100 200 --duration 1.0
# Click at current position
desktop-agent mouse click
# Click at specific coordinates
desktop-agent mouse click 500 500
# Right click
desktop-agent mouse right-click
# Double click
desktop-agent mouse double-click 300 400
# Drag to coordinates
desktop-agent mouse drag 200 300
# Scroll (positive = up, negative = down)
desktop-agent mouse scroll 5
desktop-agent mouse scroll -3
# Get current mouse position
desktop-agent mouse position
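To drive the mouse commands from Python, one option is to build the argument list and hand it to `subprocess.run`. The sketch below uses only the subcommands and the `--duration` flag shown above; the helper names (`mouse_move`, `mouse_click`, `run`) are illustrative and not part of desktop-agent.

```python
import shlex
import subprocess
from typing import List, Optional


def mouse_move(x: int, y: int, duration: Optional[float] = None) -> List[str]:
    """Build the argv for `desktop-agent mouse move X Y [--duration D]`."""
    argv = ["desktop-agent", "mouse", "move", str(x), str(y)]
    if duration is not None:
        argv += ["--duration", str(duration)]
    return argv


def mouse_click(x: Optional[int] = None, y: Optional[int] = None) -> List[str]:
    """Build the argv for a click at the current position or at (x, y)."""
    if (x is None) != (y is None):
        raise ValueError("pass both x and y, or neither")
    argv = ["desktop-agent", "mouse", "click"]
    if x is not None:
        argv += [str(x), str(y)]
    return argv


def run(argv: List[str]) -> None:
    """Execute one command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing moves by accident.
    for argv in (mouse_move(100, 200, duration=1.0), mouse_click(500, 500)):
        print(shlex.join(argv))
```

Replacing `print(shlex.join(argv))` with `run(argv)` would actually execute the commands, assuming `desktop-agent` is on the PATH.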
keyboard
# Write text
desktop-agent keyboard write "Hello World"
# Write with interval between keys
desktop-agent keyboard write "Slow typing" --interval 0.1
# Press a key
desktop-agent keyboard press enter
# Press multiple times
desktop-agent keyboard press a --presses 5
# Execute keyboard shortcut
desktop-agent keyboard hotkey "ctrl,c"
desktop-agent keyboard hotkey "ctrl,shift,esc"
# Hold/release key
desktop-agent keyboard keydown shift
desktop-agent keyboard keyup shift
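A `keydown` without a matching `keyup` leaves the key held, so a script that uses the pair may want to guarantee the release. The sketch below wraps the two commands above in a context manager; `hold_key`, `hotkey`, and `run_cli` are hypothetical helper names, and the runner is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence

Runner = Callable[[List[str]], None]


def run_cli(argv: List[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.run(argv, check=True)


def hotkey(keys: Sequence[str]) -> List[str]:
    """Build the argv for `desktop-agent keyboard hotkey "k1,k2,..."`."""
    if not keys:
        raise ValueError("at least one key is required")
    return ["desktop-agent", "keyboard", "hotkey", ",".join(keys)]


@contextmanager
def hold_key(key: str, runner: Runner = run_cli) -> Iterator[None]:
    """Hold `key` for the duration of the block, releasing it even on error."""
    runner(["desktop-agent", "keyboard", "keydown", key])
    try:
        yield
    finally:
        runner(["desktop-agent", "keyboard", "keyup", key])


if __name__ == "__main__":
    # Record commands in a list instead of executing them.
    calls: List[List[str]] = []
    with hold_key("shift", runner=calls.append):
        calls.append(["desktop-agent", "mouse", "click", "500", "500"])
    calls.append(hotkey(["ctrl", "c"]))
    for argv in calls:
        print(" ".join(argv))
```

The `finally` clause is the point of the design: the `keyup` command is issued even if the code inside the `with` block raises.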
screen
# Capture screenshot (full screen)
desktop-agent screen screenshot my_screen.png
# Take screenshot of active window
desktop-agent screen screenshot active_window.png --active
# Take screenshot of specific window
desktop-agent screen screenshot notepad.png --window "Notepad"
# Screenshot of specific region (x,y,width,height)
desktop-agent screen screenshot region.png --region "100,100,500,400"
# Locate image within active window
desktop-agent screen locate button.png --active
# Locate center of image on screen
desktop-agent screen locate-center button.png --confidence 0.8
# Find text coordinates within active window
desktop-agent screen locate-text-coordinates "OK" --active
# Find text in specific image
desktop-agent screen locate-text-coordinates "Confirm" --image screenshot.png
# Case-sensitive search
desktop-agent screen locate-text-coordinates "Login" --case-sensitive
# Read all text from screen
desktop-agent screen read-all-text
# Read text from image
desktop-agent screen read-all-text --image capture.png
# Specify languages for OCR (default: pt,en)
desktop-agent screen locate-text-coordinates "Button" --lang "en"
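The `--region` option above takes `x,y,width,height`, while screen areas are often known as two corner points. The sketch below shows one way to convert between the two; `region_from_corners` and `screenshot_region` are hypothetical helpers, not part of desktop-agent.

```python
from typing import List, Tuple


def region_from_corners(top_left: Tuple[int, int], bottom_right: Tuple[int, int]) -> str:
    """Convert two corner points to the "x,y,width,height" string."""
    x1, y1 = top_left
    x2, y2 = bottom_right
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        raise ValueError("bottom_right must be below and to the right of top_left")
    return f"{x1},{y1},{width},{height}"


def screenshot_region(path: str, region: str) -> List[str]:
    """Build the argv for `desktop-agent screen screenshot PATH --region REGION`."""
    return ["desktop-agent", "screen", "screenshot", path, "--region", region]


if __name__ == "__main__":
    # Corners (100, 100) and (600, 500) describe a 500x400 area.
    region = region_from_corners((100, 100), (600, 500))
    print(region)  # 100,100,500,400
    print(screenshot_region("region.png", region))
```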
message
# Show alert
desktop-agent message alert "Hello!"
# Confirmation
desktop-agent message confirm "Are you sure?"
# Input prompt
desktop-agent message prompt "Enter your name:"
# Password
desktop-agent message password "Enter your password:"
app
# Open an application (cross-platform)
desktop-agent app open notepad
desktop-agent app open "Google Chrome"
# Open with arguments
desktop-agent app open chrome --arg "https://google.com"
# Focus on a window by title
desktop-agent app focus "Untitled - Notepad"
# List all visible windows
desktop-agent app list
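# Example: open Notepad, focus it, and type text
desktop-agent app open notepad
desktop-agent app focus notepad
desktop-agent keyboard write "Hello from Desktop Skill!"
# Example: take a full-screen screenshot, then query the pixel at (500, 500)
desktop-agent screen screenshot full_screen.png
desktop-agent screen pixel 500 500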
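The Notepad sequence above can also be scripted so that later steps are skipped when an earlier one fails. The sketch below is one possible shape, assuming the CLI signals failure with a non-zero exit code (not confirmed by this README); `run_workflow`, `real_execute`, and `dry_run` are hypothetical names.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

# The three commands from the Notepad example, as argv lists.
WORKFLOW: List[List[str]] = [
    ["desktop-agent", "app", "open", "notepad"],
    ["desktop-agent", "app", "focus", "notepad"],
    ["desktop-agent", "keyboard", "write", "Hello from Desktop Skill!"],
]


def run_workflow(steps: Sequence[Sequence[str]], execute: Callable[[List[str]], int]) -> int:
    """Run steps in order and return how many completed.

    Stops at the first step whose exit code is non-zero.
    """
    done = 0
    for argv in steps:
        if execute(list(argv)) != 0:
            break
        done += 1
    return done


def real_execute(argv: List[str]) -> int:
    """Run the command for real and return its exit code."""
    return subprocess.run(argv).returncode


def dry_run(argv: List[str]) -> int:
    """Print the command instead of running it; always reports success."""
    print(shlex.join(argv))
    return 0


if __name__ == "__main__":
    completed = run_workflow(WORKFLOW, dry_run)
    print(f"{completed} of {len(WORKFLOW)} steps completed")
```

Passing `real_execute` instead of `dry_run` would run the commands against the desktop.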
Run desktop-agent --help to see all commands:
desktop-agent --help
desktop-agent mouse --help
desktop-agent keyboard --help
desktop-agent screen --help
desktop-agent message --help
desktop-skill/
├── desktop_agent/ # Main package
│ ├── __init__.py
│ ├── commands/ # Command modules
│ │ ├── __init__.py
│ │ ├── mouse.py # Mouse commands
│ │ ├── keyboard.py # Keyboard commands
│ │ ├── screen.py # Screen/screenshot/OCR commands
│ │ └── message.py # Message boxes
├── pyproject.toml # Project configuration
└── README.md # This documentation