Public repository for Datadog Agent Skills
npx skills add https://github.com/datadog-labs/agent-skills --skill dd-apm
Install this skill with the CLI and start using the SKILL.md workflow in your workspace.
Datadog skills for Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, OpenCode, and other AI agents.
| Skill | Description |
|---|---|
| dd-pup | Primary CLI - commands, auth, PATH setup |
| dd-monitors | Create, manage, mute monitors |
| dd-logs | Search logs |
| dd-apm | Traces, services, performance, Single-Step Instrumentation |
| dd-docs | Search Datadog documentation |
| dd-llmo | LLM Observability: experiments, eval RCA, evaluator generation, session classification |
| dd-browser-sdk | Browser SDK: RUM, Logs, Session Replay, profiling, product analytics, error tracking, version migration |
| dd-audit | Audit Trail investigations: who changed what, key compromise, cost spike root cause, compliance evidence (SOC 2/PCI), AI activity auditing |
# Homebrew (macOS/Linux) — recommended
brew tap datadog-labs/pack
brew install datadog-labs/pack/pup
# Or build from source
git clone https://github.com/datadog-labs/pup.git && cd pup
cargo build --release
cp target/release/pup ~/.local/bin
Pre-built binaries are also available from the latest release.
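The build-from-source step copies the binary into ~/.local/bin, which is not on PATH in every shell setup. The following is a minimal sketch for checking that; the helper name `path_contains` and the variable `PUP_BIN_DIR` are illustrative and not part of pup.

```shell
# Return 0 if directory $1 appears as an entry in the PATH-style string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Install directory used by the build-from-source step.
PUP_BIN_DIR="$HOME/.local/bin"

if path_contains "$PUP_BIN_DIR" "$PATH"; then
  echo "$PUP_BIN_DIR is on PATH"
else
  echo "Add this line to your shell profile:"
  echo "export PATH=\"$PUP_BIN_DIR:\$PATH\""
fi
```

Wrapping both strings in colons keeps a prefix such as /home/me/.local from matching /home/me/.local/bin by accident.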
# Authenticate
pup auth login
To install only dd-pup:
npx skills add datadog-labs/agent-skills \
--skill dd-pup \
--full-depth -y
To install several skills at once:
npx skills add datadog-labs/agent-skills \
--skill dd-pup \
--skill dd-monitors \
--skill dd-logs \
--skill dd-apm \
--skill dd-docs \
--full-depth -y
The dd-llmo directory contains four skills for working with LLM Observability data:
| Skill | Purpose |
|---|---|
| experiment-analyzer | Analyze and compare offline LLM experiments |
| eval-trace-rca | Root-cause production failures using eval judge signal or runtime errors |
| eval-bootstrap | Generate evaluator code from traces, optionally seeded by RCA output |
| eval-session-classify | Classify whether user intent was satisfied in a session (trace + RUM signals) |
Eval pipeline flow:
eval-session-classify      eval-trace-rca  →  eval-bootstrap
(classify sessions)        (diagnose why)     (build evals)
Run eval-trace-rca to understand why an app is failing by analyzing eval judge verdicts or
runtime errors across production traces. Then run eval-bootstrap to generate evaluator code
that captures those failure patterns. Pass the RCA output directly to eval-bootstrap to seed
it with the discovered failure taxonomy.
Use eval-session-classify independently to evaluate whether individual assistant sessions
satisfied user intent, combining LLM Obs trace data with RUM behavioral signals.
# Claude Code — copy any or all skills
mkdir -p ~/.claude/skills # create the target first so cp copies into it
cp -r dd-llmo/experiment-analyzer ~/.claude/skills
cp -r dd-llmo/eval-trace-rca ~/.claude/skills
cp -r dd-llmo/eval-bootstrap ~/.claude/skills
cp -r dd-llmo/eval-session-classify ~/.claude/skills
All four skills require the LLMO toolset:
claude mcp add --scope user --transport http "datadog-llmo-mcp" 'https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs'
experiment-analyzer uses the core toolset for notebook export (optional). eval-session-classify
requires it for RUM behavioral analysis and efficient batched fetches of trace session spans:
claude mcp add --scope user --transport http "datadog-mcp-core" 'https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=core'
# Analyze experiments
experiment-analyzer <experiment_id> # single experiment
experiment-analyzer <baseline_id> <candidate_id> # compare two experiments
experiment-analyzer <id(s)> <question> # ask a specific question
experiment-analyzer <id(s)> [question] --output notebook # export to Datadog notebook
# Root-cause why an app is failing
What's wrong with <ml_app> based on its evals over the last 24h
Analyze eval failures for <eval_name> over the last week
Look at the errors on <ml_app> over the last 24h
# Generate evaluator code from production traces
/eval-bootstrap <ml_app> # cold start
/eval-bootstrap <ml_app> [paste eval-trace-rca output here] # seeded from RCA
/eval-bootstrap <ml_app> --data-only # emit JSON spec instead of Python SDK code
# Classify a session
/eval-session-classify <session_id>
The dd-audit directory contains five skills for investigating Datadog Audit Trail data:
| Skill | Purpose |
|---|---|
| security-investigation | Who changed what, user activity, login geo, deletions, permission changes |
| key-compromise | Investigate a potentially compromised API key: timeline, geo/IP, endpoints called |
| cost-spike-investigation | Correlate usage spike (Usage Metering) with config changes (Audit Trail) to find root cause |
| compliance-report | Generate SOC 2 / PCI DSS evidence from audit data |
| ai-activity-audit | Audit what the Bits AI / MCP assistant did in your org |
These skills call the Datadog Audit REST API directly (no pup audit command exists yet). You need an API key and an application key with the audit_logs_read scope:
export DD_API_KEY=<your-api-key>
export DD_APP_KEY=<your-app-key>
export DD_SITE=datadoghq.com # or us3.datadoghq.com, us5.datadoghq.com, datadoghq.eu, ap1.datadoghq.com, ap2.datadoghq.com
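To confirm the keys work before running a skill, a direct call to the Audit Trail events endpoint can help. This is a rough sketch, not how the skills themselves issue requests: the function names `audit_events_url` and `audit_search` are hypothetical, and the query, time window, and page size are illustrative values.

```shell
# Build the Audit Trail events endpoint for a Datadog site (defaults to datadoghq.com).
audit_events_url() {
  printf 'https://api.%s/api/v2/audit/events' "${1:-datadoghq.com}"
}

# List recent audit events. $1 is the search query, $2 the start of the window.
# Aborts with a message if either key is unset.
audit_search() {
  curl -sS -g -G "$(audit_events_url "${DD_SITE:-}")" \
    -H "DD-API-KEY: ${DD_API_KEY:?set DD_API_KEY}" \
    -H "DD-APPLICATION-KEY: ${DD_APP_KEY:?set DD_APP_KEY}" \
    --data-urlencode "filter[query]=${1:-*}" \
    --data-urlencode "filter[from]=${2:-now-1h}" \
    --data-urlencode "filter[to]=now" \
    --data-urlencode "page[limit]=10"
}
```

For example, `audit_search '*' now-24h` requests up to ten events from the last day. A 403 response usually means the application key lacks the audit_logs_read scope.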
# Claude Code — copy any or all skills
mkdir -p ~/.claude/skills # create the target first so cp copies into it
cp -r dd-audit/security-investigation ~/.claude/skills
cp -r dd-audit/key-compromise ~/.claude/skills
cp -r dd-audit/cost-spike-investigation ~/.claude/skills
cp -r dd-audit/compliance-report ~/.claude/skills
cp -r dd-audit/ai-activity-audit ~/.claude/skills
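The per-skill copies above can also be done in one loop. The sketch below assumes each skill is a subdirectory of the source directory, as in the cp commands; the function name `install_skills` is illustrative.

```shell
# Copy every skill directory under $1 into $2, creating $2 first so that
# cp copies each skill into the target instead of renaming it to the target.
install_skills() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  for skill in "$src"/*/; do
    [ -d "$skill" ] || continue   # skip when the glob matches nothing
    cp -r "${skill%/}" "$dest"/
    echo "installed $(basename "$skill")"
  done
}
```

Run it from the repository root, for example `install_skills dd-audit ~/.claude/skills`.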
# Security investigation
Who deleted monitors in the last 24 hours?
What did <user_email> do this week?
Show login activity from unexpected locations
# Key compromise
Was API key <key_id> used from unexpected locations?
Investigate this API key: <key_id>
# Cost spike
Why did our LLM Observability usage spike on May 1?
What caused the cost increase this week?
# Compliance
Generate SOC 2 evidence for CC6.2 and CC6.3 for Q1 2026
Create a PCI DSS Requirement 10 report for the last 90 days
# AI activity
What did the Bits AI assistant do in my org this week?
Show me a governance report for AI tool calls in April
| Task | Command |
|---|---|
| Search error logs | pup logs search --query "status:error" --from 1h |
| List monitors | pup monitors list |
| Schedule monitor downtime | pup downtime create --file downtime.json |
| Find slow traces | pup traces search --query "service:api @duration:>500ms" --from 1h |
| Query metrics | pup metrics query --query "avg:system.cpu.user{*}" |
| List services for an env (required) | pup apm services list --env <env> --from 1h --to now |
| Check auth | pup auth status |
| Refresh token | pup auth refresh |
More pup commands are documented in the official pup docs.
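The commands in the table can be wrapped in small helpers for repeated use. As one possible sketch, the function below narrows the error-log search to a single service; the name `service_error_logs` is hypothetical, and it uses only the `--query` and `--from` flags shown in the table.

```shell
# Search error logs for one service over a time window (default: last hour).
service_error_logs() {
  service=${1:?service name required}
  pup logs search --query "status:error service:$service" --from "${2:-1h}"
}
```

For example, `service_error_logs checkout 15m` searches the last 15 minutes of error logs for a service named checkout.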
# Check auth first (includes token time remaining)
pup auth status
# If commands fail with 401/403, try refresh first
pup auth refresh
# If refresh fails or no session exists, do full OAuth login
pup auth login
# Non-default site/org
pup auth login --site datadoghq.eu --org <org>
If the browser opens the wrong profile/window, use the one-time URL printed by pup auth login and open it manually in the correct session.
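The status, refresh, login sequence above can be folded into one helper. This is a sketch under the assumption that each `pup auth` subcommand exits non-zero on failure, which the source does not confirm; the function name `ensure_pup_auth` is illustrative.

```shell
# Make sure a usable pup session exists, escalating only as far as needed.
# Assumption: each `pup auth` subcommand exits non-zero on failure.
ensure_pup_auth() {
  if pup auth status >/dev/null 2>&1; then
    echo "session ok"
  elif pup auth refresh >/dev/null 2>&1; then
    echo "token refreshed"
  else
    # Full OAuth login; output is left visible so the one-time URL can be copied.
    pup auth login "$@" && echo "logged in"
  fi
}
```

Arguments are passed through to the login step only, so `ensure_pup_auth --site datadoghq.eu --org <org>` covers the non-default site case.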
More skills are coming soon.
# List all available skills
npx skills add datadog-labs/agent-skills --list --full-depth
MIT