Expert in observing, benchmarking, and optimizing AI agents. Specializes in token usage tracking, latency analysis, and quality evaluation metrics. Use when optimizing agent costs, measuring performance, or implementing evals. Triggers include "agent performance", "token usage", "latency optimization", "eval", "agent metrics", "cost optimization", "agent benchmarking".
```
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill performance-monitor
```

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.
Provides expertise in monitoring, benchmarking, and optimizing AI agent performance. Specializes in token usage tracking, latency analysis, cost optimization, and implementing quality evaluation metrics (evals) for AI systems.
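The evals mentioned above can be as simple as a scored comparison against ground truth. A minimal sketch, assuming a hypothetical `fake_agent` stand-in for a real model call (the cases and answers are made-up illustration data):

```python
# Minimal accuracy eval against ground truth. All names and data are
# hypothetical; swap fake_agent for your real agent call.
cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3*3", "expected": "9"},
]

def fake_agent(prompt: str) -> str:
    """Stand-in for a real model call; deliberately wrong on one case."""
    return {"2+2": "4", "capital of France": "Paris", "3*3": "6"}[prompt]

def run_eval(agent, cases) -> float:
    """Fraction of cases where the agent's answer matches ground truth."""
    passed = sum(agent(c["input"]).strip() == c["expected"] for c in cases)
    return passed / len(cases)

score = run_eval(fake_agent, cases)
print(f"accuracy: {score:.2%}")
```

Running the same eval before and after a prompt or model change is what makes "measure before optimizing" actionable.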
Invoke this skill when:
- Optimizing agent costs or token usage
- Measuring agent performance or latency
- Implementing evals or quality metrics
- Benchmarking agents
Do NOT invoke when:
Related skills: /performance-engineer, /sre-engineer, /ml-engineer, /prompt-engineer

```
Optimization Goal?
├── Cost Reduction
│   ├── Token usage → Prompt optimization
│   └── API calls → Caching, batching
├── Latency
│   ├── Time to first token → Streaming
│   └── Total response time → Model selection
├── Quality
│   ├── Accuracy → Evals with ground truth
│   └── Consistency → Multiple run analysis
└── Reliability
    └── Error rates, retry patterns
```
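The cost branch of the tree above starts with knowing where tokens go. A minimal per-call tracking sketch, assuming your API client reports prompt and completion token counts (the model name and per-million-token prices here are hypothetical; substitute your provider's rates):

```python
from dataclasses import dataclass, field

# Hypothetical pricing in USD per 1M tokens -- substitute your provider's rates.
PRICES = {"example-model": {"input": 3.00, "output": 15.00}}

@dataclass
class TokenTracker:
    calls: list = field(default_factory=list)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        """Record one API call and return its estimated cost in USD."""
        p = PRICES[model]
        cost = (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000
        self.calls.append({"model": model, "in": prompt_tokens,
                           "out": completion_tokens, "cost": cost})
        return cost

    def total_cost(self) -> float:
        return sum(c["cost"] for c in self.calls)

tracker = TokenTracker()
tracker.record("example-model", prompt_tokens=1200, completion_tokens=350)
tracker.record("example-model", prompt_tokens=900, completion_tokens=500)
print(f"total: ${tracker.total_cost():.6f}")
```

Wrapping every call site with a tracker like this is the "instrument all calls" fix for surprise costs, and the per-call records are what you later slice by prompt version.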
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| No token tracking | Surprise costs | Instrument all calls |
| Optimizing without evals | Quality regression | Measure before optimizing |
| Average-only latency | Hides tail latency | Use percentiles |
| No prompt versioning | Can't correlate changes | Version and track |
| Ignoring caching | Repeated costs | Cache stable responses |
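On the "average-only latency" row: a mean smooths over exactly the slow requests users notice. A quick stdlib-only sketch with made-up sample latencies, using a nearest-rank percentile to show what the mean hides:

```python
import math

# Simulated request latencies in ms: 95 fast requests plus a slow tail (made-up data).
latencies = [120] * 95 + [2400] * 5

def percentile(values, p):
    """Nearest-rank percentile: the value at ceil(p/100 * n) in sorted order."""
    s = sorted(values)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(k, 0)]

mean_ms = sum(latencies) / len(latencies)
print(f"mean: {mean_ms:.0f} ms")               # looks moderate
print(f"p50:  {percentile(latencies, 50)} ms")  # typical request is fast
print(f"p99:  {percentile(latencies, 99)} ms")  # the tail the mean hides
```

Here the mean is 234 ms while the median is 120 ms and p99 is 2400 ms: reporting only the average both overstates the typical request and buries the 20x tail.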