Skills for AI agents
npx skills add https://github.com/pskoett/pskoett-ai-skills --skill self-improvement-ci

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.
A collection of skills for AI agents. Follows the Agent Skills specification.
This repository is my personal skill testing ground.
Every skill in this collection is built around a philosophy — a principle that addresses a specific failure mode in how agents work today. plan-interview is about collaborative planning: before codebase exploration starts, user and agent run a structured interview to align on constraints, scope, risk, and success criteria — and to surface whether a preparatory refactor should come before the main change. intent-framed-agent makes execution intent explicit so scope drift becomes visible. context-surfing monitors context quality and exits cleanly before degradation corrupts output. verify-gate runs compile, test, and lint checks so the agent doesn't need you to tell it the output was wrong if a test can. simplify-and-harden uses the peak context at end-of-task for a focused quality and security review. self-improvement turns repeated mistakes into durable rules that persist across sessions.
The common thread: agents have peak context at specific moments — after planning, mid-execution, at completion, after learning — and these skills are designed to exploit those peaks. Each skill encodes a philosophy that agents struggle to internalize on their own, turning it into a structured workflow they can follow reliably.
If you want to improve agent output over time, you need two loops, not one. The inner loop catches failures during a running session: the agent detects a problem, verifies its work against machine signals, and recovers without you touching anything. The outer loop closes gaps across sessions: you capture where the agent failed, figure out what knowledge was missing, and encode it somewhere the agent can reach next time. learning-aggregator reads accumulated learnings across sessions and surfaces patterns. harness-updater encodes those patterns as permanent rules in project instruction files. eval-creator turns promoted rules into regression tests. pre-flight-check surfaces all of this at the start of the next session — closing the loop. The knowledge gaps shrink with every cycle as the loops compound.
skill-pipeline ties these pieces together by classifying the task and routing it through the right combination at the right depth.
Install as a Claude Code plugin from this repo's marketplace. Run each command from inside Claude Code:
/plugin marketplace add pskoett/pskoett-skills
/plugin install pskoett-ai-skills@pskoett-skills
/reload-plugins
This installs the full bundle: skills, audit agents, and hooks.
The same bundle now ships as a repo-local Codex plugin from plugin/.
Add the pskoett skills marketplace and install pskoett-ai-skills. Codex reads the marketplace from .agents/plugins/marketplace.json and the plugin manifest from plugin/.codex-plugin/plugin.json.
The same bundle ships as a Copilot CLI plugin nested under plugin/.copilot-plugin/, reusing the shared plugin/skills/, plugin/agents/, and plugin/hooks/ content:
copilot plugin marketplace add pskoett/pskoett-skills
copilot plugin install pskoett-ai-skills
Copilot reads the marketplace from .github/plugin/marketplace.json and the plugin manifest from plugin/.copilot-plugin/plugin.json.
GitHub CLI now supports Agent Skills via gh skill. Requires GitHub CLI v2.90.0 or later.
# Browse this repo's skills interactively
gh skill install pskoett/pskoett-skills
# Install specific skills directly
gh skill install pskoett/pskoett-skills verify-gate
gh skill install pskoett/pskoett-skills simplify-and-harden
gh skill install pskoett/pskoett-skills self-improvement
# Target a specific host and scope when needed
gh skill install pskoett/pskoett-skills verify-gate --agent codex --scope user
gh skill installs to the correct skill directory for the selected host, including GitHub Copilot, Claude Code, Codex, Cursor, and Gemini CLI.
If you only want specific skills and not the full plugin bundle:
npx skills add pskoett/pskoett-skills/skills/verify-gate
npx skills add pskoett/pskoett-skills/skills/simplify-and-harden
npx skills add pskoett/pskoett-skills/skills/self-improvement
Works with any agent following the Agent Skills specification.
Clone and copy (or symlink) the skill directories you want:
git clone https://github.com/pskoett/pskoett-skills.git
cp -r pskoett-skills/skills/verify-gate ~/.claude/skills/
skills/
skill-name/
SKILL.md # Required - skill definition with YAML frontmatter
scripts/ # Optional - executable code
references/ # Optional - documentation loaded on demand
assets/ # Optional - templates, images, data files
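The layout above can be scaffolded in a few commands. The skill name `my-skill` and its description are placeholders, and the frontmatter shows only the minimal `name`/`description` pair; consult the Agent Skills specification for the full field list:

```shell
# Scaffold a minimal skill following the layout above ("my-skill" is a placeholder).
mkdir -p skills/my-skill/scripts skills/my-skill/references skills/my-skill/assets

cat > skills/my-skill/SKILL.md <<'EOF'
---
name: my-skill
description: One-line summary the agent uses to decide when to load this skill.
---

# my-skill

Instructions the agent follows once the skill is activated.
EOF
```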
| Skill | Description |
|---|---|
| agent-teams-simplify-and-harden | Implementation + audit loop using parallel agent teams with structured simplify, harden, and document passes |
| context-surfing | Monitors context window health and rides peak context quality for maximum output fidelity during multi-step execution |
| dx-data-navigator | Query DX Data Cloud for developer productivity metrics, DORA metrics, PR/deployment data, and engineering analytics |
| intent-framed-agent | Captures a lightweight intent contract at execution start and monitors coding-task drift until resolution |
| plan-interview | Runs a structured interview before planning non-trivial implementations |
| self-improvement | Captures learnings and errors with hook-based activation and automatic skill extraction |
| skill-pipeline | Pipeline orchestrator that classifies tasks and routes them through the right skill combination at the right depth |
| simplify-and-harden | Post-completion self-review that runs simplify, harden, and micro-documentation passes before signaling done |
| verify-gate | Machine verification gate (compile, test, lint) between implementation and quality review with fix loop |
| learning-aggregator | Cross-session analysis of .learnings/ files — finds patterns, ranks promotion candidates |
| pre-flight-check | Session-start scan that surfaces relevant learnings, errors, and eval status before work begins |
| eval-creator | Creates permanent eval cases from promoted learnings and runs regression checks |
Headless CI variants for GitHub Agentic Workflows. Each mirrors an interactive skill but runs without human interaction — scanning, reporting, and optionally gating PRs.
| Skill | Description |
|---|---|
| self-improvement-ci | CI-only self-improvement workflow for recurring failure-pattern capture using gh-aw |
| simplify-and-harden-ci | CI-only simplify/harden workflow for pull requests using gh-aw with headless scan/report gates |
| learning-aggregator-ci | CI-only cross-session learning aggregation — scheduled pattern detection and gap reporting using gh-aw |
| eval-creator-ci | CI-only eval regression runner — per-PR eval checks and scheduled eval creation from promoted patterns using gh-aw |
The skills implement two feedback loops that improve agent output over time.
Inner loop (within a session): detect → verify → recover
Outer loop (across sessions): inspect → encode → regress-test
Each skill prevents a distinct failure mode:
| Skill | Loop | Failure it prevents |
|---|---|---|
| plan-interview | — | Building the wrong thing |
| intent-framed-agent | Inner (detect) | Scope creep during execution |
| context-surfing | Inner (detect + recover) | Degraded-context corruption |
| verify-gate | Inner (verify + recover) | Shipping code that doesn't compile or pass tests |
| simplify-and-harden | Inner (detect) | Shipping rough/insecure code |
| self-improvement | Bridge (capture) | Repeating the same mistakes |
| pre-flight-check | Bridge (surface) | Starting work blind to known patterns |
| learning-aggregator | Outer (inspect) | Accumulated learnings nobody reads |
| harness-updater | Outer (encode) | Patterns that never become rules |
| eval-creator | Outer (regress-test) | Fixed issues that silently regress |
[plan-interview] → [intent-framed-agent] ⟂ [context-surfing] → [verify-gate] → [simplify-and-harden] → [self-improvement]
↑ concurrent ↑ ↻ fix loop
Stage 1 — Planning (manual gate): plan-interview runs a structured interview and produces a plan file in docs/plans/. This is the only skill that requires explicit invocation (/plan-interview). Downstream skills activate automatically when present, but each works independently if earlier stages are skipped.
Stage 2 — Execution (concurrent monitoring): intent-framed-agent captures the intent frame and monitors scope drift. context-surfing monitors context quality drift. Both run simultaneously. If both fire at once, context-surfing's exit takes precedence — degraded context makes scope checks unreliable.
Stage 3 — Verification (machine gate): verify-gate runs the project's compile, test, and lint commands. If any fail, it enters a fix loop (up to 3 attempts per phase). Only when all checks pass does work proceed to the quality review.
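The gate logic described above can be sketched as a small retry loop. The `true` commands below are placeholders for your project's actual build, test, and lint commands, and `run_phase` is a hypothetical helper, not part of the shipped skill:

```shell
# Sketch of a verify-gate style fix loop: each phase retries up to 3 times.
MAX_ATTEMPTS=3

run_phase() {
  local phase="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      echo "verify-gate: $phase still failing after $MAX_ATTEMPTS attempts" >&2
      return 1
    fi
    echo "verify-gate: $phase failed (attempt $attempt); agent applies a fix, then retries"
    attempt=$((attempt + 1))
  done
  echo "verify-gate: $phase passed"
}

run_phase compile true &&   # placeholder: substitute e.g. your build command
run_phase test    true &&   # placeholder: substitute e.g. your test command
run_phase lint    true      # placeholder: substitute e.g. your lint command
```

Only when every phase returns success does work proceed to the review stage; a phase that exhausts its attempts stops the chain.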
Stage 4 — Review (post-completion): simplify-and-harden runs three passes (simplify, harden, document) on the completed work.
Stage 5 — Learning (automatic): self-improvement captures recurring patterns from the session to .learnings/.
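A captured learning is, at its simplest, an appended dated entry. The entry format below is illustrative only, not the skill's exact schema:

```shell
# Append a session learning (illustrative entry format; the skill's actual schema may differ).
mkdir -p .learnings
cat >> .learnings/LEARNINGS.md <<EOF

## $(date -u +%Y-%m-%d) build-scripts
- Pattern: tests were run with pytest directly instead of the project's own test command
- Rule candidate: always invoke the project's configured test command
EOF
```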
.learnings/ → [learning-aggregator] → [harness-updater] → [eval-creator]
↓
[pre-flight-check] → next session
Inspect: learning-aggregator reads all .learnings/ files, groups by pattern, and ranks promotion candidates.
Encode: harness-updater agent takes promotion candidates and applies them as rules in CLAUDE.md, AGENTS.md, and copilot-instructions.md.
Regress-test: eval-creator turns promoted patterns into permanent test cases in .evals/ and runs regression checks.
Bridge: pre-flight-check surfaces accumulated learnings and eval status at session start, feeding outer loop improvements back into the inner loop.
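The inspect step amounts to grouping repeated patterns across learning files and ranking by frequency. A minimal sketch, assuming each entry carries a `- Pattern:` line (the sample entries are made up for illustration):

```shell
# Seed two sample learning files' worth of entries (illustrative data).
mkdir -p .learnings
printf -- '- Pattern: flaky network test\n- Pattern: missing import\n- Pattern: flaky network test\n' \
  >> .learnings/LEARNINGS.md

# Group identical pattern lines and rank by frequency; top lines are promotion candidates.
grep -h '^- Pattern:' .learnings/*.md | sort | uniq -c | sort -rn
```

The real learning-aggregator does semantic grouping rather than exact-line matching, but the ranking idea is the same.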
| Stage | Artifact | Location |
|---|---|---|
| Planning | Plan file | docs/plans/plan-NNN-<slug>.md |
| Execution | Intent frame | Emitted in session output |
| Execution | Handoff file (on drift exit) | .context-surfing/handoff-<slug>-<timestamp>.md |
| Verification | Pass/fail signal | Emitted in session output |
| Review | Structured YAML summary | Appended to task output |
| Learning | Learning entries | .learnings/LEARNINGS.md, ERRORS.md, FEATURE_REQUESTS.md |
| Aggregation | Gap report | Emitted by learning-aggregator |
| Encoding | Updated rules | CLAUDE.md, AGENTS.md |
| Regression | Eval cases + results | .evals/EVAL_INDEX.md, .evals/cases/ |
Every skill works standalone. The pipeline is the recommended combination, not a hard dependency — each skill silently adapts when upstream artifacts are absent.
Match depth to complexity:
| Task | Skills |
|---|---|
| Trivial (typo fix, rename) | None |
| Small (isolated bug fix) | verify-gate + simplify-and-harden |
| Medium (feature, multi-file) | intent-framed-agent + verify-gate + simplify-and-harden |
| Large (refactor, new architecture) | Full inner loop pipeline |
| Long-running (multi-session) | Full inner loop — context-surfing is critical |
| Periodic (weekly, sprint boundary) | Outer loop: learning-aggregator → harness-updater → eval-creator |
To use a skill, add it to your agent's configuration or reference it directly.
Skills with hooks register them via SKILL.md frontmatter when installed as a plugin. For standalone use, add to .claude/settings.json:
{
"hooks": {
"UserPromptSubmit": [{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "./skills/self-improvement/scripts/activator.sh"
}
]
}],
"SessionStart": [{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "./skills/context-surfing/scripts/handoff-checker.sh"
},
{
"type": "command",
"command": "./skills/pre-flight-check/scripts/pre-flight.sh"
}
]
}],
"PostToolUse": [{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "./skills/self-improvement/scripts/error-detector.sh"
}
]
}]
}
}
| Hook | Script | Skill | Purpose |
|---|---|---|---|
| UserPromptSubmit | activator.sh | self-improvement | Reminds to evaluate learnings after tasks |
| SessionStart | handoff-checker.sh | context-surfing | Detects unread handoff files from previous context exits |
| SessionStart | pre-flight.sh | pre-flight-check | Surfaces accumulated learnings, errors, and eval status |
| PostToolUse (Bash) | error-detector.sh | self-improvement | Detects command failures for automatic error logging |
All hooks are lightweight (~50-200 tokens) and output nothing when no signals exist.
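The silent-when-clean contract boils down to: check for a signal, print a short note only if one exists, and never fail. A hypothetical hook in that style (not one of the shipped scripts):

```shell
# Hypothetical session-start hook: prints a short note only when signal files
# exist, and stays silent otherwise so no tokens are added to a clean session.
preflight_hook() {
  signal_dir="${1:-.learnings}"
  [ -d "$signal_dir" ] || return 0          # nothing to surface: stay silent
  count=$(find "$signal_dir" -type f | wc -l | tr -d ' ')
  [ "$count" -gt 0 ] || return 0
  echo "pre-flight: $count learning file(s) in $signal_dir; review before starting"
}

preflight_hook .learnings
```

Returning 0 in every branch matters: a hook that exits nonzero or prints noise on every session would cost context even when there is nothing to report.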
Feel free to submit PRs with new skills or improvements to existing ones.