npx skills add https://github.com/lllllllama/ai-paper-reproduction-skills --skill paper-context-resolver

Use the CLI to install this skill and reuse the corresponding SKILL.md workflow directly in your workspace.
Brand note: the repository brand is now ai-research-workflow-skills. If the GitHub repo slug has not been migrated yet, keep using lllllllama/ai-paper-reproduction-skills for clone and npx skills add commands until the slug migration is complete.
Migration note:
- ai-paper-reproduction -> ai-research-reproduction
- research-explore -> ai-research-explore

Lane-aware skill repository for deep learning research workflows.
🔒 Trusted for reproduction, setup, analysis, training verification, and debugging.
🧪 Explore only when the researcher explicitly authorizes candidate-only work.
🔗 Share the same SKILL.md contracts across Agent Skills, Codex, and Claude Code.
This repository is built around one default rule: trusted by default.
This repository currently ships:
- 11 skills total: 9 public skills and 2 helper skills.
- 6 trusted-lane public skills and 3 explore-lane public skills.
- 4 project-scoped Claude Code command wrappers under .claude/commands/.
- 42 Python test scripts, including 15 focused research-explore regressions.

The skills use the open SKILL.md layout, so the same repository can be installed into neutral Agent Skills directories as well as Codex and Claude Code. For shared local installs, prefer ~/.agents/skills/ or ./.agents/skills/. Client-specific installs under ~/.codex/skills/ and ~/.claude/skills/ remain supported.
This repository is intended to be usable on both Windows and Linux.
- Commands are written as python ..., npx ..., and relative paths.
- Home-directory paths are written as $HOME/.agents/skills, $HOME/.codex/skills, and $HOME/.claude/skills. These work well in Linux shells and in PowerShell, and Python accepts forward slashes on Windows paths.
- Project-relative paths such as ./.agents/skills and ./tmp/codex-skills are also valid on both platforms.

For most users, start with npx. It is the shortest path and should be enough for normal use.
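The forward-slash claim above is easy to check directly; this one-liner is only an illustration and is not part of the repository:

```shell
# Python's pathlib treats forward slashes as separators on every platform,
# so the same relative path works from Linux shells and from PowerShell.
python -c "from pathlib import Path; print(Path('./.agents/skills'))"
```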
Using npx, install the full repository skill set:
npx skills add lllllllama/ai-paper-reproduction-skills --all
Install only the trusted main entrypoint:
npx skills add lllllllama/ai-paper-reproduction-skills --skill ai-research-reproduction
Install only the exploratory main entrypoint:
npx skills add lllllllama/ai-paper-reproduction-skills --skill ai-research-explore
If you only want to get started quickly, stop here.
Claude Code can auto-invoke these skills when the descriptions match, or you can call them directly with commands such as /ai-research-reproduction, /ai-research-explore, and /safe-debug.
Project-scoped Claude Code slash commands currently ship for:
- /ai-research-reproduction
- /ai-research-explore
- /analyze-project
- /safe-debug

Use the Python installer only if you are developing locally, need a project-scoped install, or want to target neutral Agent Skills, Codex, or Claude Code directories manually.
Install from a local clone into a neutral Agent Skills directory:
python scripts/install_skills.py --client agents --target "$HOME/.agents/skills" --force
Install into a project-scoped neutral Agent Skills directory:
python scripts/install_skills.py --client agents --target ./.agents/skills --force
Install with the default neutral target:
python scripts/install_skills.py --force
Install the full repository skill set in Codex:
npx skills add lllllllama/ai-paper-reproduction-skills --all
Install only the trusted reproduction orchestrator in Codex:
npx skills add lllllllama/ai-paper-reproduction-skills --skill ai-research-reproduction
Install from a local clone into Codex:
python scripts/install_skills.py --client codex --target "$HOME/.codex/skills" --force
Install from a local clone into Claude Code:
python scripts/install_skills.py --client claude --target "$HOME/.claude/skills" --force
Install into a project-scoped Claude Code skills directory:
python scripts/install_skills.py --client claude --target ./.claude/skills --force
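After any of the installs above, a quick sanity check is to confirm that each installed skill directory contains a SKILL.md. The sketch below uses a temporary stand-in directory; substitute your real install target such as ~/.claude/skills:

```shell
# Sketch: verify every skill directory under an install target has a SKILL.md.
target=$(mktemp -d)                    # stand-in for a real install target
mkdir -p "$target/example-skill"
: > "$target/example-skill/SKILL.md"   # stand-in for an installed skill

missing=0
for d in "$target"/*/; do
  if [ -f "${d}SKILL.md" ]; then
    echo "ok: $d"
  else
    echo "missing SKILL.md in $d"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all skill directories have SKILL.md"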
PowerShell note:
Replace $HOME/.codex/skills with something like $env:USERPROFILE\.codex\skills.

| If you want to... | Use |
|---|---|
| Reproduce a repository end-to-end from the README | ai-research-reproduction |
| Run a third-scenario campaign on top of current_research with frozen task, eval, and SOTA inputs | ai-research-explore |
| Analyze the repository without editing or running heavy jobs | analyze-project |
| Prepare environment, dataset, checkpoint, and cache assumptions | env-and-assets-bootstrap |
| Run a documented inference or evaluation command conservatively | minimal-run-and-audit |
| Start or resume documented training conservatively | run-train |
| Diagnose a traceback or failed training or inference run safely | safe-debug |
| Make isolated exploratory code changes only | explore-code |
| Run isolated exploratory trials only | explore-run |
Bundled helper skills:
- repo-intake-and-plan
- paper-context-resolver

Use the trusted lane for reproduction, setup, analysis, bounded execution, training verification, and debugging.
Its main entrypoint is ai-research-reproduction, and its outputs land in repro_outputs/, train_outputs/, analysis_outputs/, and debug_outputs/.

Use the explore lane only when the researcher explicitly authorizes candidate-only exploratory work.
Its main entrypoint is ai-research-explore, supported by explore-code and explore-run, and its outputs land in explore_outputs/. Exploration is anchored to current_research.

current_research should be a durable reference such as a branch, commit, checkpoint, run record, or already-trained local model state. It does not imply a trusted baseline; it is the context the exploration branches from.
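For a commit-based current_research reference, pinning an immutable hash rather than a mutable branch tip keeps the anchor durable. This is a sketch in a throwaway repository, not a step the skill itself requires:

```shell
# Sketch: pin current_research to an immutable commit hash.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "baseline for exploration"
current_research=$(git rev-parse HEAD)
echo "current_research=$current_research"
```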
Helpers are intentionally narrow and should usually be orchestrator-invoked rather than used as the first entry point.
SKILL.md is the canonical cross-client contract in this repository.
- SKILL.md, repository-local scripts/, and references/
- agents/openai.yaml
- .claude/commands/*.md

See references/client-compatibility-policy.md.
flowchart TD
A[User request] --> B{Explicit candidate-only exploration?}
B -- No --> C[Trusted lane]
B -- Yes --> D[Explore lane]
C --> C1[ai-research-reproduction]
C --> C2[analyze-project]
C --> C3[env-and-assets-bootstrap]
C --> C4[minimal-run-and-audit]
C --> C5[run-train]
C --> C6[safe-debug]
D --> D1[ai-research-explore]
D --> D2[explore-code]
D --> D3[explore-run]
C1 -. helper .-> H1[repo-intake-and-plan]
C1 -. helper .-> H2[paper-context-resolver]
ai-research-explore is optimized for the third scenario: the researcher has already frozen the task family, dataset, evaluation method, and provided SOTA references, and wants governed exploration on top of current_research.
flowchart LR
A[current_research + research_campaign] --> B[analysis_outputs and sources]
B --> C[IDEA_SEEDS.json<br/>bounded seed expansion]
C --> D[IDEA_SCORES.json<br/>IDEA_EVALUATION.md]
D --> E[ATOMIC_IDEA_MAP.md and .json]
E --> F[IMPLEMENTATION_FIDELITY.md and .json]
F --> G{Checkpoint and manifest clear?}
G -- No --> H[Stop with candidate-only blockers]
G -- Yes --> I[bounded short-cycle runs]
I --> J[explore_outputs<br/>candidate-only summary]
Current implementation highlights:
- Bounded seed expansion writes analysis_outputs/IDEA_SEEDS.json.
- Idea scoring writes analysis_outputs/IDEA_SCORES.json.
- Atomic decomposition writes analysis_outputs/ATOMIC_IDEA_MAP.md and analysis_outputs/ATOMIC_IDEA_MAP.json.
- Implementation fidelity checks write analysis_outputs/IMPLEMENTATION_FIDELITY.md and analysis_outputs/IMPLEMENTATION_FIDELITY.json.
- Experiment manifests record changed_files, new_files, deleted_files, and touched_paths rather than planned target placeholders.

The explore lane must not claim trusted reproduction success, global benchmark completeness, or verified novelty.
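As an illustration of those recorded manifest fields, a fragment of an experiment manifest might look like this. Only the four field names (changed_files, new_files, deleted_files, touched_paths) come from the repository; all values and any other structure are invented for the example:

```json
{
  "changed_files": ["models/attention.py"],
  "new_files": ["configs/probe_variant.yaml"],
  "deleted_files": [],
  "touched_paths": ["models/attention.py", "configs/probe_variant.yaml"]
}
```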
| Lane | Skill | Purpose |
|---|---|---|
| Trusted | ai-research-reproduction | End-to-end README-first reproduction orchestrator |
| Trusted | env-and-assets-bootstrap | Conservative environment, dataset, checkpoint, and cache planning |
| Trusted | minimal-run-and-audit | Trusted inference, evaluation, smoke, and sanity execution |
| Trusted | analyze-project | Read-only project analysis, model mapping, and risk surfacing |
| Trusted | run-train | Training startup verification, resume handling, bounded monitoring, and training records |
| Trusted | safe-debug | Research-safe debugging: analyze first, patch only after approval |
| Explore | ai-research-explore | Third-scenario exploratory orchestration on top of current_research with repo understanding, idea gating, and governed experiments |
| Explore | explore-code | Exploratory code adaptation, transplant, and stitching on isolated branches |
| Explore | explore-run | Small-subset probes, short-cycle trials, and ranked exploratory runs |
| Helper | repo-intake-and-plan | Narrow helper for repo scanning and README command extraction |
| Helper | paper-context-resolver | Narrow helper for README-paper gap resolution |
This repository does not publish a single line-coverage percentage in the README. Instead, it documents the regression surface that is currently covered by repository tests.
| Coverage area | Current scope | Representative checks |
|---|---|---|
| Registry, installation, and wrappers | File-level integrity, install targets, Claude wrappers, README routing | test_skill_registry.py, test_install_targets.py, test_claude_command_wrappers.py, test_readme_selection.py |
| Trusted lane rendering and routing | Reproduction, training, analysis, debug, lane routing | test_output_rendering.py, test_train_output_rendering.py, test_analysis_output_rendering.py, test_safe_debug_output_rendering.py, test_training_lane_routing.py |
| Explore lane orchestration | Dry run, campaign flow, checkpoint, abandon path, artifact consistency, execution feasibility | test_research_explore_dry_run.py, test_research_explore_campaign_flow.py, test_research_explore_campaign_checkpoint.py, test_research_explore_campaign_abandon.py, test_research_explore_artifact_consistency.py |
| Explore idea and implementation contracts | Idea seeds, atomic decomposition, implementation fidelity, contract shape | test_idea_seed_generation.py, test_atomic_idea_decomposition.py, test_implementation_fidelity.py, test_research_explore_contracts.py |
| Explore execution evidence | Training and non-training executor evidence propagation | test_research_explore_variant_execution.py, test_research_explore_nontraining_execution.py |
| Research lookup | Provider resolution, cache, inventory rendering, repo extractors, evidence layering | test_research_lookup_arxiv_provider.py, test_research_lookup_repo_extractor.py, test_research_lookup_inventory_rendering.py, test_research_lookup_evidence_layers.py |
Coverage notes:
- scripts/validate_repo.py is still the fast file-level validator.
- CI covers ubuntu-latest, macos-latest, and windows-latest.

| Directory | Purpose |
|---|---|
| repro_outputs/ | Trusted reproduction bundle |
| train_outputs/ | Trusted training execution bundle |
| analysis_outputs/ | Read-only project analysis plus research map, change map, eval contract, source inventory/support, improvement bank, idea cards, idea seeds, atomic idea map, implementation fidelity, mapping, and resource plan |
| debug_outputs/ | Safe debug diagnosis and patch plan |
| sources/ | Free-first research lookup records with sources/records/, stable names, bounded provider resolution, repo-local extraction, and an auditable index |
| explore_outputs/ | Exploratory changeset, idea gate, experiment plan, experiment manifest, split static/runtime smoke reporting, ledger, and ranked run summary |
ai-research-explore still accepts a plain variant_spec.json, but the preferred input for the third scenario is research_campaign.json or research_campaign.yaml.
The campaign should freeze:
- task_family
- dataset
- benchmark
- evaluation_source
- sota_reference
- compute_budget
- variant_spec

candidate_ideas is preferred but optional. ai-research-explore preserves researcher ideas and may also add a small number of bounded synthesized or hybrid seed ideas for search-space expansion. Generated seeds stay bound to current_research, task_family, dataset, and the frozen evaluation_source.
Optional campaign blocks:
- research_lookup
- idea_policy
- idea_generation
- source_constraints
- feasibility_policy

See skills/ai-research-explore/references/research-campaign-spec.md.
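A minimal research_campaign.json sketch combining the frozen fields with one optional block. All concrete values here are invented, and the authoritative schema is the research-campaign-spec reference:

```json
{
  "task_family": "text-classification",
  "dataset": "example-dataset",
  "benchmark": "example-benchmark-v1",
  "evaluation_source": "scripts/eval.py",
  "sota_reference": "paper-table-2",
  "compute_budget": {"max_gpu_hours": 4},
  "variant_spec": {"max_variants": 3},
  "candidate_ideas": ["swap attention variant", "add LoRA adapters"],
  "idea_policy": {"max_synthesized_seeds": 2}
}
```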
Trusted reproduction
Use ai-research-reproduction on this AI repo. Stay README-first, prefer documented inference or evaluation, avoid unnecessary repo changes, and write outputs to repro_outputs/.
Current-research exploration
Use ai-research-explore on top of current_research improved-model@branch. Work on an isolated branch, coordinate code and run exploration together, try several variants, and rank candidates in explore_outputs/.
Third-scenario campaign exploration
Use ai-research-explore with research_campaign.json. Treat the provided task family, dataset, evaluation source, and SOTA table as frozen inputs, rank the candidate ideas, keep each candidate single-variable, and write governed outputs to analysis_outputs/ and explore_outputs/.
Read-only analysis
Use analyze-project on this repo. Read the code, map the model and training entrypoints, and flag suspicious patterns without editing files.
Trusted training
Use run-train on this repo. Run the selected documented training command conservatively for startup verification and write train_outputs/.
Safe debug
Use safe-debug on this traceback. Diagnose the failure first, propose the smallest safe fix, and do not patch until I approve.
Exploratory code only
Use explore-code on an isolated branch. Try a LoRA adaptation for this backbone, keep it exploratory only, and summarize the changes in explore_outputs/.
Exploratory runs only
Use explore-run on an experiment branch. Do a small-subset short-cycle sweep, rank the top runs, and treat the results as candidates only.
Run the repository checks:
python scripts/validate_repo.py
python scripts/test_skill_registry.py
python scripts/test_trigger_boundaries.py
python scripts/test_claude_command_wrappers.py
python scripts/test_readme_selection.py
Run output and orchestration regressions:
python scripts/test_output_rendering.py
python scripts/test_train_output_rendering.py
python scripts/test_analysis_output_rendering.py
python scripts/test_safe_debug_output_rendering.py
python scripts/test_explore_output_rendering.py
python scripts/test_explore_variant_matrix.py
python scripts/test_atomic_idea_decomposition.py
python scripts/test_idea_seed_generation.py
python scripts/test_implementation_fidelity.py
python scripts/test_research_explore_contracts.py
python scripts/test_research_explore_dry_run.py
python scripts/test_research_explore_campaign_flow.py
python scripts/test_research_explore_campaign_abandon.py
python scripts/test_research_explore_campaign_checkpoint.py
python scripts/test_research_explore_artifact_consistency.py
python scripts/test_research_explore_variant_execution.py
python scripts/test_research_explore_nontraining_execution.py
python scripts/test_orchestrator_dry_run.py
python scripts/test_training_lane_routing.py
Run research-lookup regressions:
python scripts/test_research_lookup_arxiv_provider.py
python scripts/test_research_lookup_doi_provider.py
python scripts/test_research_lookup_github_provider.py
python scripts/test_research_lookup_url_provider.py
python scripts/test_research_lookup_repo_extractor.py
python scripts/test_research_lookup_cache.py
python scripts/test_research_lookup_inventory_rendering.py
python scripts/test_research_lookup_evidence_layers.py
Run setup and installer regressions:
python scripts/test_bootstrap_env.py
python scripts/test_install_targets.py
python scripts/test_setup_planning.py
python scripts/install_skills.py --client agents --target ./tmp/agents-skills --force
python scripts/install_skills.py --client codex --target ./tmp/codex-skills --force
python scripts/install_skills.py --client claude --target ./tmp/claude-skills --force
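The individual check commands above can be wrapped in a simple loop. This is a local convenience sketch, not a script that ships with the repository:

```shell
# Run every repository test script in sequence, stopping at the first failure.
set -e
for t in scripts/test_*.py; do
  [ -e "$t" ] || continue   # no-op when run outside a repository clone
  echo "running $t"
  python "$t"
done
echo "all checks passed"
```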
- run-train is a bounded training monitor, not a full long-running scheduler.
- ai-research-explore is a governed third-scenario orchestrator, not an open-ended autonomous research agent.

This is a lane-aware deep learning research skill repository optimized for safety, observability, reuse, and auditable workflow boundaries.