Academic Research Skills for Claude Code: research → write → review → revise → finalize
npx skills add https://github.com/imbad0202/academic-research-skills --skill academic-paper-reviewer

Install this skill with the CLI command above, then start using the SKILL.md workflow in your workspace.
A comprehensive suite of Claude Code skills for academic research, covering the full pipeline from research to publication.
AI is your copilot, not the pilot. This tool won't write your paper for you. It handles the grunt work — hunting down references, formatting citations, verifying data, checking logical consistency — so you can focus on the parts that actually require your brain: defining the question, choosing the method, interpreting what the data means, and writing the sentence after "I argue that."
Unlike a humanizer, this tool doesn't help you hide the fact that you used AI. It helps you write better. Style Calibration learns your voice from past work. Writing Quality Check catches the patterns that make prose feel machine-generated. The goal is quality, not cheating.
Lu et al. (2026, Nature 651:914-919) built The AI Scientist — the first fully autonomous AI research system to publish a paper through blind peer review at a top-tier ML venue (ICLR 2025 workshop, score 6.33/10 vs workshop average 4.87). Their Limitations section enumerates the failure modes that any fully-autonomous AI research pipeline inherits: implementation bugs, hallucinated results, shortcut reliance, bug-as-insight reframing, methodology fabrication, frame-lock, citation hallucinations.
ARS is built on the premise that a human researcher augmented by AI avoids these failure modes better than either alone. Stage 2.5 and Stage 4.5 integrity gates run a 7-mode blocking checklist (see academic-pipeline/references/ai_research_failure_modes.md); the reviewer offers an opt-in calibration mode that measures its own FNR/FPR against a user-supplied gold set.
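The calibration mode's FNR/FPR measurement can be sketched as follows. This is a minimal illustration of the metric arithmetic only, not the skill's actual implementation; the function name and the set-based issue IDs are hypothetical, and the FPR here is a proxy computed against the reviewer's own flags rather than a true-negative universe.

```python
def calibration_metrics(gold, predicted):
    """Compare reviewer findings against a user-supplied gold set.

    gold: set of issue IDs a human marked as real problems.
    predicted: set of issue IDs the AI reviewer flagged.
    Returns (FNR, FPR proxy): the share of real issues missed,
    and the share of the reviewer's flags with no gold counterpart.
    """
    if not gold:
        raise ValueError("gold set must be non-empty")
    missed = gold - predicted       # real issues the reviewer failed to flag
    spurious = predicted - gold     # flags with no gold-set counterpart
    fnr = len(missed) / len(gold)
    fpr = len(spurious) / len(predicted) if predicted else 0.0
    return fnr, fpr

# Example: 4 gold issues; the reviewer catches 3 and raises 1 spurious flag
fnr, fpr = calibration_metrics({"G1", "G2", "G3", "G4"}, {"G1", "G2", "G3", "X9"})
print(fnr, fpr)  # → 0.25 0.25
```

A gold set like this is cheap to build from one previously reviewed paper: list the issues you know are real, run the reviewer, and compare the two sets.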
v3.3 was inspired by PaperOrchestra (Song, Song, Pfister & Yoon, 2026, Google): Semantic Scholar API verification, anti-leakage protocol, VLM figure verification, and score trajectory tracking.
👉 docs/ARCHITECTURE.md — the full pipeline view: flow diagram, stage-by-stage matrix, data-access flow, skill dependency graph, quality gates, and mode list.
The architecture doc supersedes the sprawling pipeline description that used to live here. Everything about what runs in which stage now lives in one place.
👉 docs/SETUP.md — install Claude Code, set up API keys, optional Pandoc/tectonic for DOCX/PDF, cross-model verification (ARS_CROSS_MODEL), and four installation methods including claude.ai Project import.
👉 docs/PERFORMANCE.md — per-mode token budgets, full-pipeline estimate (~$4–6 for a 15k-word paper), and recommended Claude Code settings (Skip Permissions; Agent Team optional).
- `repro_lock`, optional cross-model integrity verification, mid-conversation reinforcement, and score trajectory tracking.
- `data_access_level` (raw / redacted / verified_only); enforced by `scripts/check_data_access_level.py`. Pattern adapted from Anthropic's automated-w2s-researcher (2026). See `shared/ground_truth_isolation_pattern.md`.
- `task_type` (open-ended or outcome-gradable). All current ARS skills are open-ended.
- `shared/benchmark_report_pattern.md`.
- `repro_lock` sub-block on the Material Passport. Configuration documentation, not a replay guarantee — LLM outputs are not byte-reproducible. See `shared/artifact_reproducibility_pattern.md`.

See the complete artifacts from a real 10-stage pipeline run — peer review reports, integrity verification reports, and the final paper:
Browse all pipeline artifacts →
| Artifact | Description |
|---|---|
| Final Paper (EN) | APA 7.0 formatted, LaTeX-compiled |
| Final Paper (ZH) | Chinese version, APA 7.0 |
| Integrity Report — Pre-Review | Stage 2.5: caught 15 fabricated refs + 3 statistical errors |
| Integrity Report — Final | Stage 4.5: zero regressions confirmed |
| Peer Review Round 1 | EIC + 3 Reviewers + Devil's Advocate |
| Re-Review | Verification after revisions |
| Peer Review Round 2 | Follow-up review |
| Response to Reviewers | Point-by-point author response |
| Post-Publication Audit Report | Independent full-reference audit: found 21/68 issues missed by 3 rounds of integrity checks |
If your research involves running experiments (code or human studies) before writing, the Experiment Agent skill fills the gap between ARS Stage 1 (RESEARCH) and Stage 2 (WRITE).
ARS Stage 1 RESEARCH → RQ Brief + Methodology Blueprint
↓
experiment-agent → run/manage experiments → validate results
↓
ARS Stage 2 WRITE → write paper with verified experiment results
What it does: executes code experiments (Python, R, etc.) with real-time monitoring, manages human study protocols with IRB ethics checklist, interprets statistics with 11-type fallacy detection, and verifies reproducibility.
How to use together: pause the ARS pipeline after Stage 1, run experiments in a separate experiment-agent session, then bring the results (with Material Passport) back to ARS Stage 2. ARS requires zero modification. See the experiment-agent README for setup instructions.
# Start a full research pipeline
You: "I want to write a research paper on AI's impact on higher education QA"
# Start with Socratic guidance
You: "Guide my research on AI in educational evaluation"
# Write a paper with guided planning
You: "Guide me through writing a paper on demographic decline"
# Review an existing paper
You: "Review this paper" (then provide the paper)
# Check pipeline status
You: "status"
"Research the impact of AI on higher education" → full mode
"Give me a quick brief on X" → quick mode
"Do a systematic review on X with PRISMA" → systematic-review mode
"Guide my research on X" → socratic mode (guided)
"Fact-check these claims" → fact-check mode
"Do a literature review on X" → lit-review mode
"Review this paper's research quality" → review mode
"Write a paper on X" → full mode
"Guide me through writing a paper" → plan mode (guided)
"Build a paper outline" → outline-only mode
"I have a draft, here are reviewer comments" → revision mode
"Parse these reviewer comments into a roadmap" → revision-coach mode
"Write an abstract for this paper" → abstract-only mode
"Turn this into a literature review paper" → lit-review mode
"Convert to LaTeX" / "Convert citations to IEEE" → format-convert mode
"Check citations" → citation-check mode
"Generate an AI disclosure statement for NeurIPS" → disclosure mode
"Review this paper" → full mode (EIC + R1/R2/R3 + Devil's Advocate)
"Quick assessment of this paper" → quick mode
"Guide me to improve this paper" → guided mode
"Check the methodology" → methodology-focus mode
"Verify the revisions" → re-review mode
"Calibrate this reviewer against my gold set" → calibration mode
"I want to write a complete research paper" → full pipeline from Stage 1
"I already have a paper, review it" → mid-entry at Stage 2.5 (integrity first)
"I received reviewer comments" → mid-entry at Stage 4
Pipeline ends with Stage 6: Process Summary — auto-generates a paper creation process record with 6-dimension Collaboration Quality Evaluation (1–100 scoring).
Using a different language? Socratic mode (deep-research) and Plan mode (academic-paper) use intent-based activation — they detect the meaning of your request, not specific keywords. This means they work in any language without modification.
However, the general "Trigger Keywords" section (which determines whether the skill is activated at all) still lists English and Traditional Chinese keywords. If you find the skill isn't activating reliably in your language, you can add your language's keywords to the `### Trigger Keywords` section in each `SKILL.md` file to improve matching confidence.
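As an illustration, adding Spanish keywords to a skill might look like this. This is a hypothetical fragment: the keyword lines shown are placeholders, not the skill's actual shipped keyword list.

```markdown
### Trigger Keywords

- research paper, literature review, peer review   <!-- existing English keywords -->
- 研究論文, 文獻回顧                                  <!-- existing Traditional Chinese keywords -->
- artículo de investigación, revisión por pares    <!-- added: your language's keywords -->
```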
Per-agent responsibilities and per-stage artifacts now live in docs/ARCHITECTURE.md. Version numbers are anchored here so release metadata stays in one place.
13-agent research team. Modes: full, quick, review, lit-review, fact-check, socratic, systematic-review. Full agent roster and artifacts: see ARCHITECTURE.md §3.
12-agent paper writing pipeline. Modes: full, plan, outline-only, revision, revision-coach, abstract-only, lit-review, format-convert, citation-check, disclosure. Output: MD + DOCX (via Pandoc when available) + LaTeX (APA 7.0 apa7 class / IEEE / Chicago) → PDF via tectonic. Full agent roster and per-phase responsibilities: see ARCHITECTURE.md §3.
7-agent multi-perspective review with 0-100 quality rubrics. Modes: full, re-review, quick, methodology-focus, guided, calibration. Decision mapping: ≥80 Accept, 65-79 Minor Revision, 50-64 Major Revision, <50 Reject. First-round review team vs. narrow re-review team boundary: see ARCHITECTURE.md §3 Stage 3 / Stage 3'.
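The score-to-decision mapping above is mechanical, so it can be stated precisely in a few lines. A minimal sketch of the published thresholds; the function name is hypothetical, not part of the skill's API:

```python
def map_decision(score: int) -> str:
    """Map a 0-100 review score to the reviewer's decision band."""
    if score >= 80:
        return "Accept"
    if score >= 65:
        return "Minor Revision"
    if score >= 50:
        return "Major Revision"
    return "Reject"

print(map_decision(72))  # → Minor Revision
```

Note that the bands are inclusive on their lower edge: a 65 is a Minor Revision, a 64 is a Major Revision.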
10-stage orchestrator with integrity verification, two-stage review, Socratic coaching, and collaboration evaluation. Pipeline guarantees: every stage requires user confirmation checkpoint; integrity verification (Stage 2.5 + 4.5) cannot be skipped; R&R Traceability Matrix (Schema 11) independently verifies author revision claims. v3.4 added the Compliance Agent (PRISMA-trAIce + RAISE) at Stage 2.5 / 4.5. v3.5 adds the Collaboration Depth Observer (collaboration_depth_agent, advisory only — never blocks) at every FULL/SLIM checkpoint and at pipeline completion. MANDATORY integrity gates (2.5 / 4.5) explicitly skip the observer so compliance checks are not diluted. Based on Wang & Zhang (2026), IJETHE 23:11. Stage-by-stage matrix with agents, artifacts, and gates: see ARCHITECTURE.md §3.
While using ARS to write a reflection article about AI in higher education, I ran into three structural problems that no amount of prompt engineering could fix:
Frame-lock: I asked the AI to run a devil's advocate debate against its own thesis. It did — four rounds, each more refined than the last. But every round stayed inside the frame I'd set. The DA attacked arguments, never premises. It never asked "are we even discussing the right question?" This is the same pattern that caused the 31% citation error rate in v2.7's stress test: the verifying AI and the generating AI share the same cognitive frame.
Sycophancy under pushback: Every time I challenged the DA's attacks, it conceded too quickly. It retracted findings faster than it launched them. The model's training rewards conversational harmony — so "the user pushed back" was treated as evidence that the attack was wrong, when often it just meant the user was persistent.
Intent misdetection: The Socratic Mentor kept trying to converge and produce deliverables ("Want me to write this up?") when I was still exploring. It couldn't distinguish "the user wants a deep philosophical discussion" from "the user wants an RQ brief." Both look like engagement, but they need opposite AI behaviors.
Devil's Advocate — Concession Threshold Protocol (deep-research + academic-paper-reviewer)
Socratic Mentor — Intent Detection Layer (deep-research)
Socratic Mentor — Dialogue Health Indicator (deep-research)
These optimizations don't solve AI's structural limits — they make the limits visible and manageable. The DA will still eventually concede if pushed hard enough. The Socratic Mentor will still have some convergence bias. But now there are explicit checkpoints that slow down the sycophancy, force the DA to justify concessions, and prevent the Mentor from wrapping up before the user is ready.
The deeper lesson: AI literacy isn't about learning to use AI as a tool, following ethics rules, or fearing AI risks. It's about engaging AI deeply enough to discover its structural limits yourself — and your own thinking limits in the process.
This work is licensed under CC-BY-NC 4.0.
You are free to:
Under the following terms:
Attribution format:
Based on Academic Research Skills by Cheng-I Wu
https://github.com/Imbad0202/academic-research-skills
Cheng-I Wu (吳政宜) — Author and maintainer
aspi6246 — Contributor. The v3.1 optimization was inspired by patterns from Claude-Code-Skills-for-Academics: read-only constraint pattern, anti-pattern codification as first-class design, cognitive framework approach (teaching "how to think" not just procedures), and lean skill size philosophy.
mchesbro1 — Contributor. Originally proposed and drafted the IS Basket of 8 journals for academic-paper-reviewer/references/top_journals_by_field.md (Issue #5).
cloudenochcsis — Contributor. Extended the IS section from the Basket of 8 to the full Senior Scholars' Basket of 11 — adding Decision Support Systems, Information & Management, and Information and Organization (Issue #7, PR #8). Sourced from the AIS Senior Scholars' List of Premier Journals.
v3.5.1 adds an opt-in honesty probe to the Socratic Mentor (ARS_SOCRATIC_READING_PROBE=1). Default off. See CHANGELOG.
- When `ARS_SOCRATIC_READING_PROBE=1` is set, the Socratic Mentor fires a one-time honesty probe during goal-oriented sessions where the user has cited a specific paper. Decline is logged without penalty. The outcome flows into the Research Plan Summary and the Stage 6 AI Self-Reflection Report. No new agent, no schema change.
- deep-research SKILL version: 2.9.0 → 2.9.1; academic-pipeline SKILL version: 3.5.0 → 3.5.1. Suite version bumped to 3.5.1.
- New `collaboration_depth_agent` in academic-pipeline (Agent Team grows from 3 to 4). Invoked at every FULL/SLIM checkpoint and at pipeline completion; scores user-AI collaboration against a 4-dimension rubric. Advisory only — never blocks progression. MANDATORY checkpoints (Stage 2.5 / 4.5 integrity gates) do NOT invoke the observer.
- `shared/collaboration_depth_rubric.md` v1.0. Dimensions: Delegation Intensity, Cognitive Vigilance, Cognitive Reallocation, Zone Classification (Zone 1 / Zone 2 / Zone 3). Based on Wang, S., & Zhang, H. (2026). "Pedagogical partnerships with generative AI in higher education: how dual cognitive pathways paradoxically enable transformative learning." International Journal of Educational Technology in Higher Education, 23:11. DOI 10.1186/s41239-026-00585-x.
- When `ARS_CROSS_MODEL` is set, the observer runs on both models; dimension disagreement > 2 points is reported rather than silently smoothed. `ARS_CROSS_MODEL_SAMPLE_INTERVAL` offers an escape hatch for the cost trade-off.
- An `insufficient_evidence` block is written instead of dispatching the full-model observer.
- academic-pipeline SKILL version: 3.3.0 → 3.4.0. Suite version bumped to 3.5.0. New lint `scripts/check_collaboration_depth_rubric.py` + 10 tests.
- `compliance_history[]` (append-only).
- `disclosure_addendum` merged into the manuscript. No detection evasion possible.
- `task_type`: open-ended.
- `docs/ARCHITECTURE.md` established as the single source of truth for pipeline structure (flow, matrix, data-access, dependency graph, quality gates, modes). Merged into main via PR #18.
- Extracted `docs/SETUP.md` (prerequisites, API keys, Pandoc/tectonic, cross-model verification, installation methods) and `docs/PERFORMANCE.md` (token budgets, recommended Claude Code settings). README links to both instead of inlining them.
- Suite version bumped to 3.3.6.
- `benchmark_report.schema.json` + `repro_lock` optional block on the Material Passport. Both ship with pattern docs, lints, and examples. First formal Python dev dependency manifest (`requirements-dev.txt`).
- Updated `README.md` and `README.zh-TW.md` so they include the missing v3.3.3 and v3.3.2 release summaries.
- Extended `scripts/check_spec_consistency.py` so future README changelog drift fails CI.
- `---` fences now fail cleanly instead of being parsed as valid YAML.
- `.docx` generation is Pandoc-dependent, with Markdown + conversion instructions as fallback.
- v3.3.3 release: suite version bump; academic-paper → v3.0.2, academic-pipeline → v3.2.2.
- Added `metadata.data_access_level` to all top-level SKILL.md files with enforced vocabulary: raw, redacted, verified_only.
- Added `metadata.task_type` to all top-level SKILL.md files with enforced vocabulary: open-ended, outcome-gradable.
- Added `shared/ground_truth_isolation_pattern.md` and linked the new vocabulary from `shared/handoff_schemas.md`.
- Updated `.claude/CLAUDE.md`, `MODE_REGISTRY.md`, and SKILL.md files to the current mode counts and published skill versions.
- Integrates techniques from PaperOrchestra (Song, Song, Pfister & Yoon, 2026, Google).
- `[MATERIAL GAP]` markers used for missing content instead of filling from memory. Reduces Mode 5/6 failure risk.
- Integrates insights from Lu et al. (2026, Nature 651:914-919), the first end-to-end autonomous AI research system to pass blind peer review.
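The cross-model rule noted above (dimension disagreement greater than 2 points is reported rather than silently smoothed) can be sketched as follows. A minimal illustration under assumed names; the function, the dict-based score format, and the model labels are hypothetical, not the shipped implementation:

```python
def report_disagreements(scores_a, scores_b, threshold=2):
    """Compare two models' rubric scores per dimension.

    scores_a / scores_b: dicts mapping dimension name -> integer score.
    Returns the dimensions whose scores differ by more than `threshold`,
    with both raw values preserved (reported, never averaged away).
    """
    return {
        dim: (scores_a[dim], scores_b[dim])
        for dim in scores_a
        if dim in scores_b and abs(scores_a[dim] - scores_b[dim]) > threshold
    }

model_a = {"Delegation Intensity": 4, "Cognitive Vigilance": 8}
model_b = {"Delegation Intensity": 5, "Cognitive Vigilance": 3}
print(report_disagreements(model_a, model_b))  # → {'Cognitive Vigilance': (8, 3)}
```

Keeping both raw scores instead of their mean is the point of the design: a 5.5 average would hide exactly the cross-model divergence the observer is meant to surface.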
External contributions: @mchesbro1 originally proposed and drafted the IS Basket of 8 journals (Issue #5); @cloudenochcsis extended it to the full Senior Scholars' Basket of 11 (Issue #7, PR #8). Updated academic-paper-reviewer/references/top_journals_by_field.md Section 7, adding Decision Support Systems, Information & Management, and Information and Organization. Source: AIS Senior Scholars' List of Premier Journals.
Inspired by patterns from aspi6246/Claude-Code-Skills-for-Academics.
Wave 1: Anti-Context-Rot Anchors
Wave 2: Traceability + Cognitive Frameworks + Reinforcement
- `argumentation_reasoning_framework.md` — Toulmin model, Bradford Hill causal reasoning, inference to best explanation, epistemic status classification
- `review_quality_thinking.md` — three lenses (internal validity, external validity, contribution), common reviewer traps, calibration questions
- `writing_judgment_framework.md` — clarity test, reader's journey, discipline-specific voice, revision decision matrix

Wave 3: Lean Skill Size

- `references/` files
- `ARS_CROSS_MODEL` env var — without it, everything works as before. See `shared/cross_model_verification.md` for the full setup guide, API patterns, and cost estimates.
- `shared/style_calibration_protocol.md`
- `academic-paper/references/writing_quality_check.md` — writing quality checklist applied during draft self-review. 5 categories: AI high-frequency term warnings (25 terms), punctuation pattern control (em dash ≤3), throat-clearing opener detection, structural pattern warnings (Rule of Three, uniform paragraphs, synonym cycling), and burstiness checks (sentence length variation). These are good writing rules — not detection evasion
- `shared/handoff_schemas.md`
- `deep-research/references/socratic_questioning_framework.md` — SCR Overlay Protocol mapping SCR phases to Socratic functions
- `CHANGELOG.md`
- socratic/plan over full — safer to guide first
- `apa7` document class, text justification fix (ragged2e + etoolbox), table column width formula, bilingual abstract centering, standardized font stack (Times New Roman + Source Han Serif TC VF + Courier New), PDF via tectonic only
- tectonic (no HTML-to-PDF); APA 7.0 uses the `apa7` document class (man mode) with XeCJK for bilingual CJK support; font stack: Times New Roman + Source Han Serif TC VF + Courier New
- `integrity_verification_agent` — 100% reference/data verification with audit trail
- `devils_advocate_reviewer_agent` — 8-dimension thesis challenger