wikipedia-research

Installation
CLI
npx skills add https://github.com/joshuaroll/wikipedia-research-skill --skill wikipedia-research

Install this skill with the CLI and start using the SKILL.md workflow in your workspace.

Last updated on 6/24/2026

Wikipedia Research Skill

A comprehensive Claude Code skill for extracting verifiable, citation-backed research from Wikipedia with full provenance tracking for AI verification pipelines.

Why Use This Skill?

Without Skill            | With Skill
-------------------------|----------------------------------
Unstructured prose       | Structured JSON with schema
"Various sources"        | Citations with DOIs, PMIDs
Claims float freely      | Every claim mapped to citations
No verification possible | DOI/PMID validation included
Unknown reliability      | Admiralty Code quality rating
No relationships         | Entity + relationship extraction
No timeline              | Chronological event mapping

Features

  • Citation Extraction: Parses Wikipedia references into CSL-JSON format
  • Source Verification: Validates DOIs via doi.org, PMIDs via PubMed
  • Claim Mapping: Links factual claims to supporting citations with confidence scores
  • Entity Extraction: Identifies the people, organizations, and publications mentioned
  • Relationship Mapping: Detects collaborators, employers, affiliations
  • Timeline Construction: Builds chronological event sequences
  • Quality Assessment: Admiralty Code ratings (A-F) for source reliability
  • Uncertainty Flagging: Detects Wikipedia's {{citation needed}}, {{disputed}}, etc.
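
The uncertainty flagging feature can be illustrated with a minimal sketch. It assumes the input is raw wikitext and that scanning for a fixed set of maintenance template names is enough. The function name `find_uncertainty_flags` and the template list are illustrative assumptions, not the skill's actual implementation.

```python
import re

# Maintenance templates treated as uncertainty markers. This list is an
# illustrative assumption and is not exhaustive.
UNCERTAINTY_TEMPLATES = (
    "citation needed",
    "disputed",
    "dubious",
    "better source needed",
)

# Matches {{name}} or {{name|params}} for the names above, case-insensitively.
_TEMPLATE_RE = re.compile(
    r"\{\{\s*("
    + "|".join(re.escape(name) for name in UNCERTAINTY_TEMPLATES)
    + r")\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def find_uncertainty_flags(wikitext: str) -> list[dict]:
    """Return each uncertainty template found, with its character offset."""
    return [
        {"template": m.group(1).lower(), "offset": m.start()}
        for m in _TEMPLATE_RE.finditer(wikitext)
    ]
```

Returning offsets rather than only template names would let a caller attach each flag to the nearest claim, although how the skill itself associates flags with claims is not documented here.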

Installation

Claude Code (CLI)

# Copy to your skills directory
cp -r wikipedia-research ~/.claude/skills/

Project-level

# Add to your project
cp -r wikipedia-research .claude/skills/

Quick Start

from scripts.citation_extractor import CitationExtractor
from scripts.source_verifier import SourceVerifier
from scripts.entity_extractor import EntityExtractor

# Extract article with citations
extractor = CitationExtractor()
research = extractor.extract_article("Albert_Einstein")

# Verify citations
verifier = SourceVerifier()
verification = verifier.verify_citations(research['citations'])

# Extract entities and relationships
entity_extractor = EntityExtractor()
entities = entity_extractor.extract_entities(research)
timeline = entity_extractor.build_timeline(research)

Output Format

{
  "article": {
    "title": "Article Title",
    "url": "https://en.wikipedia.org/wiki/...",
    "revision_id": "1234567890"
  },
  "sections": [{
    "heading": "Section",
    "claims": [{
      "text": "Factual claim",
      "citation_ids": ["ref_1"],
      "confidence": 0.92
    }]
  }],
  "citations": [{
    "id": "ref_1",
    "type": "article-journal",
    "DOI": "10.1234/example",
    "PMID": "12345678"
  }],
  "verification": {
    "verification_score": 0.85,
    "reliability_assessment": "high"
  },
  "knowledge_graph": {
    "nodes": [...],
    "edges": [...],
    "timeline": [...]
  }
}
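
To show how the claim-to-citation mapping in this format might be consumed, here is a minimal sketch that resolves each claim's `citation_ids` against the `citations` list and reports IDs with no matching entry. The helper name `resolve_claims` is hypothetical; only the field names come from the format above.

```python
def resolve_claims(research: dict) -> list[dict]:
    """Pair every claim with the citation records its citation_ids point to."""
    # Index citations by their "id" field for constant-time lookup.
    by_id = {c["id"]: c for c in research.get("citations", [])}
    resolved = []
    for section in research.get("sections", []):
        for claim in section.get("claims", []):
            ids = claim.get("citation_ids", [])
            resolved.append({
                "heading": section.get("heading"),
                "text": claim["text"],
                "citations": [by_id[i] for i in ids if i in by_id],
                # IDs that have no matching entry in "citations"
                "missing": [i for i in ids if i not in by_id],
            })
    return resolved
```

A non-empty `missing` list would indicate a claim whose reference could not be resolved, which is worth checking before passing the data to a verification pipeline.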

Confidence Scoring

Method: Additive heuristic based on citation metadata presence.

Factor            | Weight
------------------|-------
Base score        | 0.50
DOI present       | +0.20
PMID present      | +0.15
ISBN present      | +0.10
URL present       | +0.05
Author info       | +0.10
Publication venue | +0.05

Limitations: The score reflects which metadata fields are present; it is not semantic verification. The existence of a citation does not guarantee that it supports the specific claim.
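
The additive heuristic can be sketched as follows. The weights are taken from the table above; everything else is an assumption. In particular, the CSL-JSON field names used as keys (with `container-title` standing in for "Publication venue") and the cap at 1.0 are guesses: the listed weights sum to 1.15, and the source does not say how scores above 1.0 are handled.

```python
# Weights copied from the table above. The keys are CSL-JSON field names
# assumed to correspond to each factor.
BASE_SCORE = 0.50
WEIGHTS = {
    "DOI": 0.20,
    "PMID": 0.15,
    "ISBN": 0.10,
    "URL": 0.05,
    "author": 0.10,
    "container-title": 0.05,  # assumed field for "Publication venue"
}


def confidence_score(citation: dict) -> float:
    """Add the weight of every non-empty metadata field to the base score."""
    score = BASE_SCORE + sum(
        weight for field, weight in WEIGHTS.items() if citation.get(field)
    )
    # Assumption: cap at 1.0, because all factors together would give 1.15.
    return round(min(score, 1.0), 2)
```

For example, a citation carrying only a DOI and a PMID would score 0.50 + 0.20 + 0.15 = 0.85 under this sketch.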

Scripts

Script                | Purpose
----------------------|-------------------------------------------
wikipedia_client.py   | MediaWiki API client with caching
citation_extractor.py | Extract & parse citations to CSL-JSON
research_collector.py | Multi-article research orchestration
source_verifier.py    | Verify DOIs, PMIDs, detect dead links
entity_extractor.py   | Extract entities, relationships, timelines
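
As one illustration of what the DOI check in `source_verifier.py` might involve, the sketch below validates DOI syntax and builds the doi.org resolver URL. Both function names are hypothetical, and the pattern is a commonly used one for modern DOIs rather than anything confirmed from the skill's code. The network lookup itself is left out so the example stays offline.

```python
import re
from urllib.parse import quote

# Commonly used pattern for modern DOIs: "10.", a 4-9 digit registrant
# prefix, "/", then a suffix drawn from a limited character set.
_DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def is_plausible_doi(doi: str) -> bool:
    """Check DOI syntax only; this does not prove the DOI is registered."""
    return bool(_DOI_RE.match(doi.strip()))


def doi_resolver_url(doi: str) -> str:
    """Build the doi.org URL that a verifier could then request."""
    if not is_plausible_doi(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return "https://doi.org/" + quote(doi.strip(), safe="/")
```

A syntactic pre-check like this avoids spending network requests on obviously malformed identifiers; confirming that a DOI actually resolves still requires a request to the resolver.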

License

Apache 2.0

Contributing

Contributions welcome! Please open an issue or PR.