PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
npx skills add https://github.com/technillogue/ptx-isa-markdown --skill cudaInstala esta habilidad con la CLI y comienza a usar el flujo de trabajo SKILL.md en tu espacio de trabajo.
NVIDIA's PTX ISA 9.1, CUDA Runtime API 13.1, and CUDA Driver API 13.1 documentation converted to searchable markdown, with a Claude Code skill for GPU development.
PTX ISA 9.1 Documentation (405 markdown files, 2.3MB)
CUDA Runtime API 13.1 Documentation (107 markdown files, 0.9MB)
CUDA Driver API 13.1 Documentation (128 markdown files, 0.8MB)
CUDA Development Skill for Claude Code
NVIDIA's official documentation is:
This conversion enables:
grep -r "register fragment" ptx-docs/ instead of Ctrl+Fgrep -r "cudaErrorInvalidValue" cuda-runtime-docs/ instead of clicking through modulesgrep -r "cuCtxCreate" cuda-driver-docs/ for low-level API lookupExample 1: Find how to disable TMA swizzling:
$ grep -r "swizzle_mode.*no swizzling" cuda_skill/references/ptx-docs/
9.7.9.28-data-movement-and-conversion-instructionstensormapreplace.md:
0 | `.u8` | No interleave | No swizzling | 16B | Zero fill
Answer: use tensormap.replace with .swizzle_mode = 0.
Example 2: Look up what cudaErrorInvalidValue means:
$ grep -A 5 "cudaErrorInvalidValue" cuda_skill/references/cuda-runtime-docs/
Answer: Instantly find error code documentation with description and related errors.
Example 3: Understand context management in Driver API:
$ grep -A 20 "cuCtxCreate" cuda_skill/references/cuda-driver-docs/modules/group__cuda__ctx.md
Answer: Full cuCtxCreate parameters, return values, and context behavior.
cuda_skill/ # Portable Claude Code skill (~4.2MB)
├── SKILL.md # Main skill definition
└── references/
├── ptx-docs/ # 405 markdown files (2.3MB)
│ ├── 9-instruction-set/ # 186 instruction files
│ │ ├── 9.7.15.5-*.md # WGMMA register layouts
│ │ └── 9.7.16-*.md # TensorCore Gen5 (Blackwell)
│ ├── 5-state-spaces-types-and-variables/
│ ├── 8-memory-consistency-model/
│ └── INDEX.md
├── cuda-runtime-docs/ # 107 markdown files (0.9MB)
│ ├── modules/ # 41 API modules
│ │ ├── group__cudart__device.md
│ │ ├── group__cudart__memory.md
│ │ ├── group__cudart__stream.md
│ │ └── ...
│ ├── data-structures/ # 66 structs/unions
│ │ ├── structcudadeviceprop.md
│ │ ├── structcudamemcpy3dparms.md
│ │ └── ...
│ └── INDEX.md
├── cuda-driver-docs/ # 128 markdown files (0.8MB)
│ ├── modules/ # 50 API modules
│ │ ├── group__cuda__ctx.md
│ │ ├── group__cuda__mem.md
│ │ ├── group__cuda__stream.md
│ │ ├── group__cuda__module.md
│ │ ├── group__cuda__va.md
│ │ └── ...
│ ├── data-structures/ # 80 structs
│ │ ├── structcudevprop__v1.md
│ │ ├── structcuda__memcpy3d__v2.md
│ │ └── ...
│ └── INDEX.md
├── ptx-isa.md # PTX search guide and examples
├── cuda-runtime.md # Runtime API search guide
├── cuda-driver.md # Driver API search guide
├── nsys-guide.md # Nsight Systems patterns
├── ncu-guide.md # Nsight Compute metrics
└── debugging-tools.md # compute-sanitizer, cuda-gdb
scrape_ptx_docs.py # Regenerate PTX docs (uv script)
scrape_cuda_runtime.py # Regenerate Runtime docs (uv script)
scrape_cuda_driver.py # Regenerate Driver docs (uv script)
Install:
cp -r cuda_skill ~/.claude/skills/cuda
The skill activates automatically for CUDA work. Ask Claude:
Claude searches the local documentation and provides answers with section references.
Find WGMMA register fragments:
grep -r "register fragment" cuda_skill/references/ptx-docs/9-instruction-set/ | grep -i wgmma
Find all swizzling modes:
find cuda_skill/references/ptx-docs -name "*swizzl*"
Search for any instruction:
grep -r "mbarrier.init" cuda_skill/references/ptx-docs/
Look up error code:
grep -A 10 "cudaErrorInvalidValue" cuda_skill/references/cuda-runtime-docs/
Find device properties:
cat cuda_skill/references/cuda-runtime-docs/data-structures/structcudadeviceprop.md
Look up context creation:
grep -A 20 "cuCtxCreate" cuda_skill/references/cuda-driver-docs/modules/group__cuda__ctx.md
Find virtual memory management:
ls cuda_skill/references/cuda-driver-docs/modules/*va*.md
cat cuda_skill/references/cuda-driver-docs/modules/group__cuda__va.md
Understand module loading:
grep -A 15 "cuModuleLoad" cuda_skill/references/cuda-driver-docs/modules/group__cuda__module.md
Search for stream functions:
grep -r "cudaStreamSynchronize" cuda_skill/references/cuda-runtime-docs/
Run the scrapers to update from NVIDIA's latest docs:
# Update PTX ISA docs
./scrape_ptx_docs.py
# Update CUDA Runtime API docs (2-step process for backward compatibility)
./scrape_cuda_runtime.py # 1. Download and convert
python3 cleanup_cuda_docs.py # 2. Remove redundant content
# Update CUDA Driver API docs (integrated pipeline with caching)
./scrape_cuda_driver.py # Download, cache, and cleanup automatically
# Fast iteration on cleanup (uses cached raw files):
./scrape_cuda_driver.py --skip-download
All scrapers use uv for dependency management (beautifulsoup4, html2text, requests).
PTX scraper:
Runtime API scraper:
Driver API scraper:
cuda-driver-docs-raw/ for fast iteration--skip-download to re-run cleanup without re-downloadingAPI cleanup process:
[inherited] tags, zero-width spacesPTX ISA: Tables verified against HTML source. Mathematical notation preserved. 1049 images accessible via NVIDIA CDN links. See docs/QUALITY_REPORT.md for detailed verification results.
CUDA Runtime API: All 107 pages successfully converted. Function signatures, parameters, return values, and descriptions fully preserved. Type and function names preserved for grep-based navigation. Navigation, duplicate content, redundant URLs, generic boilerplate notes, and formatting noise removed for cleaner, more focused documentation (83% size reduction).
CUDA Driver API: All 128 pages (130 total minus 2 404 errors) successfully converted. Function signatures, parameters, return values, and descriptions fully preserved. Type and function names preserved for grep-based navigation. Duplicate function TOC, verbose "See also" sections, redundant URLs, navigation, footer, and formatting noise removed (76% size reduction).
Verification:
Known limitations:
PTX ISA:
CUDA Runtime API:
CUDA Driver API:
License: Documentation © NVIDIA Corporation
The skill uses Claude Code's progressive disclosure: SKILL.md is always loaded (~13KB), reference files load on-demand, and documentation is searched rather than loaded into context.
Total skill size: ~4.2MB (2.3MB PTX + 0.9MB Runtime + 0.8MB Driver + guides)
Note: After scraping, the raw cache directory cuda-driver-docs-raw/ (3.7MB) can be removed from the skill if space is a concern. It's only needed for iterating on cleanup logic.
cuobjdump -ptx analysisUnofficial conversion for convenience. Refer to NVIDIA's official documentation for authoritative reference: