OpenCode/OpenWork OCR Skill - Hybrid Mode: DeepSeek-OCR 3B (smart, custom prompts) + PaddleOCR (fast). Dual-mode OCR: smart mode supports custom prompts; fast mode does pure text extraction.
```bash
npx skills add https://github.com/mr-shaper/opencode-skills-paddle-ocr --skill ocr
```
Install this skill with the CLI, then start using the SKILL.md workflow in your workspace.
Give "eyes" to text-only LLMs — Local OCR for OpenCode/OpenWork
Many LLMs (like GLM-4.7) don't have vision capabilities but are affordable and fast. This skill gives them "eyes" by running OCR locally, extracting text from images so any model can understand visual content.
All data stays on your machine — perfect for sensitive documents.
| Mode | Engine | Best For | Speed |
|---|---|---|---|
| Smart (default) | DeepSeek-OCR 3B | Custom prompts, understanding content | 10-30s |
| Fast (`--fast`) | PaddleOCR PP-OCRv5 | Pure text extraction | 1-3s |
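The mode choice above can be sketched as a small dispatch helper (hypothetical; `pick_engine` is not part of the skill's actual code). Since fast mode only extracts plain text, passing a custom prompt implies the smart path:

```python
def pick_engine(fast=False, prompt=None):
    """Choose an OCR engine per the table above (illustrative helper).

    Fast mode cannot honor custom prompts, so any prompt forces
    the smart (DeepSeek-OCR) path.
    """
    if fast and prompt is None:
        return "paddleocr"    # 1-3s, pure text extraction
    return "deepseek-ocr"     # 10-30s, understands prompts
```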
```bash
# Install Ollama
brew install ollama
brew services start ollama

# Download the DeepSeek-OCR model (~6.7GB)
ollama pull deepseek-ocr

# Python dependencies
pip install requests pdf2image Pillow
brew install poppler

# (Optional) Fast mode
pip install paddleocr paddlepaddle
```
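After `ollama pull`, you can confirm the model is available by querying Ollama's `/api/tags` endpoint. A minimal sketch (the helper names are hypothetical, not part of the skill):

```python
import json
import urllib.request

def ollama_tags(base="http://localhost:11434"):
    """Fetch the /api/tags response from a local Ollama service."""
    with urllib.request.urlopen(base + "/api/tags") as resp:
        return json.load(resp)

def model_names(tags_json):
    """Extract pulled model names from an /api/tags response."""
    return [m["name"] for m in tags_json.get("models", [])]
```

If `"deepseek-ocr:latest"` appears in `model_names(ollama_tags())`, the smart mode is ready to use.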
```bash
cd ~/Library/Application\ Support/com.differentai.openwork/workspaces/starter/.opencode/skills
git clone https://github.com/mr-shaper/opencode-skill-hybrid-ocr.git paddle-ocr
cd paddle-ocr
```
```bash
# Smart mode (DeepSeek-OCR)
python3 scripts/ocr.py image.png

# Custom prompt
python3 scripts/ocr.py table.png --prompt "Extract as markdown table"

# Fast mode (PaddleOCR)
python3 scripts/ocr.py image.png --fast

# PDF
python3 scripts/ocr.py document.pdf
```
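To drive these commands from another tool or agent, you can build the argument list programmatically. A sketch assuming only the flags shown above (`build_ocr_command` is a hypothetical wrapper, not shipped with the skill):

```python
def build_ocr_command(path, prompt=None, fast=False):
    """Assemble an argv list for scripts/ocr.py (illustrative wrapper).

    Pass the result to subprocess.run(cmd, capture_output=True, text=True)
    to collect the extracted text from stdout.
    """
    cmd = ["python3", "scripts/ocr.py", path]
    if prompt is not None:
        cmd += ["--prompt", prompt]   # smart mode only
    if fast:
        cmd.append("--fast")          # PaddleOCR path
    return cmd
```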
```
┌─────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Image/PDF  │────▶│    Local OCR     │────▶│    Your LLM     │
│             │      │ (DeepSeek/Paddle)│      │ (GLM-4.7, etc.) │
└─────────────┘      └──────────────────┘      └─────────────────┘
```
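In the smart path, the middle box talks to Ollama's `/api/generate` endpoint, sending the image as base64. A minimal sketch of building that request body (the model name and default prompt are assumptions, not verified against the skill's source):

```python
import base64

def ollama_ocr_payload(image_bytes, prompt="Extract all text from this image."):
    """Build a request body for Ollama's /api/generate endpoint.

    POST this as JSON to http://localhost:11434/api/generate; the
    extracted text comes back in the "response" field.
    """
    return {
        "model": "deepseek-ocr",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,   # return one complete response, not a stream
    }
```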
| State | Memory |
|---|---|
| Idle | ~30MB (Ollama service) |
| Processing | ~6-8GB (model loaded) |
Free memory when not needed:
```bash
brew services stop ollama
```
MIT
Made for the OpenCode community