📄 PDF/IMG → MD/JSON document OCR API for PaddleOCR and GLM-OCR. Self-hostable.
📄 ocrbase is a lightweight, model-agnostic API that standardizes document parsing across visual language models (VLMs).
🪶 Lightweight: Tiny Bun + Elysia service, single container, minimal footprint.
🔌 Model-Agnostic: Point at any supported VLM — GLM-OCR, PaddleOCR-VL — via env vars.
📊 State of the Art: Backed by models scoring ≥94.5 on OmniDocBench v1.5.
💎 Easy to Deploy: One command away from a working OCR API.
- `/v1/parse` — turn a document into text
- `/v1/parse/async` — enqueue a parse job
- `/v1/extract` — extract structured JSON from a document
- `/v1/extract/async` — enqueue an extract job
- `/v1/job/:jobId` — inspect parse or extract job status

Both models are state of the art.
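The endpoints above are plain HTTP. As a hedged sketch of calling `/v1/parse` from TypeScript: the multipart field name `file` and the helper itself are assumptions for illustration, not confirmed by the API.

```typescript
// Hypothetical request builder for POST /v1/parse against a local
// ocrbase instance. The "file" field name is an assumption.
function buildParseRequest(baseUrl: string, file: Blob, filename: string): Request {
  const form = new FormData();
  form.append("file", file, filename);
  return new Request(`${baseUrl}/v1/parse`, { method: "POST", body: form });
}

const req = buildParseRequest(
  "http://localhost:3000",
  new Blob([new Uint8Array([0x25, 0x50, 0x44, 0x46])]), // "%PDF" magic bytes
  "invoice.pdf",
);
console.log(req.method, new URL(req.url).pathname); // POST /v1/parse
```

Sending it is then just `await fetch(req)` once the service is running.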
> [!IMPORTANT]
> ocrbase does not ship the models — point it at a running inference server:
```sh
docker run -d -p 3000:3000 \
  -e PADDLEOCR_URL=http://localhost:8190 \
  -e GLM_OCR_URL=http://localhost:5002 \
  --name ocrbase ghcr.io/ocrbase-hq/ocrbase
```
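For multi-container setups, the `docker run` above translates to a docker-compose sketch like the following. This is untested, and the `paddleocr` / `glm-ocr` hostnames are placeholders for wherever your inference servers actually run (ocrbase does not ship them):

```yaml
# Sketch only: ocrbase pointed at externally-run model servers.
services:
  ocrbase:
    image: ghcr.io/ocrbase-hq/ocrbase
    ports:
      - "3000:3000"
    environment:
      # Placeholder hostnames — replace with your model-server addresses.
      PADDLEOCR_URL: http://paddleocr:8190
      GLM_OCR_URL: http://glm-ocr:5002
```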
```sh
bun install
bun dev
```
If `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY`, `S3_BUCKET`, and `S3_ENDPOINT` are set, `/v1/parse` will:

- upload file inputs to S3
- pass a GET URL for the stored object into the selected document model

If those env vars are not set, ocrbase keeps the current direct behavior and sends the original input to the model.
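The gating rule above — S3-backed parsing only when all four variables are present — can be sketched as a small predicate. The variable names come from this README; the helper itself is hypothetical, not ocrbase's actual code:

```typescript
// Hypothetical check mirroring the README's rule: the S3 path is
// enabled only when all four S3 env vars are set.
const S3_VARS = [
  "S3_ACCESS_KEY_ID",
  "S3_SECRET_ACCESS_KEY",
  "S3_BUCKET",
  "S3_ENDPOINT",
] as const;

function s3Enabled(env: Record<string, string | undefined>): boolean {
  return S3_VARS.every((name) => Boolean(env[name]));
}

console.log(s3Enabled({ S3_BUCKET: "docs" })); // false — three vars missing
console.log(
  s3Enabled({
    S3_ACCESS_KEY_ID: "key",
    S3_SECRET_ACCESS_KEY: "secret",
    S3_BUCKET: "docs",
    S3_ENDPOINT: "http://localhost:9000",
  }),
); // true
```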
If `REDIS_URL` and the S3 env vars above are set, queue mode is enabled:

- `POST /v1/parse` uploads or normalizes the input to S3, enqueues a parse job, waits for completion, and returns the normal parse response
- `POST /v1/parse/async` returns `202 { jobId }`
- `GET /v1/job/:jobId` returns the job state plus result or error

If Redis is missing, or Redis is present but S3 is not fully configured, `POST /v1/parse` keeps the existing direct behavior and the async/status endpoints return 503.
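A client of the async flow polls `/v1/job/:jobId` until the job settles. A minimal poller sketch follows — the `{ state, result, error }` response shape is inferred from this README, and the terminal state names (`completed`, `failed`) are assumptions; the demo runs against a stub rather than a live server:

```typescript
type JobResponse = { state: string; result?: unknown; error?: string };

// Hypothetical poller for GET /v1/job/:jobId. fetchJob is injected so the
// sketch stays testable without a running ocrbase instance.
async function waitForJob(
  jobId: string,
  fetchJob: (id: string) => Promise<JobResponse>,
  intervalMs = 500,
): Promise<JobResponse> {
  for (;;) {
    const job = await fetchJob(jobId);
    if (job.state === "completed" || job.state === "failed") return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo with a stub that completes on the third poll.
let polls = 0;
const stub = async (): Promise<JobResponse> =>
  ++polls < 3 ? { state: "active" } : { state: "completed", result: "ok" };

waitForJob("job-123", stub, 10).then((job) => console.log(job.state)); // completed
```

Against a live instance, `fetchJob` would be a one-line `fetch` of `/v1/job/${id}` followed by `.json()`.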
When queue mode is enabled, Bull Board is also available at /v1/admin/queues.