PaddleOCR-VL

Other

GPU-accelerated document parsing and OCR via PaddleOCR-VL. Detects layout, recognizes Chinese/English text, tables, charts, and seals in images. Use when the user asks to OCR an image, extract text from a document, parse a screenshot, or recognize text in photos. Supports vertical classical Chinese text, modern newspaper layouts, and mixed-content documents.

Install

openclaw skills install paddle-ocr-vl

PaddleOCR-VL Skill

GPU-accelerated document OCR using the PaddleOCR-VL model running inside an ephemeral Docker container. Auto-detects NVIDIA GPU architecture (Blackwell SM120 vs. standard) and selects the correct official image.

When to Use

  • User provides an image path and asks to "read", "OCR", or "extract text" from it
  • User wants to parse a document screenshot, newspaper page, or classical text
  • User asks about the content of an image file

Architecture

This skill includes an MCP server (server.py) that exposes three tools:

ToolPurpose
run_ocrOCR any image — provide an absolute path
check_environmentVerify Docker, GPU drivers, and image are ready
run_demoRun OCR on bundled demo images to test the setup

Setup

1. Install the MCP Server

Add to ~/.config/Claude/claude_desktop_config.json (Claude Desktop) or ~/.claude/settings.json (Claude Code):

{
  "mcpServers": {
    "paddle-ocr-vl": {
      "command": "python3",
      "args": ["<INSTALL_DIR>/server.py"]
    }
  }
}

2. Pull the Docker Image (one-time)

# Blackwell GPU (RTX 50xx, B100/B200 — compute capability >= 12.0):
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu-sm120

# Other NVIDIA GPU:
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-nvidia-gpu

3. Verify

Call check_environment to verify everything is set up, then run_demo to test on the bundled sample images.

Bundled Demo Images

FileContent
demo/newspaper.pngPeople's Daily article about China-Eritrea relations
demo/classical_text.pngRecords of the Three Kingdoms, vertical classical Chinese

Requirements

  • Docker with nvidia-container-toolkit
  • NVIDIA GPU with drivers installed
  • Python 3.10+ with mcp SDK (pip install mcp)

Security & Privacy

  • Images are processed inside an ephemeral Docker container (--rm flag)
  • The container has no network access beyond --network host (needed for GPU)
  • No data leaves the host machine
  • The container is destroyed immediately after each OCR run

External Endpoints

None. All processing is local.

Official References