OkraPDF Public Documents

v1.0.0

Query pre-extracted public documents via OkraPDF MCP — arxiv AI papers, SEC 10-K/10-Q filings, and more. Read, ask questions, extract structured data. No upl...

by Steven Tsao (@steventsao)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for steventsao/okra-public-docs.

Install the skill "OkraPDF Public Documents" (steventsao/okra-public-docs) from ClawHub.
Skill page: https://clawhub.ai/steventsao/okra-public-docs
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install okra-public-docs

ClawHub CLI

npx clawhub@latest install okra-public-docs

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
Name/description (querying pre-extracted ArXiv and SEC docs) matches the provided assets and instructions: a papers.json manifest and MCP configuration examples. No unrelated credentials, binaries, or install steps are requested.
Instruction Scope
SKILL.md confines runtime instructions to querying documents via MCP endpoints and optional public search APIs (Semantic Scholar, PapersWithCode). It asks the user to add MCP server config to client config files (~/.claude/mcp.json or .cursor/mcp.json) and to provide an OkraPDF API key only for the ArXiv channel — this is proportional to the stated functionality. It does not instruct scanning of unrelated local files or exfiltration.
Install Mechanism
No install spec or downloaded code is present (instruction-only), so nothing is written to disk or executed by the skill beyond guidance to configure a client to call external MCP endpoints.
Credentials
No required environment variables or hidden credentials are declared. The only credential mentioned is an optional OkraPDF API key for the ArXiv channel, which is appropriate for authenticated API usage; SEC filings are described as zero-auth.
Persistence & Privilege
Skill is not always-enabled and is user-invocable. It does not request persistent system privileges or modify other skills' configurations; the only persistent change it recommends is adding MCP entries to the user's agent/client config (which is a normal client-side setup step).
Assessment
This is an instruction-only connector that tells your agent how to use OkraPDF's MCP endpoints to query pre-indexed public documents. Before installing:

  1) Confirm you trust api.okrapdf.com and mcp.okrapdf.com, because the agent will send queries there.
  2) Provide an OkraPDF API key only if you accept that queries referencing the ArXiv collection will go to that service.
  3) The skill asks you to edit client config files (~/.claude/mcp.json or .cursor/mcp.json); review those edits before saving.
  4) The SEC data is described as public and zero-auth, but verify data quality if you rely on it for decisions.

Nothing in the skill demands unrelated credentials or elevated system access.


latest vk977gbkc4sgrjn0spx30rnd4th83xprz
111 downloads
0 stars
1 version
Updated 4w ago
v1.0.0
MIT-0

OkraPDF Public Documents

Pre-extracted public document corpora queryable via MCP. No upload, no waiting — documents are already parsed and indexed. Just pass an ID and start asking questions.

Available Channels

Channel         | Coverage                                              | Auth             | ID Format
Arxiv AI papers | 400+ papers from cs.AI, cs.CL, cs.LG (updated weekly) | API key required | arxiv:2603.26653
SEC filings     | Mag7 + FinanceBench (~80 companies), 10-K and 10-Q    | No auth needed   | Ticker-based (NVDA)

Setup

Arxiv papers (authenticated MCP)

Add to ~/.claude/mcp.json (Claude Code) or .cursor/mcp.json (Cursor):

{
  "mcpServers": {
    "okra-pdf": {
      "type": "url",
      "url": "https://api.okrapdf.com/mcp",
      "headers": { "Authorization": "Bearer YOUR_API_KEY" }
    }
  }
}

Get a free API key at okrapdf.com (Settings > API Keys).

SEC filings (zero-auth MCP)

{
  "mcpServers": {
    "okra-sec": {
      "type": "url",
      "url": "https://mcp.okrapdf.com/mcp"
    }
  }
}

No API key, no signup. Restart your agent after adding.
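
If a client config file already lists other MCP servers, the two entries above need to be merged in rather than pasted over the file. A minimal Python sketch of that merge follows. The helper names `mcp_entry` and `merge_servers` and the `other-tool` server are hypothetical, introduced only for illustration; the entry shape is copied from the JSON shown above.

```python
import json
from typing import Optional


def mcp_entry(url: str, api_key: Optional[str] = None) -> dict:
    """Build one mcpServers entry in the shape shown in the setup snippets."""
    entry = {"type": "url", "url": url}
    if api_key:
        # Only the authenticated Arxiv endpoint needs this header.
        entry["headers"] = {"Authorization": f"Bearer {api_key}"}
    return entry


def merge_servers(existing: dict, new_servers: dict) -> dict:
    """Return a copy of `existing` with `new_servers` added under mcpServers."""
    merged = dict(existing)
    merged["mcpServers"] = {**existing.get("mcpServers", {}), **new_servers}
    return merged


# Hypothetical existing config with an unrelated server already present.
existing = {"mcpServers": {"other-tool": {"type": "url", "url": "https://example.com/mcp"}}}
config = merge_servers(existing, {
    "okra-pdf": mcp_entry("https://api.okrapdf.com/mcp", api_key="YOUR_API_KEY"),
    "okra-sec": mcp_entry("https://mcp.okrapdf.com/mcp"),
})
print(json.dumps(config, indent=2))
```

The sketch only prints the merged result; writing it back to the config file is left as a manual, reviewed step.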


Arxiv Papers

400+ recent AI research papers parsed with Docling OCR on GPU — tables, equations, figures, and full text preserved as structured markdown.

Read a paper

read_document(document_id: "arxiv:2603.26653")
read_document(document_id: "arxiv:2603.26653", pages: "1-5")
read_document(document_id: "https://arxiv.org/pdf/2603.26653")

No upload needed — papers are pre-indexed as public sources. Just pass the arxiv ID.
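
When IDs arrive in mixed forms (bare IDs, abs links, PDF links), a small normalizer can turn them into the `arxiv:` form used in the calls above. This is a sketch under assumptions: the function name `to_document_id` is hypothetical, and dropping a version suffix such as `v2` assumes the collection is keyed on unversioned IDs, which the source does not state.

```python
import re

# New-style arXiv identifiers: YYMM.NNNNN (4 or 5 digits), optional version suffix.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def to_document_id(ref: str) -> str:
    """Convert an arXiv URL or bare ID into the arxiv:XXXX.XXXXX form."""
    match = _ARXIV_ID.search(ref)
    if match is None:
        raise ValueError(f"no arXiv identifier found in {ref!r}")
    # Assumption: the version suffix (group 2) is not part of the document ID.
    return f"arxiv:{match.group(1)}"


print(to_document_id("https://arxiv.org/pdf/2603.26653"))
```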

Ask questions

ask_document(document_id: "arxiv:2603.26653", question: "What is the main contribution?")
ask_document(document_id: "arxiv:2603.26653", question: "What were the benchmark results on MMLU?")
ask_document(document_id: "arxiv:2603.18272", question: "How is retrieval-augmented experience used?")

Returns an answer with page citations.

Extract structured data

extract_data(
  document_id: "arxiv:2603.26653",
  prompt: "Extract all benchmark results with model names, dataset names, and scores",
  json_schema: {
    "type": "object",
    "properties": {
      "benchmarks": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "model": {"type": "string"},
            "dataset": {"type": "string"},
            "metric": {"type": "string"},
            "score": {"type": "number"}
          }
        }
      }
    }
  }
)
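
On the consuming side, a result shaped like the schema above can be loaded into typed rows. The sketch below assumes the tool returns a JSON object that matches the schema directly; the actual response envelope is not documented here, and the `Benchmark` class, `parse_benchmarks` helper, and sample payload are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Benchmark:
    model: str
    dataset: str
    metric: str
    score: float


def parse_benchmarks(result: dict[str, Any]) -> list[Benchmark]:
    """Turn a schema-shaped result into typed rows, skipping incomplete items."""
    rows = []
    for item in result.get("benchmarks", []):
        try:
            rows.append(Benchmark(
                model=str(item["model"]),
                dataset=str(item["dataset"]),
                metric=str(item["metric"]),
                score=float(item["score"]),
            ))
        except (KeyError, TypeError, ValueError):
            # The schema marks no field as required, so items may be partial.
            continue
    return rows


# Made-up payload for illustration only; not taken from any real paper.
sample = {"benchmarks": [
    {"model": "model-a", "dataset": "dataset-x", "metric": "accuracy", "score": 71.5},
    {"model": "model-b", "dataset": "dataset-x"},  # missing metric and score
]}
rows = parse_benchmarks(sample)
print(rows)
```

Because the schema lists no `required` fields, skipping partial items is one reasonable policy; adding a `required` array to the schema is another.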

Literature survey workflow

# Read abstracts from several papers
read_document(document_id: "arxiv:2603.26499", pages: "1")
read_document(document_id: "arxiv:2603.26266", pages: "1")

# Ask targeted questions
ask_document(document_id: "arxiv:2603.26499", question: "What bottlenecks in AI research does this address?")

# Same question across papers for comparison
ask_document(document_id: "arxiv:2603.18272", question: "How does this handle multi-agent coordination?")
ask_document(document_id: "arxiv:2603.07379", question: "How does this handle multi-agent coordination?")
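
An MCP client normally issues these calls for you. For scripting the "same question across papers" step, the sketch below builds one request per paper, assuming the standard MCP JSON-RPC `tools/call` method; the helper name `tool_call_requests` is hypothetical, and transport and authentication are left out.

```python
import json
from itertools import count


def tool_call_requests(question: str, document_ids: list[str]) -> list[dict]:
    """Build one MCP tools/call request per paper for the same question."""
    ids = count(1)
    return [
        {
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "tools/call",
            "params": {
                "name": "ask_document",
                "arguments": {"document_id": doc_id, "question": question},
            },
        }
        for doc_id in document_ids
    ]


requests_out = tool_call_requests(
    "How does this handle multi-agent coordination?",
    ["arxiv:2603.18272", "arxiv:2603.07379"],
)
print(json.dumps(requests_out[0], indent=2))
```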

Discover papers

Semantic Scholar (free, no key needed for basic use):

curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=agentic+RAG&year=2026&fields=externalIds,title,citationCount&limit=10" \
  | jq '.data[] | {arxiv: .externalIds.ArXiv, title, citations: .citationCount}'
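
The same filtering can be done in Python once the JSON response is in hand. This sketch mirrors the jq expression above and additionally drops results that have no ArXiv ID; the function name `arxiv_hits` and the sample response are illustrative, not real search results.

```python
def arxiv_hits(response: dict) -> list[dict]:
    """Keep only results with an ArXiv ID and attach the arxiv: document ID."""
    hits = []
    for paper in response.get("data", []):
        arxiv_id = (paper.get("externalIds") or {}).get("ArXiv")
        if not arxiv_id:
            continue  # not on arXiv, so it cannot be in the collection
        hits.append({
            "document_id": f"arxiv:{arxiv_id}",
            "title": paper.get("title"),
            "citations": paper.get("citationCount"),
        })
    return hits


# Made-up response in the shape the jq filter above expects.
sample_response = {"data": [
    {"externalIds": {"ArXiv": "2603.26653"}, "title": "Example paper", "citationCount": 3},
    {"externalIds": {"DOI": "10.0000/example"}, "title": "No arXiv copy", "citationCount": 9},
]}
hits = arxiv_hits(sample_response)
print(hits)
```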

Arxiv RSS feeds (same feeds used to build the collection):

https://rss.arxiv.org/rss/cs.AI    # Artificial Intelligence
https://rss.arxiv.org/rss/cs.CL    # Computation and Language (NLP)
https://rss.arxiv.org/rss/cs.LG    # Machine Learning

Papers With Code:

curl -s "https://paperswithcode.com/api/v1/papers/?q=agentic+RAG&items_per_page=5" | jq '.results[] | {title, arxiv_id}'

Current snapshot

411 papers from cs.AI (~200), cs.CL (~100), cs.LG (~200). Full manifest in papers.json.

If a paper isn't found, upload it yourself with upload_document.

Tips

  • Use arxiv:XXXX.XXXXX format (not full URL) for cleaner queries
  • pages: "1" reads just the abstract/intro quickly
  • For survey papers (50+ pages), use ask_document instead of reading everything
  • extract_data with JSON schemas is ideal for pulling benchmark tables

SEC Filings

Pre-extracted SEC 10-K and 10-Q filings. No API key, no signup, completely free.

Available tools

Tool                     | Purpose
read_filing_index        | Browse available filings, filter by ticker/type
read_filing_contents     | Get full extracted text as markdown
ask_question             | AI-powered Q&A with citations, single or cross-company
get_verification_summary | Check extraction quality page-by-page
verify_pages             | Approve or flag pages for quality control

Browse filings

read_filing_index()
read_filing_index(ticker: "NVDA")
read_filing_index(ticker: "AAPL", filing_type: "10-K")

Always start here to see what's available.

Ask questions (single company)

ask_question(question: "What was NVIDIA's data center revenue?", tickers: ["NVDA"])
ask_question(question: "List all risk factors related to AI regulation", tickers: ["MSFT"])
ask_question(question: "What are the outstanding debt obligations?", tickers: ["TSLA"], filing: "10-k-2024")

Cross-company comparison (up to 10 tickers)

ask_question(
  question: "Compare R&D spending as a percentage of revenue",
  tickers: ["AAPL", "MSFT", "GOOGL", "NVDA", "META", "AMZN", "TSLA"]
)

ask_question(
  question: "Which company has the highest gross margin?",
  tickers: ["AAPL", "MSFT", "GOOGL"]
)

ask_question(
  question: "Summarize each company's AI strategy",
  tickers: ["NVDA", "AMD", "INTC"]
)

Fans out to each company's filing in parallel, then synthesizes a cross-company answer.
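
The fan-out step can be pictured with the following sketch. It illustrates the general pattern only and is not the service's implementation: `fan_out`, `fake_ask`, and `MAX_TICKERS` are hypothetical names, the per-company call is a stand-in, and the synthesis step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_TICKERS = 10  # limit stated in the section heading


def fan_out(question: str, tickers: list[str],
            ask_one: Callable[[str, str], str]) -> dict[str, str]:
    """Ask the same question per ticker in parallel and collect the answers."""
    if not 1 <= len(tickers) <= MAX_TICKERS:
        raise ValueError(f"expected 1 to {MAX_TICKERS} tickers, got {len(tickers)}")
    with ThreadPoolExecutor(max_workers=len(tickers)) as pool:
        # pool.map preserves input order, so answers line up with tickers.
        answers = pool.map(lambda t: ask_one(question, t), tickers)
        return dict(zip(tickers, answers))


# Stand-in for a real per-company call; returns a placeholder string.
def fake_ask(question: str, ticker: str) -> str:
    return f"[{ticker}] answer to: {question}"


answers = fan_out("Compare R&D spending as a percentage of revenue",
                  ["AAPL", "MSFT", "GOOGL"], fake_ask)
print(answers)
```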

Read full filing text

read_filing_contents(ticker: "TSLA", filing: "10-k-2024")

Filing slug formats (all equivalent): 10-k-2024, 10-K/2024, 2024-10K.
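
For client-side bookkeeping (cache keys, for example), the three equivalent spellings can be collapsed to one. The sketch below handles only the annual 10-K forms listed above; `normalize_slug` is a hypothetical helper, and how 10-Q slugs are written is not covered by the source, so they are rejected.

```python
import re


def normalize_slug(slug: str) -> str:
    """Map 10-k-2024, 10-K/2024, or 2024-10K to the single form 10-k-2024."""
    s = slug.strip().lower()
    year = re.search(r"(?<!\d)(\d{4})(?!\d)", s)  # a standalone 4-digit year
    form = re.search(r"10-?k", s)                 # 10-k or 10k
    if year is None or form is None:
        raise ValueError(f"unrecognized filing slug: {slug!r}")
    return f"10-k-{year.group(1)}"


for raw in ("10-k-2024", "10-K/2024", "2024-10K"):
    print(raw, "->", normalize_slug(raw))
```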

Extraction quality audit

get_verification_summary(document_id: "doc-xxx")
get_verification_summary(document_id: "doc-xxx", status: "needs_review")
verify_pages(document_id: "doc-xxx", action: "approve", confidence_above: 0.9)
verify_pages(document_id: "doc-xxx", action: "flag", pages: [67], reason: "Table has merged cells")
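
The approve-above-threshold idea can be sketched locally as below. The per-page record shape (`page`, `confidence`) is an assumption made for illustration, since the response format of get_verification_summary is not shown here; `triage_pages` and the sample values are likewise hypothetical.

```python
def triage_pages(pages: list[dict], threshold: float = 0.9) -> tuple[list[int], list[int]]:
    """Split pages into (approve, review) lists by a confidence threshold."""
    approve = [p["page"] for p in pages if p["confidence"] > threshold]
    review = [p["page"] for p in pages if p["confidence"] <= threshold]
    return approve, review


# Made-up per-page confidences; the real summary format may differ.
sample_pages = [
    {"page": 66, "confidence": 0.97},
    {"page": 67, "confidence": 0.62},
    {"page": 68, "confidence": 0.90},
]
approve, review = triage_pages(sample_pages)
print("approve:", approve, "review:", review)
```

Pages at exactly the threshold land in the review list here, matching a strict reading of `confidence_above`.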

Available companies

Mag7: AAPL, MSFT, GOOGL, AMZN, NVDA, META, TSLA

FinanceBench: ~80 companies including major banks (JPM, BAC, GS), pharma (PFE, JNJ, ABBV), industrials (GE, MMM, CAT), and more.

Use read_filing_index() to browse the full catalog. New filings are added as they are published.

Tips

  • Start with read_filing_index before querying
  • ask_question with multiple tickers is the fastest way to compare — no need to read each filing
  • Cross-company queries work best with clear, quantitative questions
  • Verification tools require document_id (not ticker) — get it from other tool responses
