Paper Ingest Normalizer

v1.0.0

Normalize papers, PDFs, URLs, and literature notes into structured research records for project memory and retrieval. Use when: (1) a new paper, PDF, DOI, or...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for sunbinnju-star/paper-ingest-normalizer.

Prompt Preview: Install & Setup
Install the skill "Paper Ingest Normalizer" (sunbinnju-star/paper-ingest-normalizer) from ClawHub.
Skill page: https://clawhub.ai/sunbinnju-star/paper-ingest-normalizer
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install paper-ingest-normalizer

ClawHub CLI


npx clawhub@latest install paper-ingest-normalizer

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name and description match the declared inputs and outputs: the skill asks for PDFs, URLs, raw text, or metadata and specifies a structured schema and writeback behavior. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
Instructions stay within the paper-normalization scope (extract text from PDF/URL/text, parse bibliographic and research fields, assemble the record). It correctly warns not to write to project memory without project_id. Note that it relies on access to local PDF paths or URLs: supplying the agent a path or link grants it access to that document, so avoid passing sensitive files unless that is intended.
Install Mechanism
Instruction-only skill with no install spec or code files. It suggests using pdfplumber/PyMuPDF or a separate 'summarize' skill for extraction, which is appropriate and low-risk.
Credentials
No environment variables, credentials, or config paths are requested; the declared inputs (pdf_path, url, raw_text, project_id) are proportionate to the task.
Persistence & Privilege
Does not request always:true and does not change other skills or system-wide settings. The documented rule to never write without project_id reduces risk of unintended persistence.
Assessment
This skill appears to do what it claims: normalize literature into structured records and avoid writes unless project_id is provided. Before installing/using: (1) only supply PDFs or URLs you want the agent to read (do not give sensitive files), (2) confirm the agent has access to any external 'summarize' skill or PDF libraries it will call and that you trust those components, and (3) verify writeback behavior in a safe test project to ensure records are written only when writeback_ready is true and project_id is set.


latest: vk979tpsfsd8am7bd5mywjg0xhs83p1rf
104 downloads · 0 stars · 1 version
Updated 1mo ago · v1.0.0 · MIT-0

Paper Ingest Normalizer

Convert raw literature inputs into standardized records safe for project memory, paper databases, and downstream synthesis pipelines.

Input

One of the following is required:

  • pdf_path — local path to PDF file
  • url — link to paper/article
  • raw_text — extracted or pasted text
  • metadata_blob — existing metadata dict

Plus:

  • project_id — required for any writeback
  • source_type — one of: pdf, doi, url, text, metadata
  • tags (optional) — list of strings for categorization
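
Put together, a minimal call might look like the following Python dict. The field names come from the lists above; the path, project name, and tag values are hypothetical:

# Only one of pdf_path / url / raw_text / metadata_blob needs to be set.
ingest_input = {
    "pdf_path": "/data/papers/example_perovskite_stability.pdf",  # hypothetical local path
    "source_type": "pdf",
    "project_id": "perovskite-review",    # required for any writeback
    "tags": ["stability", "interfaces"],  # optional
}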

Output Schema

Return a structured object:

title: string
authors: string[] | null
year: number | null
source: string          # journal, conference, preprint, etc.
doi_or_url: string | null
project_id: string
paper_type: string      # experimental, theoretical, review, etc.
material_system: string | null   # e.g. "perovskite solar cell", "graphene FET"
device_type: string | null       # e.g. "FTO/glass", "flexible substrate"
key_variables: string[] | null   # independent variables studied
key_metrics: string[] | null     # measured outcomes (PCE, mobility, etc.)
core_findings: string            # 2-3 sentence neutral summary
claimed_mechanism: string | null
limitations: string | null
normalized_summary: string       # 1-2 paragraph structured summary
uncertain_fields: string[] | null  # fields that could not be verified
writeback_ready: boolean        # true only if key identity fields present
writeback_payload: object        # the record to write into project memory
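
For illustration, here is a hypothetical record conforming to this schema, written as a Python dict. Every value is invented, and in practice normalized_summary would run one to two paragraphs:

record = {
    "title": "Interface Passivation in Perovskite Solar Cells (hypothetical)",
    "authors": ["A. Author", "B. Researcher"],
    "year": 2023,
    "source": "preprint",
    "doi_or_url": None,
    "project_id": "perovskite-review",
    "paper_type": "experimental",
    "material_system": "perovskite solar cell",
    "device_type": "FTO/glass",
    "key_variables": ["passivation layer thickness"],
    "key_metrics": ["PCE", "hysteresis index"],
    "core_findings": "Reports a modest efficiency gain and reduced hysteresis with a thin passivation layer.",
    "claimed_mechanism": "Suppressed interfacial ion migration.",
    "limitations": "Small-area cells only; no long-term stability data.",
    "normalized_summary": "(one to two paragraph structured summary)",
    "uncertain_fields": ["doi_or_url"],
    "writeback_ready": True,              # title, authors, and year are all present
    "writeback_payload": {},              # in practice, the fields destined for project memory
}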

Rules

  1. Never write into project memory without project_id. Ask if not provided.
  2. Separate direct observations from claimed interpretations. Mark inference vs. direct extraction.
  3. Preserve uncertainty. Use null for missing fields; list in uncertain_fields.
  4. Do not invent missing bibliographic fields. Don't hallucinate authors, year, etc.
  5. Do not over-claim. Keep core_findings and normalized_summary grounded in what the text actually says.
  6. Never conflate abstract with findings. The abstract states intentions; findings are what the data supports.
  7. If writeback_ready = false, list explicitly which fields are missing and why.
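
As a minimal sketch of how rules 1 and 3 could be enforced if the normalizer were driven from Python (an assumption; the skill itself is instruction-only, and the function name here is invented):

def guard_writeback(record, project_id):
    # Rule 1: never write to project memory without a project_id; ask the user instead.
    if not project_id:
        raise ValueError("project_id is required before any writeback; ask the user for one")
    # Rule 3: missing fields stay None and are surfaced in uncertain_fields.
    unknown = sorted(k for k, v in record.items() if v is None)
    record["uncertain_fields"] = unknown or None
    return record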

PDF Extraction

For PDFs, use the summarize skill or pdfplumber/PyMuPDF to extract text before processing.
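
For example, a pdfplumber-based extraction step might look like the sketch below (pdfplumber is one of the libraries the skill mentions; the helper name is ours):

import pdfplumber

def extract_pdf_text(pdf_path):
    # Concatenate the text of every page; pages with no extractable text yield "".
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

PyMuPDF would work the same way, with page.get_text() in place of page.extract_text().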

Workflow

  1. Identify source type — determine which input field is populated
  2. Extract raw content — PDF text, URL content, or use provided raw text
  3. Parse bibliographic fields — title, authors, year, source, DOI
  4. Identify research content — material system, device type, variables, metrics
  5. Distill findings — separate what was measured from what was claimed
  6. Assemble writeback_payload — structured record matching the schema above
  7. Assess completeness — set writeback_ready based on presence of key identity fields
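
Taken together, the workflow could be sketched as a single function. This is a non-authoritative outline under the assumption of a Python harness; the actual parsing in steps 2-5 is left to the model or to other skills, so those steps appear only as comments:

def normalize(inputs):
    fields = ("title", "authors", "year", "source", "doi_or_url", "paper_type",
              "material_system", "device_type", "key_variables", "key_metrics",
              "core_findings", "claimed_mechanism", "limitations", "normalized_summary")
    record = {f: None for f in fields}
    # Step 1: identify the source type from whichever input field is populated.
    source_type = inputs.get("source_type") or next(
        (k for k in ("pdf_path", "url", "raw_text", "metadata_blob") if inputs.get(k)), None)
    # Steps 2-5: extract raw content (source_type decides how), then fill the
    # bibliographic and research fields, keeping measurements separate from claims.
    # Step 6: assemble the writeback payload from whatever was actually extracted.
    record["project_id"] = inputs.get("project_id")
    record["writeback_payload"] = {k: v for k, v in record.items() if v is not None}
    # Step 7: assess completeness against the key identity fields.
    missing = [f for f in ("title", "authors", "year") if not record[f]]
    record["uncertain_fields"] = missing or None
    record["writeback_ready"] = bool(record["project_id"]) and not missing
    return record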

Failure Handling

If parsing is incomplete:

  • Return partial structured output with all successfully extracted fields
  • Populate uncertain_fields with the list of fields that could not be determined
  • Set writeback_ready = false when title, authors, or year are missing
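
Concretely, a partial result for a paper whose authors and year could not be recovered might look like this (all values hypothetical):

partial = {
    "title": "Flexible-Substrate Graphene FET Mobility Study (hypothetical)",
    "authors": None,
    "year": None,
    "source": None,
    "doi_or_url": "https://example.org/paper",  # placeholder URL
    "core_findings": "Reports field-effect mobility measurements on flexible substrates.",
    "uncertain_fields": ["authors", "year", "source"],
    "writeback_ready": False,  # authors and year are missing
}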

Cross-Reference

For synthesis after normalization, see the research skill for paper synthesis workflows.
