Paper Ingest Normalizer

v1.0.0

Normalize papers, PDFs, URLs, and literature notes into structured research records for project memory and retrieval. Use when: (1) a new paper, PDF, DOI, or...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for sunbinnju-star/paper-ingest-normalizer.

Prompt Preview: Install & Setup
Install the skill "Paper Ingest Normalizer" (sunbinnju-star/paper-ingest-normalizer) from ClawHub.
Skill page: https://clawhub.ai/sunbinnju-star/paper-ingest-normalizer
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install paper-ingest-normalizer

ClawHub CLI


npx clawhub@latest install paper-ingest-normalizer

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name and description match the declared inputs and outputs: the skill asks for PDFs, URLs, raw text, or metadata and specifies a structured schema and writeback behavior. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
Instructions stay within the paper-normalization scope (extract text from PDF/URL/text, parse bibliographic and research fields, assemble the record). It correctly warns not to write to project memory without project_id. Note that it relies on access to local PDF paths or URLs: supplying the agent a path or link grants it access to that document, so avoid passing sensitive files unless that is intended.
Install Mechanism
Instruction-only skill with no install spec or code files. It suggests using pdfplumber/PyMuPDF or a separate 'summarize' skill for extraction, which is appropriate and low-risk.
Credentials
No environment variables, credentials, or config paths are requested; the declared inputs (pdf_path, url, raw_text, project_id) are proportionate to the task.
Persistence & Privilege
Does not request always:true and does not change other skills or system-wide settings. The documented rule to never write without project_id reduces risk of unintended persistence.
Assessment
This skill appears to do what it claims: normalize literature into structured records and avoid writes unless project_id is provided. Before installing/using: (1) only supply PDFs or URLs you want the agent to read (do not give sensitive files), (2) confirm the agent has access to any external 'summarize' skill or PDF libraries it will call and that you trust those components, and (3) verify writeback behavior in a safe test project to ensure records are written only when writeback_ready is true and project_id is set.


latest: vk979tpsfsd8am7bd5mywjg0xhs83p1rf
104 downloads · 0 stars · 1 version
Updated 1mo ago · v1.0.0 · MIT-0

Paper Ingest Normalizer

Convert raw literature inputs into standardized records safe for project memory, paper databases, and downstream synthesis pipelines.

Input

One of the following is required:

  • pdf_path — local path to PDF file
  • url — link to paper/article
  • raw_text — extracted or pasted text
  • metadata_blob — existing metadata dict

Plus:

  • project_id — required for any writeback
  • source_type — one of: pdf, doi, url, text, metadata
  • tags (optional) — list of strings for categorization
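
Put together, a minimal call might look like the following Python dict. The field names come from the lists above; the path, project name, and tag values are hypothetical:

# Only one of pdf_path / url / raw_text / metadata_blob needs to be set.
ingest_input = {
    "pdf_path": "/data/papers/example_perovskite_stability.pdf",  # hypothetical local path
    "source_type": "pdf",
    "project_id": "perovskite-review",    # required for any writeback
    "tags": ["stability", "interfaces"],  # optional
}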

Output Schema

Return a structured object:

title: string
authors: string[] | null
year: number | null
source: string          # journal, conference, preprint, etc.
doi_or_url: string | null
project_id: string
paper_type: string      # experimental, theoretical, review, etc.
material_system: string | null   # e.g. "perovskite solar cell", "graphene FET"
device_type: string | null       # e.g. "FTO/glass", "flexible substrate"
key_variables: string[] | null   # independent variables studied
key_metrics: string[] | null     # measured outcomes (PCE, mobility, etc.)
core_findings: string            # 2-3 sentence neutral summary
claimed_mechanism: string | null
limitations: string | null
normalized_summary: string       # 1-2 paragraph structured summary
uncertain_fields: string[] | null  # fields that could not be verified
writeback_ready: boolean        # true only if key identity fields present
writeback_payload: object        # the record to write into project memory
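
For illustration, here is a hypothetical record conforming to this schema, written as a Python dict. Every value is invented, and in practice normalized_summary would run one to two paragraphs:

record = {
    "title": "Interface Passivation in Perovskite Solar Cells (hypothetical)",
    "authors": ["A. Author", "B. Researcher"],
    "year": 2023,
    "source": "preprint",
    "doi_or_url": None,
    "project_id": "perovskite-review",
    "paper_type": "experimental",
    "material_system": "perovskite solar cell",
    "device_type": "FTO/glass",
    "key_variables": ["passivation layer thickness"],
    "key_metrics": ["PCE", "hysteresis index"],
    "core_findings": "Reports a modest efficiency gain and reduced hysteresis with a thin passivation layer.",
    "claimed_mechanism": "Suppressed interfacial ion migration.",
    "limitations": "Small-area cells only; no long-term stability data.",
    "normalized_summary": "(one to two paragraph structured summary)",
    "uncertain_fields": ["doi_or_url"],
    "writeback_ready": True,              # title, authors, and year are all present
    "writeback_payload": {},              # in practice, the fields destined for project memory
}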

Rules

  1. Never write into project memory without project_id. Ask if not provided.
  2. Separate direct observations from claimed interpretations. Mark inference vs. direct extraction.
  3. Preserve uncertainty. Use null for missing fields; list in uncertain_fields.
  4. Do not invent missing bibliographic fields. Don't hallucinate authors, year, etc.
  5. Do not over-claim. Keep core_findings and normalized_summary grounded in what the text actually says.
  6. Never conflate abstract with findings. The abstract states intentions; findings are what the data supports.
  7. If writeback_ready = false, list explicitly which fields are missing and why.
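
As a minimal sketch of how rules 1 and 3 could be enforced if the normalizer were driven from Python (an assumption; the skill itself is instruction-only, and the function name here is invented):

def guard_writeback(record, project_id):
    # Rule 1: never write to project memory without a project_id; ask the user instead.
    if not project_id:
        raise ValueError("project_id is required before any writeback; ask the user for one")
    # Rule 3: missing fields stay None and are surfaced in uncertain_fields.
    unknown = sorted(k for k, v in record.items() if v is None)
    record["uncertain_fields"] = unknown or None
    return record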

PDF Extraction

For PDFs, use the summarize skill or pdfplumber/PyMuPDF to extract text before processing.
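
For example, a pdfplumber-based extraction step might look like the sketch below (pdfplumber is one of the libraries the skill mentions; the helper name is ours):

import pdfplumber

def extract_pdf_text(pdf_path):
    # Concatenate the text of every page; pages with no extractable text yield "".
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

PyMuPDF would work the same way, with page.get_text() in place of page.extract_text().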

Workflow

  1. Identify source type — determine which input field is populated
  2. Extract raw content — PDF text, URL content, or use provided raw text
  3. Parse bibliographic fields — title, authors, year, source, DOI
  4. Identify research content — material system, device type, variables, metrics
  5. Distill findings — separate what was measured from what was claimed
  6. Assemble writeback_payload — structured record matching the schema above
  7. Assess completeness — set writeback_ready based on presence of key identity fields
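
Taken together, the workflow could be sketched as a single function. This is a non-authoritative outline under the assumption of a Python harness; the actual parsing in steps 2-5 is left to the model or to other skills, so those steps appear only as comments:

def normalize(inputs):
    fields = ("title", "authors", "year", "source", "doi_or_url", "paper_type",
              "material_system", "device_type", "key_variables", "key_metrics",
              "core_findings", "claimed_mechanism", "limitations", "normalized_summary")
    record = {f: None for f in fields}
    # Step 1: identify the source type from whichever input field is populated.
    source_type = inputs.get("source_type") or next(
        (k for k in ("pdf_path", "url", "raw_text", "metadata_blob") if inputs.get(k)), None)
    # Steps 2-5: extract raw content (source_type decides how), then fill the
    # bibliographic and research fields, keeping measurements separate from claims.
    # Step 6: assemble the writeback payload from whatever was actually extracted.
    record["project_id"] = inputs.get("project_id")
    record["writeback_payload"] = {k: v for k, v in record.items() if v is not None}
    # Step 7: assess completeness against the key identity fields.
    missing = [f for f in ("title", "authors", "year") if not record[f]]
    record["uncertain_fields"] = missing or None
    record["writeback_ready"] = bool(record["project_id"]) and not missing
    return record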

Failure Handling

If parsing is incomplete:

  • Return partial structured output with all successfully extracted fields
  • Populate uncertain_fields with the list of fields that could not be determined
  • Set writeback_ready = false when title, authors, or year are missing
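
Concretely, a partial result for a paper whose authors and year could not be recovered might look like this (all values hypothetical):

partial = {
    "title": "Flexible-Substrate Graphene FET Mobility Study (hypothetical)",
    "authors": None,
    "year": None,
    "source": None,
    "doi_or_url": "https://example.org/paper",  # placeholder URL
    "core_findings": "Reports field-effect mobility measurements on flexible substrates.",
    "uncertain_fields": ["authors", "year", "source"],
    "writeback_ready": False,  # authors and year are missing
}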

Cross-Reference

For synthesis after normalization, see the research skill for paper synthesis workflows.
