semantic-scholar

Search, retrieve, and organize scholarly metadata with the Semantic Scholar APIs. Use when Codex needs to find papers or authors, build paper sets from compl...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Siyu Liu (@Grenzlinie)
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the included scripts and references: Graph API, Recommendations API, Datasets API workflows are implemented or documented. All request URLs point to api.semanticscholar.org and the scripts implement search, batch fetch, recommendations, and dataset guidance—functions consistent with the description.
Instruction Scope
SKILL.md and the scripts limit operations to calling Semantic Scholar endpoints, writing JSONL/CSV outputs, and handling retries/pagination. The instructions do not ask the agent to read unrelated host files or exfiltrate secrets. The scripts preserve raw output before flattening as recommended.
Install Mechanism
There is no registered install spec (skill is effectively delivered as code). The Python scripts declare typical Python deps (requests, optional pandas) but the registry metadata doesn't declare dependency installation; the scripts themselves include comments like 'pip install requests pandas'. This is not malicious but means dependencies must be installed manually. The smoke-test script expects an unlisted 'uv' CLI tool to exist (see environment note).
Credentials
Scripts optionally read SEMANTIC_SCHOLAR_API_KEY via environment; that is expected and proportionate for an API client. Registry metadata lists no required env vars; this is acceptable because the API key is optional in code, but users should be aware the key increases rate limits. No unrelated credentials or secrets are requested.
Persistence & Privilege
Skill does not request always:true, does not modify other skills or global agent config, and only writes its own output files. Autonomous invocation is allowed by default (platform normal) but combined with the rest of the footprint does not introduce unusual privilege.
Assessment
This skill appears to do what it says: query Semantic Scholar and save results. Before installing/running:

  1. Review and install the Python dependencies (requests; pandas only if you need CSV export). The skill has no automated install step.
  2. If you want higher rate limits or plan repeated/bulk jobs, set SEMANTIC_SCHOLAR_API_KEY in your environment; the scripts look for x-api-key but will run without it (with lower limits).
  3. The provided smoke-test expects an external 'uv' command (not documented in registry metadata); you don't need to run the smoke-test if 'uv' is unavailable.
  4. The scripts perform network requests to api.semanticscholar.org and write JSONL/CSV files to disk; inspect the outputs and working directory before sharing them.
  5. If you plan to use the Datasets API or large bulk downloads, confirm storage/bandwidth expectations first.

Overall: coherent and consistent with the stated purpose, but install-time and runtime dependencies should be verified manually before execution.


Current version: v1.0.0


SKILL.md

Semantic Scholar

Overview

Choose the right Semantic Scholar API workflow before writing code or issuing requests. Prefer small, field-scoped online calls for interactive search, paper/search/bulk for large retrieval jobs, recommendations when the user already has seed papers, and datasets only for offline or release-based data pulls.

Workflow Decision Tree

Start by classifying the task:

  • Use the Graph API for live paper or author lookup, metadata retrieval, query refinement, and batch fetches by known IDs.
  • Use the Recommendations API when the user already has one or more relevant papers and wants similar or related work.
  • Use the Datasets API when the user needs offline snapshots, release-to-release diffs, or corpus-scale ingestion rather than interactive search.

Then choose the endpoint pattern:

  • Use paper/search for normal interactive search, smaller result sets, ranking, and iterative query tuning.
  • Use paper/search/bulk for large result collection; it uses continuation-token pagination and is the default for broad literature harvesting.
  • Use paper/batch or author/batch when the user already has IDs and wants metadata efficiently.
  • Use author/search for author discovery by name or affiliation-like clues.
  • Use recommendations for "papers like this one" workflows.

Operating Rules

  • Request only the fields needed for the task. Semantic Scholar explicitly supports field projection; smaller field lists are faster and less error-prone.
  • Prefer API key authentication via SEMANTIC_SCHOLAR_API_KEY when available, especially for repeated or larger jobs.
  • Handle pagination explicitly. paper/search and author/search are interactive search flows; paper/search/bulk uses continuation tokens.
  • Add retry and backoff for 429 and transient 5xx responses.
  • Preserve raw results before flattening or post-processing them.
  • For broad discovery, write Boolean-rich queries instead of a single brittle phrase. Use exact phrases only when the user asks for them.
  • Do not route normal search tasks to Datasets API. Use datasets only when the user truly needs offline release files or diffs.
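The retry-and-backoff rule above can be sketched as a small wrapper. This is a minimal sketch assuming the `requests` library; the function names are illustrative, not taken from the skill's scripts:

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)


def get_with_retry(url, params=None, headers=None, max_attempts=5):
    """GET with retries on 429 and transient 5xx responses."""
    import requests  # third-party: pip install requests

    for attempt in range(max_attempts):
        resp = requests.get(url, params=params, headers=headers, timeout=30)
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(backoff_delay(attempt))
            continue
        resp.raise_for_status()  # fail fast on non-retryable 4xx errors
        return resp.json()
    raise RuntimeError(f"gave up after {max_attempts} attempts: {url}")
```

Jitter keeps parallel clients from retrying in lockstep; the cap keeps a long outage from producing multi-minute sleeps.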

Typical Workflows

Search papers interactively

Use this for "find papers about X", "search by title keywords", or "filter by year/citations/open access".

  • Start with paper/search if the user expects inspection and refinement.
  • Keep fields minimal.
  • If the search must collect many records, switch to paper/search/bulk.
  • Read references/query-recipes.md for query patterns.
  • Read references/graph-api.md for endpoint details.
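An interactive paper/search call with a minimal field projection might look like the sketch below. It assumes the `requests` library and the optional `x-api-key` header described above; endpoint path and helper names are illustrative, so verify against the current API docs:

```python
import os

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_params(query, fields=("title", "year", "citationCount"),
                        year=None, limit=20):
    """Keep the field list minimal; smaller projections are faster."""
    params = {"query": query, "fields": ",".join(fields), "limit": limit}
    if year:
        params["year"] = year  # e.g. "2020-2024" or "2020-"
    return params


def search_papers(query, **kwargs):
    import requests  # third-party: pip install requests

    headers = {}
    if os.environ.get("SEMANTIC_SCHOLAR_API_KEY"):
        headers["x-api-key"] = os.environ["SEMANTIC_SCHOLAR_API_KEY"]
    resp = requests.get(SEARCH_URL,
                        params=build_search_params(query, **kwargs),
                        headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json().get("data", [])
```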

Harvest a broad paper set

Use this for literature review corpora, screening spreadsheets, or downstream ranking.

  • Prefer scripts/semantic_scholar_bulk_search.py.
  • Save raw output to JSONL and only then export CSV if the user needs tabular review.
  • Expose query, year filter, sort, and field selection as parameters instead of hardcoding them.

Fetch by known IDs

Use paper/batch or author/batch when IDs are already known from previous steps or user input.

  • Batch fetch is usually better than repeated single-record lookups.
  • Ask for only the fields required for the analysis or export.
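A batch fetch can be sketched as a chunked POST. The per-request ID limit (500 here) matches the public documentation at the time of writing but should be verified; the helper names are illustrative:

```python
BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"


def chunked(ids, size=500):
    """Split an ID list to respect the documented per-request limit."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def fetch_by_ids(ids, fields="title,year,abstract"):
    import requests  # third-party: pip install requests

    results = []
    for chunk in chunked(list(ids)):
        resp = requests.post(BATCH_URL, params={"fields": fields},
                             json={"ids": chunk}, timeout=60)
        resp.raise_for_status()
        results.extend(resp.json())
    return results
```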

Expand from seed papers

Use recommendations when the user says things like "find papers similar to this", "expand from these seed papers", or "build a related-work set".

  • Use the Recommendations API instead of trying to approximate similarity with a new keyword query.
  • Keep the seed-paper IDs and result set separate from keyword-search results so provenance stays clear.
  • Read references/recommendations-api.md.
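A seed-expansion call can be sketched as follows. The payload shape (`positivePaperIds`/`negativePaperIds`) follows the public Recommendations API documentation as of writing; verify against references/recommendations-api.md before relying on it:

```python
def build_rec_payload(seed_ids, negative_ids=()):
    """Positive seeds drive similarity; negatives steer results away."""
    return {"positivePaperIds": list(seed_ids),
            "negativePaperIds": list(negative_ids)}


def recommend(seed_ids, fields="title,year", limit=50):
    import requests  # third-party: pip install requests

    url = "https://api.semanticscholar.org/recommendations/v1/papers/"
    resp = requests.post(url, params={"fields": fields, "limit": limit},
                         json=build_rec_payload(seed_ids), timeout=30)
    resp.raise_for_status()
    return resp.json().get("recommendedPapers", [])
```

Keeping `build_rec_payload` separate makes it easy to log exactly which seed IDs produced which result set, which is the provenance point above.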

Pull datasets or release diffs

Use the Datasets API only for offline ingestion or change tracking between releases.

  • Read references/datasets-api.md.
  • Confirm storage expectations before downloading large files.
  • Document the exact release identifiers used in the workflow.
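Listing releases before committing to a download can be sketched like this. The endpoint path follows the public Datasets API documentation at the time of writing; the helper is illustrative and exists mainly so the exact release identifier gets recorded:

```python
def release_url(release_id=None, dataset=None):
    """Build Datasets API URLs; record the exact release ID you use."""
    url = "https://api.semanticscholar.org/datasets/v1/release/"
    if release_id:
        url += release_id
        if dataset:
            url += f"/dataset/{dataset}"
    return url


def list_releases():
    import requests  # third-party: pip install requests

    resp = requests.get(release_url(), timeout=30)
    resp.raise_for_status()
    return resp.json()  # a list of release identifiers
```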

Files

12 total
