semantic-scholar

Search, retrieve, and organize scholarly metadata with the Semantic Scholar APIs. Use when Codex needs to find papers or authors, build paper sets from compl...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Siyu Liu (@Grenzlinie)
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the included scripts and references: Graph API, Recommendations API, Datasets API workflows are implemented or documented. All request URLs point to api.semanticscholar.org and the scripts implement search, batch fetch, recommendations, and dataset guidance—functions consistent with the description.
Instruction Scope
SKILL.md and the scripts limit operations to calling Semantic Scholar endpoints, writing JSONL/CSV outputs, and handling retries/pagination. The instructions do not ask the agent to read unrelated host files or exfiltrate secrets. The scripts preserve raw output before flattening as recommended.
Install Mechanism
There is no registered install spec (skill is effectively delivered as code). The Python scripts declare typical Python deps (requests, optional pandas) but the registry metadata doesn't declare dependency installation; the scripts themselves include comments like 'pip install requests pandas'. This is not malicious but means dependencies must be installed manually. The smoke-test script expects an unlisted 'uv' CLI tool to exist (see environment note).
Credentials
Scripts optionally read SEMANTIC_SCHOLAR_API_KEY via environment; that is expected and proportionate for an API client. Registry metadata lists no required env vars; this is acceptable because the API key is optional in code, but users should be aware the key increases rate limits. No unrelated credentials or secrets are requested.
Persistence & Privilege
Skill does not request always:true, does not modify other skills or global agent config, and only writes its own output files. Autonomous invocation is allowed by default (platform normal) but combined with the rest of the footprint does not introduce unusual privilege.
Assessment
This skill appears to do what it says: query Semantic Scholar and save results. Before installing/running:

  1. Review and install the Python dependencies (requests; pandas only if you need CSV export). The skill has no automated install step.
  2. If you want higher rate limits or plan repeated/bulk jobs, set SEMANTIC_SCHOLAR_API_KEY in your environment; the scripts look for x-api-key but will run without it (with lower limits).
  3. The provided smoke-test expects an external 'uv' command (not documented in registry metadata); you don't need to run the smoke-test if 'uv' is unavailable.
  4. The scripts perform network requests to api.semanticscholar.org and write JSONL/CSV files to disk; inspect the outputs and working directory before sharing them.
  5. If you plan to use the Datasets API or large bulk downloads, confirm storage/bandwidth expectations first.

Overall: coherent and consistent with the stated purpose, but install-time and runtime dependencies should be verified manually before execution.


Current version: v1.0.0


SKILL.md

Semantic Scholar

Overview

Choose the right Semantic Scholar API workflow before writing code or issuing requests. Prefer small, field-scoped online calls for interactive search, paper/search/bulk for large retrieval jobs, recommendations when the user already has seed papers, and datasets only for offline or release-based data pulls.

Workflow Decision Tree

Start by classifying the task:

  • Use the Graph API for live paper or author lookup, metadata retrieval, query refinement, and batch fetches by known IDs.
  • Use the Recommendations API when the user already has one or more relevant papers and wants similar or related work.
  • Use the Datasets API when the user needs offline snapshots, release-to-release diffs, or corpus-scale ingestion rather than interactive search.

Then choose the endpoint pattern:

  • Use paper/search for normal interactive search, smaller result sets, ranking, and iterative query tuning.
  • Use paper/search/bulk for large result collection; it uses continuation-token pagination and is the default for broad literature harvesting.
  • Use paper/batch or author/batch when the user already has IDs and wants metadata efficiently.
  • Use author/search for author discovery by name or affiliation-like clues.
  • Use recommendations for "papers like this one" workflows.

Operating Rules

  • Request only the fields needed for the task. Semantic Scholar explicitly supports field projection; smaller field lists are faster and less error-prone.
  • Prefer API key authentication via SEMANTIC_SCHOLAR_API_KEY when available, especially for repeated or larger jobs.
  • Handle pagination explicitly. paper/search and author/search are interactive search flows; paper/search/bulk uses continuation tokens.
  • Add retry and backoff for 429 and transient 5xx responses.
  • Preserve raw results before flattening or post-processing them.
  • For broad discovery, write Boolean-rich queries instead of a single brittle phrase. Use exact phrases only when the user asks for them.
  • Do not route normal search tasks to Datasets API. Use datasets only when the user truly needs offline release files or diffs.
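The retry-and-backoff rule above can be sketched as a small wrapper. This is a minimal sketch assuming the `requests` library; the function names are illustrative, not taken from the skill's scripts:

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)


def get_with_retry(url, params=None, headers=None, max_attempts=5):
    """GET with retries on 429 and transient 5xx responses."""
    import requests  # third-party: pip install requests

    for attempt in range(max_attempts):
        resp = requests.get(url, params=params, headers=headers, timeout=30)
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(backoff_delay(attempt))
            continue
        resp.raise_for_status()  # fail fast on non-retryable 4xx errors
        return resp.json()
    raise RuntimeError(f"gave up after {max_attempts} attempts: {url}")
```

Jitter keeps parallel clients from retrying in lockstep; the cap keeps a long outage from producing multi-minute sleeps.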

Typical Workflows

Search papers interactively

Use this for "find papers about X", "search by title keywords", or "filter by year/citations/open access".

  • Start with paper/search if the user expects inspection and refinement.
  • Keep fields minimal.
  • If the search must collect many records, switch to paper/search/bulk.
  • Read references/query-recipes.md for query patterns.
  • Read references/graph-api.md for endpoint details.
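An interactive paper/search call with a minimal field projection might look like the sketch below. It assumes the `requests` library and the optional `x-api-key` header described above; endpoint path and helper names are illustrative, so verify against the current API docs:

```python
import os

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_params(query, fields=("title", "year", "citationCount"),
                        year=None, limit=20):
    """Keep the field list minimal; smaller projections are faster."""
    params = {"query": query, "fields": ",".join(fields), "limit": limit}
    if year:
        params["year"] = year  # e.g. "2020-2024" or "2020-"
    return params


def search_papers(query, **kwargs):
    import requests  # third-party: pip install requests

    headers = {}
    if os.environ.get("SEMANTIC_SCHOLAR_API_KEY"):
        headers["x-api-key"] = os.environ["SEMANTIC_SCHOLAR_API_KEY"]
    resp = requests.get(SEARCH_URL,
                        params=build_search_params(query, **kwargs),
                        headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json().get("data", [])
```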

Harvest a broad paper set

Use this for literature review corpora, screening spreadsheets, or downstream ranking.

  • Prefer scripts/semantic_scholar_bulk_search.py.
  • Save raw output to JSONL and only then export CSV if the user needs tabular review.
  • Expose query, year filter, sort, and field selection as parameters instead of hardcoding them.

Fetch by known IDs

Use paper/batch or author/batch when IDs are already known from previous steps or user input.

  • Batch fetch is usually better than repeated single-record lookups.
  • Ask for only the fields required for the analysis or export.
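A batch fetch can be sketched as a chunked POST. The per-request ID limit (500 here) matches the public documentation at the time of writing but should be verified; the helper names are illustrative:

```python
BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"


def chunked(ids, size=500):
    """Split an ID list to respect the documented per-request limit."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def fetch_by_ids(ids, fields="title,year,abstract"):
    import requests  # third-party: pip install requests

    results = []
    for chunk in chunked(list(ids)):
        resp = requests.post(BATCH_URL, params={"fields": fields},
                             json={"ids": chunk}, timeout=60)
        resp.raise_for_status()
        results.extend(resp.json())
    return results
```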

Expand from seed papers

Use recommendations when the user says things like "find papers similar to this", "expand from these seed papers", or "build a related-work set".

  • Use the Recommendations API instead of trying to approximate similarity with a new keyword query.
  • Keep the seed-paper IDs and result set separate from keyword-search results so provenance stays clear.
  • Read references/recommendations-api.md.
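A seed-expansion call can be sketched as follows. The payload shape (`positivePaperIds`/`negativePaperIds`) follows the public Recommendations API documentation as of writing; verify against references/recommendations-api.md before relying on it:

```python
def build_rec_payload(seed_ids, negative_ids=()):
    """Positive seeds drive similarity; negatives steer results away."""
    return {"positivePaperIds": list(seed_ids),
            "negativePaperIds": list(negative_ids)}


def recommend(seed_ids, fields="title,year", limit=50):
    import requests  # third-party: pip install requests

    url = "https://api.semanticscholar.org/recommendations/v1/papers/"
    resp = requests.post(url, params={"fields": fields, "limit": limit},
                         json=build_rec_payload(seed_ids), timeout=30)
    resp.raise_for_status()
    return resp.json().get("recommendedPapers", [])
```

Keeping `build_rec_payload` separate makes it easy to log exactly which seed IDs produced which result set, which is the provenance point above.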

Pull datasets or release diffs

Use the Datasets API only for offline ingestion or change tracking between releases.

  • Read references/datasets-api.md.
  • Confirm storage expectations before downloading large files.
  • Document the exact release identifiers used in the workflow.
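Listing releases before committing to a download can be sketched like this. The endpoint path follows the public Datasets API documentation at the time of writing; the helper is illustrative and exists mainly so the exact release identifier gets recorded:

```python
def release_url(release_id=None, dataset=None):
    """Build Datasets API URLs; record the exact release ID you use."""
    url = "https://api.semanticscholar.org/datasets/v1/release/"
    if release_id:
        url += release_id
        if dataset:
            url += f"/dataset/{dataset}"
    return url


def list_releases():
    import requests  # third-party: pip install requests

    resp = requests.get(release_url(), timeout=30)
    resp.raise_for_status()
    return resp.json()  # a list of release identifiers
```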

Files

12 total
