Install
openclaw skills install biomed-dataset-finder
Search NCBI GEO/SRA, NGDC-GSA, and CNGB for biomedical datasets by disease, treatment, species, pathology subtype, and data type. Returns a structured table with bold dataset IDs, links, and article info.
Search public biomedical datasets from NCBI, NGDC, and CNGB by conversational query keywords.
User asks for datasets related to a disease/treatment/species/subtype/data type combination. Examples:
| Priority | Source | Database | Accession Prefix |
|---|---|---|---|
| 1st | NCBI | GEO Datasets (gds) | GSE |
| 1st | NCBI | SRA (single-cell queries) | SRP/SRR |
| 1st | NGDC | Genome Sequence Archive | CRA |
| 2nd | CNGB | CNGBdb | CNP (requires token for some data) |
Extract from user message:
If any critical field is missing, ask the user to clarify.
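The check for missing fields could look roughly like the sketch below. The field names mirror the query dimensions this skill mentions (disease, treatment, species, subtype, data type). The `DatasetQuery` class, the helper names, and the choice of which fields count as critical are all hypothetical, since the source does not list them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetQuery:
    """Fields extracted from the user message (hypothetical container)."""
    disease: Optional[str] = None
    treatment: Optional[str] = None
    species: Optional[str] = None
    subtype: Optional[str] = None
    data_type: Optional[str] = None


# Assumption: the source does not say which fields are critical.
CRITICAL_FIELDS = ("disease", "species", "data_type")


def missing_critical(query: DatasetQuery) -> List[str]:
    """Return the names of critical fields that are empty or unset."""
    return [
        name for name in CRITICAL_FIELDS
        if not (getattr(query, name) or "").strip()
    ]


def clarification_prompt(query: DatasetQuery) -> Optional[str]:
    """Build a question for the user, or None if nothing is missing."""
    missing = missing_critical(query)
    if not missing:
        return None
    return "Please specify: " + ", ".join(m.replace("_", " ") for m in missing)
```

Returning `None` when nothing is missing lets the caller proceed straight to the search step without a separate flag.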
Use NCBI E-utilities (free, no auth).
Search the gds database (GEO Datasets, NOT gse) with combined keywords.
From each result, extract accession (GSE prefix), title, summary, and pubmedids (list).
For single-cell queries, also search the sra database.
Query: ({disease}) AND ({treatment}) AND ({species}) AND ({data_type})
Rate limit: ~3 requests/second.
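As a minimal sketch of the NCBI step, the code below builds the combined query term, forms ESearch and ESummary URLs for the `gds` database, and spaces out calls to stay near 3 requests/second. The helper names and the `Throttle` class are illustrative, and dropping empty fields from the term is an assumption. It is not the code in `search_datasets.py`.

```python
import time
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_term(disease, treatment, species, data_type):
    """Join non-empty fields as ({a}) AND ({b}) ...

    Assumption: empty fields are dropped rather than sent as "()".
    """
    parts = [disease, treatment, species, data_type]
    return " AND ".join(f"({p})" for p in parts if p)


def esearch_url(term, db="gds", retmax=20):
    """URL for ESearch, which returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def esummary_url(ids, db="gds"):
    """URL for ESummary, which returns per-record summaries as JSON."""
    params = {"db": db, "id": ",".join(ids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?" + urlencode(params)


class Throttle:
    """Enforce a minimum interval between calls (default ~3 per second)."""

    def __init__(self, per_second=3.0):
        self.min_interval = 1.0 / per_second
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `Throttle.wait()` before each HTTP request keeps the request rate under the limit without tracking a sliding window. Passing `db="sra"` to the same helpers would cover the SRA search.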
API: https://ngdc.cncb.ac.cn/search/api/specific?q={keywords}&db=gsa&size=20
Requires User-Agent header. Filter response for type=="GSA" entries (CRA accessions).
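The NGDC request and the `type=="GSA"` filter could be sketched as follows. The User-Agent string and the function names are placeholders. The code also assumes that the result entries arrive as a list of dictionaries with a `type` key, which the source does not spell out, so check the actual response shape against references/ngdc_api.md.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NGDC_SEARCH = "https://ngdc.cncb.ac.cn/search/api/specific"


def ngdc_request(keywords, size=20):
    """Build the GSA search request with the required User-Agent header."""
    query = urlencode({"q": keywords, "db": "gsa", "size": size})
    # Placeholder User-Agent value; any descriptive string is assumed to work.
    headers = {"User-Agent": "biomed-dataset-finder/0.1"}
    return Request(f"{NGDC_SEARCH}?{query}", headers=headers)


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def filter_gsa(entries):
    """Keep only entries whose type is exactly "GSA" (CRA accessions).

    Assumption: each entry is a dict with a "type" key.
    """
    return [e for e in entries if e.get("type") == "GSA"]
```

Keeping request construction separate from the network call makes the URL and header logic easy to check offline.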
If CNGB token provided: search CNGBdb API. On auth error: ask user if they want to provide token or skip.
Markdown table with bold dataset ID, article info (authors, title, journal, year, DOI), and direct links.
If no results: "No public datasets found matching your criteria. Try adjusting keywords or switching data sources."
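One possible way to render the output is sketched below. The column layout and the record keys (`id`, `source`, `authors`, and so on) are assumptions, not the skill's actual schema. Missing values are shown as `-` rather than filled in, which is this sketch's reading of the no-fabrication rule.

```python
NO_RESULTS = (
    "No public datasets found matching your criteria. "
    "Try adjusting keywords or switching data sources."
)

# Assumed column layout and record keys; the skill's real schema may differ.
COLUMNS = ("Dataset ID", "Source", "Authors", "Title", "Journal", "Year", "DOI", "Link")
KEYS = ("source", "authors", "title", "journal", "year", "doi", "link")


def cell(value):
    """Format one cell; anything missing becomes '-' instead of a guess."""
    text = str(value).strip() if value is not None else ""
    return text.replace("|", "\\|") if text else "-"


def render_table(records):
    """Render records as a Markdown table, or the no-results message."""
    # Records without an ID are dropped: an entry with no accession is unusable.
    rows = [r for r in records if r.get("id")]
    if not rows:
        return NO_RESULTS
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for r in rows:
        cells = [f"**{cell(r['id'])}**"] + [cell(r.get(k)) for k in KEYS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

The renderer only formats what it is given, so every value in the table has to come from an API response upstream.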
This skill handles scientific research data. Fabricating a single dataset entry undermines the user's work.
Links must follow the source's accession URL pattern (https://.../acc.cgi?acc={GSE}). Never guess URLs.
Use `-` in the table for any missing value; never fill it with plausible text.
A researcher who relies on wrong dataset IDs or fake article info could waste weeks on non-existent data, cite non-existent papers, or compromise the validity of their research. The cost of hallucination here is far higher than in general conversation.
python3 skills/biomed-dataset-finder/scripts/search_datasets.py \
--disease "colon cancer" --treatment "immunotherapy" \
--species human --subtype dMMR --type scRNA-seq --max-results 10
See references/ncbi_api.md for NCBI E-utilities details.
See references/ngdc_api.md for NGDC GSA API details.
See references/cngb_api.md for CNGBdb API details.