{"skill":{"slug":"agentic-paper-digest-skill","displayName":"Agentic Paper Digest Skill","summary":"Fetches and summarizes recent arXiv and Hugging Face papers with Agentic Paper Digest. Use when the user wants a paper digest, a JSON feed of recent papers, or to run the arXiv/HF pipeline.","description":"---\nname: agentic-paper-digest-skill\ndescription: Fetches and summarizes recent arXiv and Hugging Face papers with Agentic Paper Digest. Use when the user wants a paper digest, a JSON feed of recent papers, or to run the arXiv/HF pipeline.\nhomepage: https://github.com/matanle51/agentic_paper_digest\ncompatibility: Requires Python 3, network access, and either git or curl/wget for bootstrap. LLM access via OPENAI_API_KEY or LITELLM_API_KEY (OpenAI-compatible).\nmetadata: {\"clawdbot\":{\"requires\":{\"anyBins\":[\"python3\",\"python\"]}}}\n---\n\n# Agentic Paper Digest\n\n## When to use\n- Fetch a recent paper digest from arXiv and Hugging Face.\n- Produce JSON output for downstream agents.\n- Run a local API server when a polling workflow is needed.\n\n## Prereqs\n- Python 3 and network access.\n- LLM access via `OPENAI_API_KEY` or an OpenAI-compatible provider via `LITELLM_API_BASE` + `LITELLM_API_KEY`.\n- `git` is optional for bootstrap; otherwise `curl`/`wget` (or Python) is used to download the repo.\n\n## Get the code and install\n- Preferred: run the bootstrap helper script. 
It uses git when available or falls back to a zip download.\n\n```bash\nbash \"{baseDir}/scripts/bootstrap.sh\"\n```\n\n- Override the clone location by setting `PROJECT_DIR`.\n\n```bash\nPROJECT_DIR=\"$HOME/agentic_paper_digest\" bash \"{baseDir}/scripts/bootstrap.sh\"\n```\n\n## Run (CLI preferred)\n\n```bash\nbash \"{baseDir}/scripts/run_cli.sh\"\n```\n\n- Pass through CLI flags as needed.\n\n```bash\nbash \"{baseDir}/scripts/run_cli.sh\" --window-hours 24 --sources arxiv,hf\n```\n\n## Run (API optional)\n\n```bash\nbash \"{baseDir}/scripts/run_api.sh\"\n```\n\n- Trigger runs and read results.\n\n```bash\ncurl -X POST http://127.0.0.1:8000/api/run\ncurl http://127.0.0.1:8000/api/status\ncurl http://127.0.0.1:8000/api/papers\n```\n\n- Stop the API server if needed.\n\n```bash\nbash \"{baseDir}/scripts/stop_api.sh\"\n```\n\n## Outputs\n- CLI `--json` prints `run_id`, `seen`, `kept`, `window_start`, and `window_end`.\n- Data store: `data/papers.sqlite3` (under `PROJECT_DIR`).\n- API: `POST /api/run`, `GET /api/status`, `GET /api/papers`, `GET/POST /api/topics`, `GET/POST /api/settings`.\n\n## Configuration\nConfig files live in `PROJECT_DIR/config`. Environment variables can be set in the shell or via a `.env` file. 
The wrappers here auto-load `.env` from `PROJECT_DIR` (override with `ENV_FILE=/path/to/.env`).\n\n**Environment (.env or exported vars)**\n- `OPENAI_API_KEY`: required for OpenAI models (litellm reads this).\n- `LITELLM_API_BASE`, `LITELLM_API_KEY`: use an OpenAI-compatible proxy/provider.\n- `LITELLM_MODEL_RELEVANCE`, `LITELLM_MODEL_SUMMARY`: models for relevance and summarization (summary defaults to relevance model if unset).\n- `LITELLM_TEMPERATURE_RELEVANCE`, `LITELLM_TEMPERATURE_SUMMARY`: lower for more deterministic output.\n- `LITELLM_MAX_RETRIES`: retry count for LLM calls.\n- `LITELLM_DROP_PARAMS=1`: drop unsupported params to avoid provider errors.\n- `WINDOW_HOURS`, `APP_TZ`: recency window and timezone.\n- `ARXIV_CATEGORIES`: comma-separated categories (default includes `cs.CL,cs.AI,cs.LG,stat.ML,cs.CR`).\n- `ARXIV_API_BASE`, `HF_API_BASE`: override source endpoints if needed.\n- `ARXIV_MAX_RESULTS`, `ARXIV_PAGE_SIZE`: arXiv paging limits.\n- `MAX_CANDIDATES_PER_SOURCE`: cap candidates per source before LLM filtering.\n- `FETCH_TIMEOUT_S`, `REQUEST_TIMEOUT_S`: source fetch and per-request timeouts.\n- `ENABLE_PDF_TEXT=1`: include first-page PDF text in summaries; requires `PyMuPDF` (`pip install pymupdf`).\n- `DATA_DIR`: location for `papers.sqlite3`.\n- `CORS_ORIGINS`: comma-separated origins allowed by the API server (UI use).\n- Path overrides: `TOPICS_PATH`, `SETTINGS_PATH`, `AFFILIATION_BOOSTS_PATH`.\n\n**Config files**\n- `config/topics.json`: list of topics with `id`, `label`, `description`, `max_per_topic`, and `keywords`. The relevance classifier must output topic IDs exactly as defined here. `max_per_topic` also caps results in `GET /api/papers` when `apply_topic_caps=1`.\n- `config/settings.json`: overrides fetch limits (`arxiv_max_results`, `arxiv_page_size`, `fetch_timeout_s`, `max_candidates_per_source`). 
Updated via `POST /api/settings`.\n- `config/affiliations.json`: list of `{pattern, weight}` boosts applied by substring match over affiliations. Weights add up and are capped at 1.0. Invalid JSON disables boosts, so keep the file strict JSON (no trailing commas).\n\n## Mandatory workflow (follow step-by-step)\n1. **You MUST first open and read the configuration from the GitHub repo you downloaded (https://github.com/matanle51/agentic_paper_digest)**:\n   - Load `config/topics.json`, `config/settings.json`, and `config/affiliations.json` (if present).\n   - Note current topic IDs, caps, and fetch limits before asking the user to change them.\n2. **ASK THE USER TO PROVIDE THEIR PREFERENCES ABOUT THE FOLLOWING (HELP THE USER)**:\n   - **Topics of interest** → update `config/topics.json` (`topics[].id/label/description/keywords`, `max_per_topic`).  \n     Show current defaults and ask whether to keep or change them.\n   - **Time window (hours)** → set `WINDOW_HOURS` (or pass `--window-hours` to CLI) **only if the user cares**; otherwise keep the default of 24h.\n   - ASK THE USER TO FILL IN THE FOLLOWING PARAMETERS (explain to the user what each one is for): `ARXIV_CATEGORIES`, `ARXIV_MAX_RESULTS`, `ARXIV_PAGE_SIZE`, `MAX_CANDIDATES_PER_SOURCE`.  \n     Ask whether to keep defaults and show the current values.\n   - **Model/provider** → set `OPENAI_API_KEY` *or* `LITELLM_API_KEY` (+ `LITELLM_API_BASE` if proxy), and set `LITELLM_MODEL_RELEVANCE`/`LITELLM_MODEL_SUMMARY`.\n   - **Do NOT ask by default**: timezone, quality vs cost, timeouts, PDF text, affiliation biasing, sources list. Use defaults unless the user requests changes.\n3. **Confirm workspace path**: Ask where to clone/run. Default to `PROJECT_DIR=\"$HOME/agentic_paper_digest\"` if the user doesn’t care. Never hardcode `/Users/...` paths.\n4. **Bootstrap the repo**: Run the bootstrap script (unless the repo already exists and the user says to skip).\n5. 
**Create or verify `.env`**:\n   - If `.env` is missing, create it from `.env.example` (in the repo), then ask the user to fill keys and any requested preferences.\n   - Ensure at least one of `OPENAI_API_KEY` or `LITELLM_API_KEY` is set before running.\n6. **Apply config changes**:\n   - Edit JSON files directly (or use `POST /api/topics` and `POST /api/settings` if running the API).\n7. **Run the pipeline**:\n   - Prefer `scripts/run_cli.sh` for one-off JSON output.\n   - Use `scripts/run_api.sh` only if the user explicitly asks for UI/API access or polling.\n8. **Report results**:\n   - If results are sparse, suggest increasing `WINDOW_HOURS`, `ARXIV_MAX_RESULTS`, or broadening topics.\n\n## Getting good results\n- Help the user define and keep topics focused and mutually exclusive so the classifier can choose the right IDs.\n- Use a stronger model for summaries than for relevance if quality matters.\n- If using an OpenAI model, default to gpt-5-mini for a good tradeoff.\n- Increase `WINDOW_HOURS` or `ARXIV_MAX_RESULTS` when results are sparse, or lower them if results are too noisy.\n- Tune `ARXIV_CATEGORIES` to your research domains.\n- Enable PDF text (`ENABLE_PDF_TEXT=1`) when abstracts are too thin.\n- Use modest affiliation weights to bias ranking without swamping relevance.\n- BE PROACTIVE AND HELP THE USER TUNE THE SKILL FOR GOOD RESULTS!\n\n## Troubleshooting\n- Port 8000 busy: run `bash \"{baseDir}/scripts/stop_api.sh\"` or pass `--port` to the API command.\n- Empty results: increase `WINDOW_HOURS` or verify the API key in `.env`.\n- Missing API key errors: export `OPENAI_API_KEY` or `LITELLM_API_KEY` in the shell before running.\n","topics":["Agentic","Arxiv","Json","Pipeline"],"tags":{"latest":"0.3.3","stable":"0.3.1"},"stats":{"comments":0,"downloads":3949,"installsAllTime":149,"installsCurrent":16,"stars":5,"versions":7},"createdAt":1770280745022,"updatedAt":1778486025837},"latestVersion":{"version":"0.3.3","createdAt":1770490710503,"changelog":"- 
Updated workflow instructions to prioritize proactive user interaction and guidance.\n- Clarified that the user must first open and read the configuration from the downloaded GitHub repo.\n- Emphasized assisting the user in providing preferences for topics, time window, and core parameters.\n- Added a note to default to 24 hours for the time window unless the user specifies otherwise.\n- Added recommendation to default OpenAI model to \"gpt-5-mini\" for optimal tradeoff.\n- Added reminders for skill developers to help the user tune for good results.\n- Minor edits for clarity, directness, and improved step-by-step guidance.","license":null},"metadata":{"setup":[],"os":null,"systems":null},"owner":{"handle":"matanle51","userId":"s178t402ac9w5ceeebrx2dgwhh884g04","displayName":"matanle51","image":"https://avatars.githubusercontent.com/u/34571073?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1779949711902}}