{"skill":{"slug":"ai-news-collection","displayName":"xinwencaiji","summary":"Run a self-contained Chinese and international AI news workflow inside the current workspace. Use when the user wants either high-frequency RSS capture only...","description":"---\r\nname: ai-news-pipeline\r\ndescription: Run a self-contained Chinese and international AI news workflow inside the current workspace. Use when the user wants either high-frequency RSS capture only or scheduled report delivery only, with cumulative Excel outputs and a merged Word brief, without relying on an external local repository path.\r\n---\r\n\r\n# AI News Pipeline\r\n\r\n## Overview\r\n\r\nThis skill is executable by itself. The actual workflow scripts are bundled in `scripts/`.\r\nRun them against the current workspace or pass `--workspace /path/to/workspace` explicitly.\r\n\r\n## Workspace Requirements\r\n\r\nThe target workspace should contain or accept these files and folders:\r\n\r\n- `config/sources.json`\r\n- `config/international_sources.json`\r\n- `companies.txt`\r\n- `data/`\r\n- `reports/`\r\n- `state/`\r\n\r\nIf the folders do not exist, the scripts create them.\r\n\r\n## Install Dependencies\r\n\r\nInstall Python dependencies before first use:\r\n\r\n```bash\r\npython -m pip install -r /path/to/skill/scripts/requirements.txt\r\n```\r\n\r\n## Available Entrypoints\r\n\r\nUse the bundled Python entrypoints depending on the job type.\r\n\r\n### Capture Only\r\n\r\nUse this for high-frequency collection jobs. It only captures feeds, updates deduplication state, and writes raw and incremental data.\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_capture_only.py --workspace /path/to/workspace\r\n```\r\n\r\n### Report Only\r\n\r\nUse this for scheduled delivery jobs. 
It reads already-collected data, calls the model for summaries and titles, updates the cumulative Excel files, and rebuilds the Word brief.\r\n\r\nBy default it uses the reporting window from yesterday 00:00 to today 08:00.\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace\r\n```\r\n\r\nOptional time window:\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace --time-window \"2026-03-15 00:00 to 2026-03-16 08:00\"\r\n```\r\n\r\nOptional skip-AI mode:\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace --disable-ai\r\n```\r\n\r\n## Full Workflow\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace\r\n```\r\n\r\nOptional time window:\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace --time-window \"2026-03-15 00:00 to 2026-03-15 18:00\"\r\n```\r\n\r\nOptional skip-AI mode:\r\n\r\n```bash\r\npython /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace --disable-ai\r\n```\r\n\r\n## What Each Entrypoint Does\r\n\r\n`run_capture_only.py`\r\n1. Collect domestic RSS items into `data/YYYY-MM-DD.jsonl`.\r\n2. Collect domestic raw items into `data/domestic_raw_YYYY-MM-DD.jsonl`.\r\n3. Collect international raw items into `data/international_raw_YYYY-MM-DD.jsonl`.\r\n4. Filter international items into `data/international_YYYY-MM-DD.jsonl`.\r\n5. Save per-source snapshots in `snapshots/`.\r\n6. Update RSS deduplication and source metrics in `state/feed_state.json`.\r\n\r\n`run_report_only.py`\r\n1. Read the selected time window from collected data.\r\n2. Build the cumulative domestic Excel output in `reports/company_mentions.xlsx`.\r\n3. Build the cumulative international Excel output in `reports/international_company_mentions.xlsx`.\r\n4. Call the model to generate domestic AI titles and AI summaries.\r\n5. 
Call the model to generate international AI titles, AI summaries, and impact scores.\r\n6. Build a merged daily Word brief in `reports/`.\r\n\r\n`run_full_workflow.py`\r\n1. Run capture.\r\n2. Run domestic reporting.\r\n3. Run international reporting.\r\n\r\n## Inputs\r\n\r\n- Domestic RSS config: `config/sources.json`\r\n- International RSS config: `config/international_sources.json`\r\n- Company list: `companies.txt`\r\n- Volcengine key: `ARK_API_KEY`\r\n- Optional model override: `ARK_MODEL`\r\n\r\n## Important Behavior\r\n\r\n- `state/feed_state.json` controls RSS deduplication.\r\n- Excel files are cumulative.\r\n- The Word brief is rebuilt per run.\r\n- The Word international section only includes the top 5 items by impact score inside the selected time window.\r\n- International items without a successful AI summary are excluded from the Word brief.\r\n- AI cache files are deleted automatically after each run.\r\n\r\n## Troubleshooting\r\n\r\n1. If the workflow does not rerun old RSS items, check `state/feed_state.json`.\r\n2. If AI columns are empty, check whether `ARK_API_KEY` is set in the execution environment.\r\n3. If the user wants a full rebuild, delete the relevant daily `data` files and `state/feed_state.json`, then rerun.\r\n4. 
If the user needs exact commands or cloud prompts, read `references/commands.md`.\r\n\r\n## References\r\n\r\n- `references/commands.md`\r\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":474,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1773746083398,"updatedAt":1778491973688},"latestVersion":{"version":"1.0.0","createdAt":1773746083398,"changelog":"Initial release of ai-news-pipeline skill.\n\n- Provides a complete AI news workflow for Chinese and international sources, runnable directly in the current workspace.\n- Adds high-frequency RSS capture (\"capture only\") and scheduled report delivery (\"report only\") modes; the report mode generates cumulative Excel outputs and a merged Word brief.\n- Bundles all necessary scripts under scripts/, with entrypoints for capture, reporting, and full workflow.\n- No external local repository path required; all outputs, state, and configuration are managed inside the workspace.\n- Includes comprehensive instructions for setup, inputs, behavior, and troubleshooting in SKILL.md.","license":"MIT-0"},"metadata":null,"owner":{"handle":"nighmat1220","userId":"s17da4x02zshcbrtw2pskyh9d9885373","displayName":"Nighmat","image":"https://avatars.githubusercontent.com/u/55906764?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780089935259}}