Skill v1.1.0

ClawScan security

知乎数据获取 | Zhihu Data Fetcher · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Review: Mar 15, 2026, 10:18 PM
Verdict: Review
Confidence: high
Model: gpt-5-mini
Summary
The skill's code largely matches a Zhihu scraper, but there are notable inconsistencies and risky choices (missing declared runtime deps, embedded cookie values in the repo, and browser-console code that captures sensitive page state) that warrant caution before installing or running it.
Guidance
Plain-language checklist and recommendations before installing or running:

- Do not run the scripts blindly. This package expects you to run Node.js and Python scripts locally; ensure you have Node and Python installed and run them in a safe environment.
- Remove or replace any cookie/session values that are pre-filled in config/fallback-sources.json. Treat the included cookie-like strings as potentially sensitive or stale and do not reuse them. Prefer to leave cookie fields empty and fill them in yourself from a browser you control.
- If you only need unauthenticated/fallback data, configure the skill to use fallback-only mode (or run only the fallback snippet) so you avoid storing session cookies altogether.
- The browser console snippet (browser-research.js) collects document.cookie and other environment details. Only paste or run that code in a browser you trust and where you are comfortable exposing those values locally. Never paste it into a remote console provided by an untrusted party.
- Inspect the code locally before use. The code currently performs only direct requests to zhihu.com and the configured fallback URLs (e.g., GitHub raw). If you see any network calls to unknown endpoints (especially remote servers not documented in SKILL.md), stop and investigate.
- Prefer running in an isolated environment (a VM or throwaway container) if you plan to provide cookies from a logged-in account. That limits risk if something unexpected is present.
- Ask the publisher or maintainer to: (1) declare the required runtimes (Node, Python) and any other prerequisites in the metadata, (2) remove any embedded session tokens from the repo, and (3) confirm whether any of the provided cookie strings are placeholders.

If you want, I can extract the precise lines that read or print cookies and network URLs (so you can audit them), or produce a safer minimal command sequence that uses only the fallback source.
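As a rough illustration of the self-audit suggested above, the sketch below scans source text for lines that read cookies or reference network endpoints. The patterns are assumptions for a first pass (they cover the identifiers mentioned in this review, such as document.cookie and the zhihu_session/_xsrf/d_c0 cookie names), not an exhaustive scanner.

```python
import re

# Heuristic patterns for a first-pass audit. Extend as needed;
# the names below are the ones mentioned in this review.
PATTERNS = {
    "cookie read": re.compile(r"document\.cookie|zhihu_session|_xsrf|d_c0"),
    "network URL": re.compile(r"https?://[^\s\"']+"),
}

def audit_source(text: str) -> list:
    """Return (line number, category, stripped line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits

# Hypothetical input, standing in for a file like browser-research.js.
sample = 'const c = document.cookie;\nfetch("https://www.zhihu.com/hot");'
for lineno, label, line in audit_source(sample):
    print(f"line {lineno} [{label}]: {line}")
```

Running this over each script in the package (and the config files) gives a quick shortlist of lines to read closely before deciding whether to supply real cookies.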

Review Dimensions

Purpose & Capability
concern: The skill's stated purpose (fetch Zhihu hot lists with three-level auth) aligns with the scripts provided (Node.js snippets for fetching and Python for DB operations). However, the package metadata declares no required binaries or env vars, yet the runtime instructions and code clearly require Node.js and Python 3. That mismatch (failing to declare required runtimes) is an incoherence that could mislead users. Otherwise the requested files and configuration (cookies, fallback sources) are consistent with the scraping purpose.
Instruction Scope
note: SKILL.md instructs users how to log in, copy cookies into the config, run the Node/Python scripts, and optionally run a browser console snippet. The browser-research.js snippet intentionally collects document.cookie plus userAgent, timezone, platform, and some window keys. Collecting these in the browser console is relevant to anti-crawl research, but it also gathers session tokens and environment details which are sensitive and could be exfiltrated if the snippet were modified. The instructions do not direct data to unknown external endpoints beyond Zhihu and the configured fallback URLs (e.g., GitHub raw).
Install Mechanism
ok: There is no external installer or remote download; the skill is instruction-plus-source-file based. That limits supply-chain risk from arbitrary downloads. The scripts execute locally (Node and Python) and create files under the repo's data/ directory. No third-party install URLs or archives are used.
Credentials
concern: The skill expects users to supply Zhihu session cookies (a file-based cookie fallback), which is proportionate to the stated goal of authenticated scraping. However, the repository's config/fallback-sources.json already contains pre-filled cookie-like values (e.g., zhihu_session, _xsrf, d_c0). Embedding session tokens or real cookie values in a distributed repo is inappropriate and risky: those values could be reused by others, or indicate that the author accidentally committed secrets. The skill does not request unrelated credentials or environment variables, which is good, but the included cookie data is a red flag.
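One way to neutralize the embedded values before using the skill is to blank the cookie fields while keeping the config schema intact. The sketch below assumes the file is ordinary JSON and that the sensitive keys are the ones named in this review; adjust both to match the actual config/fallback-sources.json.

```python
import json

# Keys treated as session material, taken from the cookie names
# observed in this review. Adjust to match the actual file.
SENSITIVE_KEYS = {"zhihu_session", "_xsrf", "d_c0"}

def blank_cookies(config: dict) -> dict:
    """Return a copy of the config with cookie-like values emptied."""
    cleaned = {}
    for key, value in config.items():
        if isinstance(value, dict):
            cleaned[key] = blank_cookies(value)  # recurse into nested sections
        elif key in SENSITIVE_KEYS:
            cleaned[key] = ""  # keep the key so the schema stays valid
        else:
            cleaned[key] = value
    return cleaned

# Hypothetical shape, standing in for config/fallback-sources.json.
embedded = {
    "cookies": {"zhihu_session": "abc123", "_xsrf": "tok", "d_c0": "id"},
    "fallback_url": "https://raw.githubusercontent.com/...",
}
print(json.dumps(blank_cookies(embedded), indent=2))
```

Loading the real file with json.load, passing it through blank_cookies, and writing it back gives you a clean starting point before filling in cookies from a browser you control.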
Persistence & Privilege
ok: The skill does not request global 'always' installation, and its defaults allow model invocation (normal). It writes to its own data/zhihu.db and generated HTML files, which is expected for this functionality. It does not attempt to modify other skills or system-wide agent config.