知乎抓取.SKILL (Zhihu Scraping)

Security checks across malware telemetry and agentic risk

Overview

The skill’s Zhihu scraping purpose is clear, but it uses stealth authenticated automation, stores session cookies, and can move/delete local export files without clear opt-in.

Install only if you are comfortable with authenticated Zhihu automation and possible account-policy risk. Use a dedicated workspace and preferably a low-privilege Zhihu account, do not share or commit `zhihu_cookies.json` or the browser profile, back up your Obsidian vault first, and treat `write_to_obsidian.py` as a move/import tool rather than a non-destructive copy.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (19)

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The script injects a stealth script that hides browser automation signals such as navigator.webdriver and fabricates window.chrome properties. This is anti-detection behavior that exceeds normal scraping needs and is designed to evade platform defenses, increasing the risk of unauthorized access or policy circumvention.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The cookie keepalive routine simulates human browsing by visiting random pages or prior articles, scrolling, and waiting with randomized timing to preserve session validity. This goes beyond straightforward retrieval and functions as behavior spoofing to prolong authenticated access and avoid detection mechanisms.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The Playwright fallback injects anti-detection fingerprint spoofing to hide browser automation and evade Zhihu bot checks. That behavior is risky because it is designed to bypass platform defenses, increases the chance of policy violations or account action, and normalizes stealth techniques that are unnecessary for a benign fetcher unless explicitly justified and consented to.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The script deliberately alters browser characteristics to evade automation detection, including disabling Blink automation features and redefining navigator.webdriver. In the context of a scraping skill with persistent login state, this bypass-oriented behavior increases legal, policy, and account-security risk because it is designed to defeat site defenses rather than perform transparent automation.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The script injects explicit anti-detection logic to disguise browser automation, including masking navigator.webdriver, spoofing plugins/languages, deleting known automation markers, and fabricating a chrome object. In the context of a content-fetching skill, this exceeds normal scraping behavior and creates a capability for bypassing platform bot-detection and policy enforcement, which increases abuse potential and legal/compliance risk.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The module description presents the tool as ordinary article fetching, but the implementation includes deliberate stealth evasion techniques. This mismatch is dangerous because it conceals the true security and compliance posture of the skill from reviewers and users, making risky behavior easier to approve, deploy, or misuse without informed consent.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The function is described as synchronizing images, but it uses shutil.move(), which removes the originals from the source directory. This can cause unintended data loss and break workflows that expect the fetched Zhihu dataset to remain intact after export to Obsidian.
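
A non-destructive alternative can be sketched as follows. This is only an illustration under stated assumptions: the function name `sync_images` and its parameters are hypothetical and are not taken from the skill's code. It copies with `shutil.copy2` so the fetched dataset stays in place after export.

```python
import shutil
from pathlib import Path


def sync_images(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy files from src_dir into dest_dir, leaving the originals in place.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.iterdir()):
        if src.is_file():
            # copy2 preserves timestamps; unlike shutil.move, the source stays intact.
            copied.append(Path(shutil.copy2(src, dest_dir / src.name)))
    return copied
```

If a move is really what the user wants, it could be offered as an explicit opt-in flag rather than the default.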

Intent-Code Divergence

Medium

Confidence: 95%
Finding: After writing each article into the Obsidian vault, the script silently deletes the original markdown file. This is dangerous because the tool presents itself as an exporter/writer, not a destructive migration utility, so users may lose their only local copy of scraped content without realizing it.

Missing User Warnings

Medium

Confidence: 87%
Finding: The README explicitly promotes cookie persistence, keep-alive behavior, and manual relogin flows, but does not warn users that authenticated browser state may contain session tokens and other sensitive data. In a scraping skill, persisted cookies can enable account takeover or unauthorized access if stored insecurely, shared across workspaces, or reused by other tools.

Missing User Warnings

Medium

Confidence: 81%
Finding: The README instructs users to write scraped content directly into an Obsidian Vault without clearly warning that this will create, overwrite, and reorganize local files. Because an Obsidian Vault is user content, an agent or user following these instructions could unintentionally modify personal notes, sync sensitive scraped data to other devices or cloud backends, or pollute an existing knowledge base.

Vague Triggers

Medium

Confidence: 88%
Finding: The trigger conditions are broad enough to match generic conversations about Zhihu, cookies, captchas, scraping, or knowledge-base sync, which increases the chance of accidental invocation. In a skill with shell, web, and file-write capabilities plus login/cookie workflows, overbroad auto-activation raises the risk of collecting data or modifying local files when the user did not intend to run this specific workflow.

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill documents persistent storage of Zhihu cookies, browser user data, scraped content, and optional Obsidian writes, but it does not prominently warn about privacy and account-risk implications. Persisted cookies and browser profiles are sensitive authentication material; if stored insecurely or reused unintentionally, they could expose a user's account or private browsing-derived data.

Missing User Warnings

Medium

Confidence: 90%
Finding: The script persists authentication cookies to a predictable local file in the workspace without access controls, encryption, or prominent warning to the user. If that file is read by another local process, synced to cloud storage, or committed accidentally, an attacker may reuse the session for account access.
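
One partial mitigation is to create the cookie file with owner-only permissions. The sketch below assumes a POSIX system; `save_cookies_private` is a hypothetical name, not a function from the skill. It restricts who can read the file but does not encrypt it, so cloud sync and accidental commits remain a risk.

```python
import json
import os
from pathlib import Path


def save_cookies_private(cookies: list[dict], path: Path) -> None:
    """Write cookies as JSON to a file readable and writable by the owner only.

    Hypothetical helper for illustration; assumes POSIX permission semantics.
    """
    # Mode 0o600 applies at creation time, so the file is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(cookies, handle)
    # If the file already existed with looser permissions, tighten them explicitly.
    os.chmod(path, 0o600)
```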

Missing User Warnings

Medium

Confidence: 93%
Finding: The browser cookie export routine saves all cookies from the authenticated context, not just the ones necessary for Zhihu access, and writes them to disk in plaintext. Broad cookie capture materially increases the chance of credential/session leakage and can expose unrelated accounts or services if they are present in the browser context.
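
Narrowing the export to Zhihu's own cookies could look like the sketch below. It assumes each cookie is a dict with a `domain` key, which is the shape Playwright's `BrowserContext.cookies()` returns; the function name `filter_zhihu_cookies` is hypothetical.

```python
def filter_zhihu_cookies(cookies: list[dict]) -> list[dict]:
    """Keep only cookies whose domain is zhihu.com or one of its subdomains.

    Hypothetical helper; assumes each cookie dict carries a "domain" key.
    """
    kept = []
    for cookie in cookies:
        # Cookie domains may carry a leading dot, e.g. ".zhihu.com".
        domain = cookie.get("domain", "").lstrip(".").lower()
        if domain == "zhihu.com" or domain.endswith(".zhihu.com"):
            kept.append(cookie)
    return kept
```

Matching on the exact domain or a dot-prefixed suffix avoids accepting look-alike hosts such as `notzhihu.com`.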

Missing User Warnings

Medium

Confidence: 87%
Finding: The script loads persisted Zhihu authentication cookies and sends them in requests without any explicit notice, confirmation, or scope limitation. This is dangerous because it silently uses authenticated session material, which can expose private account access patterns and scrape data under the user's identity without clear consent or auditability.

Missing User Warnings

Medium

Confidence: 87%
Finding: The script automatically loads persisted Zhihu cookies from disk and injects them into a browser context, enabling authenticated scraping without an explicit runtime consent prompt or clear warning about account-scoped access. In a shared workspace or agent environment, this can cause unintended use of another user's session and expose private account data or trigger account actions under that identity.
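
A runtime consent gate might be sketched like this. The name `load_cookies_with_consent` and the prompt wording are assumptions for illustration; the `confirm` parameter defaults to the built-in `input` and is injectable so the behavior can be tested or wired to an agent's approval flow.

```python
import json
from pathlib import Path
from typing import Callable


def load_cookies_with_consent(
    path: Path, confirm: Callable[[str], str] = input
) -> list[dict]:
    """Load saved cookies only after the user explicitly agrees.

    Hypothetical helper; returns an empty list when consent is not given.
    """
    answer = confirm(f"Use the saved Zhihu session in {path}? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        # No consent: proceed unauthenticated rather than reuse the session.
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```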

Missing User Warnings

Medium

Confidence: 97%
Finding: The script performs irreversible deletion of source article files without prior warning, confirmation, backup, or dry-run mode. In a batch-processing context, this can lead to large-scale accidental data loss if the classification, destination selection, or vault detection is incorrect.
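
A dry-run default is one way to address this. In the sketch below, `remove_sources` is a hypothetical helper, not part of the skill: it reports which files would be deleted and only removes them when the caller passes `dry_run=False`.

```python
from pathlib import Path


def remove_sources(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Return the files that would be (or were) deleted.

    Hypothetical helper; nothing is deleted unless dry_run is False.
    """
    affected = []
    for path in paths:
        if not path.is_file():
            continue  # Skip missing paths and directories.
        affected.append(path)
        if not dry_run:
            path.unlink()
    return affected
```

A batch run could print the returned list first and ask for confirmation before calling the function again with `dry_run=False`.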

Missing User Warnings

Medium

Confidence: 93%
Finding: The script persists Zhihu authentication cookies, including the login session token (`z_c0`), to a predictable plaintext file under the user's home directory. If that file is read by another local user, malware, a backup/sync service, or another tool in the workspace, an attacker may be able to reuse the session and access the victim's Zhihu account without credentials.

Missing User Warnings

Medium

Confidence: 93%
Finding: The script serializes Zhihu authentication cookies, including session-bearing values such as z_c0, to a plaintext JSON file in the workspace. In the context of a scraping/login helper, this is likely intentional for persistence rather than malicious, but it still creates a real credential exposure risk if the workspace is readable by other local users, synced to cloud storage, committed to source control, or exfiltrated by other processes.

VirusTotal

64/64 vendors flagged this skill as clean.