xueqiu-collector

Security checks across malware telemetry and agentic risk

Overview

This scraper has a coherent purpose, but it warrants review: it uses a logged-in Edge profile to collect and persist Xueqiu posts, images, and OCR data across a broad scope.

Install only if you accept a logged-in browser automation scraper using your Edge session. Prefer a dedicated Edge profile and dedicated Xueqiu account, confirm that the target account and collection scope are authorized, avoid large or repeated scraping that may violate site rules, and keep exported databases, Markdown, JSON, images, OCR text, and logs in a private folder with a deletion plan.
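The "private folder with a deletion plan" advice can be sketched roughly as follows. This is a minimal illustration, not part of the skill: the function names `make_private_dir` and `purge_older_than` and the retention period are hypothetical, and the permission call has limited effect on Windows.

```python
import os
import time
from pathlib import Path


def make_private_dir(path):
    """Create the export folder and restrict it to the current user (POSIX 0o700)."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    os.chmod(folder, 0o700)  # assumption: POSIX semantics; mostly a no-op on Windows
    return folder


def purge_older_than(folder, max_age_days, now=None):
    """Delete regular files under `folder` whose modification time is older
    than `max_age_days`. Returns the list of deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted = []
    # Materialize the listing first so deletion does not disturb iteration.
    for item in list(Path(folder).rglob("*")):
        if item.is_file() and item.stat().st_mtime < cutoff:
            item.unlink()
            deleted.append(item)
    return deleted
```

Running something like `purge_older_than(export_dir, 30)` on a schedule is one way to make the deletion plan concrete; the right retention period depends on your own requirements.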

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import

Findings (10)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: for pkg in pip_fixable: print(f"\n 正在安装 {pkg} ...") try: r = subprocess.run( [sys.executable, "-m", "pip", "install", pkg], capture_output=True, text=True, timeout=120 )
Confidence: 92%
Finding: r = subprocess.run( [sys.executable, "-m", "pip", "install", pkg], capture_output=True, text=True, timeout=120 )

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The script automatically discovers and reuses the local Edge profile directory, which can expose authenticated cookies, session state, and other browsing data beyond the stated purpose of collecting public posts. In this skill context, using a real logged-in profile materially increases privacy and account-exposure risk because the tool operates on a user's live browser state.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill executes external local programs (npx/playwright-cli and later tesseract) without clearly disclosing this capability in the skill interface. That expands the trust boundary from simple data collection to arbitrary local tool execution, which is significant in an agent skill because users may not expect local processes to be spawned.
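One way to make this capability visible is to resolve and report the external programs before spawning any of them. The sketch below is illustrative only: the helper names are hypothetical, and the tool list (`npx`, `tesseract`) is taken from the finding above rather than from the skill's code.

```python
import shutil


def describe_external_tools(names=("npx", "tesseract"), which=shutil.which):
    """Map each external program name to its resolved path, or None if it
    is not found on PATH. `which` is injectable for testing."""
    return {name: which(name) for name in names}


def disclosure_lines(tools):
    """Turn the mapping into human-readable lines to show before any run."""
    lines = []
    for name, path in tools.items():
        status = f"will run {path}" if path else "not found on PATH"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    for line in disclosure_lines(describe_external_tools()):
        print(line)
```

Printing these lines at startup, or surfacing them in the skill description, would let a user see which local processes may be launched.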

Missing User Warnings

Medium

Confidence: 90%
Finding: The README explicitly promotes collecting 'any' Xueqiu user's full posts, comments, images, and OCR text into local SQLite/JSON/Markdown archives, but provides only a brief compliance note and no concrete privacy, retention, consent, or data-handling safeguards. In this skill context, that makes large-scale third-party content harvesting easier and increases the risk of privacy violations, unauthorized profiling, and mishandling of personal or sensitive investment-related data.

Vague Triggers

Medium

Confidence: 84%
Finding: The trigger phrases include broad terms such as “xueqiu”, “下载雪球” ("download Xueqiu"), and generic sync/collect wording that may overlap with ordinary conversation. Overbroad activation increases the chance that the skill runs unintentionally, causing unauthorized scraping, a browser launch with a real profile, and local data writes that the user never explicitly requested.

Missing User Warnings

High

Confidence: 95%
Finding: The skill instructs use of a real logged-in browser profile, downloads full post content and images, performs OCR, and stores/exports the results, but it does not clearly warn about privacy, account-session exposure, retention, or third-party terms/compliance implications. In this context, missing disclosure is risky because the skill handles authenticated access and persistent collection of potentially sensitive user-generated content at scale.

Missing User Warnings

Medium

Confidence: 97%
Finding: The '--fix' mode performs package installation without any user confirmation, which can unexpectedly modify the host Python environment. In an agent skill context, automatic dependency installation is more dangerous because users may trigger it indirectly and inherit supply-chain or environment-integrity risks.
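A confirmation step before each install could look roughly like the sketch below. It reuses the `pip` invocation quoted in the first finding; the function name `install_with_confirmation` and its injectable `ask` and `run` parameters are hypothetical and exist only to keep the example testable.

```python
import subprocess
import sys


def install_with_confirmation(packages, ask=input, run=subprocess.run):
    """Install each package only after an explicit 'y' from the user.

    Returns the list of packages that were installed successfully.
    """
    installed = []
    for pkg in packages:
        answer = ask(f"Install {pkg} into {sys.executable}? [y/N] ")
        if answer.strip().lower() != "y":
            print(f"Skipped {pkg}")
            continue
        result = run(
            [sys.executable, "-m", "pip", "install", pkg],
            capture_output=True, text=True, timeout=120,
        )
        if result.returncode == 0:
            installed.append(pkg)
        else:
            print(f"pip failed for {pkg}: {result.stderr.strip()}")
    return installed
```

Defaulting to "no" means an indirect or accidental trigger leaves the Python environment unchanged. Pinning package versions would address the related supply-chain concern, but that is outside this sketch.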

Missing User Warnings

Medium

Confidence: 93%
Finding: The script's declared dependency on a real logged-in Edge profile indicates it will access account-scoped browser state, but the skill metadata does not provide an explicit privacy warning commensurate with that access. In a data-collection skill, this makes the behavior more dangerous because it can silently process content and session context tied to the user's personal account.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script reads LOCALAPPDATA-derived Edge profile paths and then launches a browser against that profile without a clear safety prompt. This is risky because it touches sensitive local account artifacts and can cause authenticated browsing actions under the user's identity.
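A guard along the following lines would require an explicitly supplied, dedicated profile directory and refuse the everyday Edge profile. This is a sketch under assumptions: the helper names are hypothetical, and the default location is taken to be the usual Windows path `%LOCALAPPDATA%\Microsoft\Edge\User Data`, which the skill's own discovery logic may not match exactly.

```python
import os
from pathlib import Path


def default_edge_user_data(env=None):
    """Return the usual Windows Edge 'User Data' path, or None if
    LOCALAPPDATA is not set (assumption: standard install layout)."""
    env = os.environ if env is None else env
    local = env.get("LOCALAPPDATA")
    if not local:
        return None
    return Path(local) / "Microsoft" / "Edge" / "User Data"


def resolve_profile_dir(explicit_dir, env=None):
    """Accept only an explicitly supplied profile directory that is not
    the default Edge 'User Data' folder or anything inside it."""
    if not explicit_dir:
        raise ValueError("A dedicated profile directory must be given explicitly")
    chosen = Path(explicit_dir).resolve()
    default = default_edge_user_data(env)
    if default is not None:
        default = default.resolve()
        if chosen == default or default in chosen.parents:
            raise ValueError("Refusing to reuse the default Edge profile")
    return chosen
```

With a check like this, automatic discovery is replaced by a deliberate choice, so the live browser session is touched only if the user points the tool at it some other way.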

Missing User Warnings

Medium

Confidence: 90%
Finding: The code downloads remote images, stores them locally, and later OCR-processes them without an explicit warning about network requests and filesystem writes. In this skill context, that increases privacy and storage risk because collected media may contain sensitive information and persists outside the original browsing session.

VirusTotal

61/61 vendors flagged this skill as clean.
