xueqiu-collector

Security checks across malware telemetry and agentic risk

Overview

This scraper has a coherent purpose, but it needs review because it drives a logged-in Edge profile to collect and store Xueqiu posts, images, and OCR text across a broad scope.

Install only if you accept a logged-in browser automation scraper using your Edge session. Prefer a dedicated Edge profile and dedicated Xueqiu account, confirm that the target account and collection scope are authorized, avoid large or repeated scraping that may violate site rules, and keep exported databases, Markdown, JSON, images, OCR text, and logs in a private folder with a deletion plan.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
Findings (10)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
for pkg in pip_fixable:
    # The message below translates to "Installing {pkg} ..."
    print(f"\n  正在安装 {pkg} ...")
    try:
        r = subprocess.run(
            [sys.executable, "-m", "pip", "install", pkg],
            capture_output=True, text=True, timeout=120
        )
Confidence
92% confidence
Finding
r = subprocess.run( [sys.executable, "-m", "pip", "install", pkg], capture_output=True, text=True, timeout=120 )

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The script automatically discovers and reuses the local Edge profile directory, which can expose authenticated cookies, session state, and other browsing data beyond the stated purpose of collecting public posts. In this skill context, using a real logged-in profile materially increases privacy and account-exposure risk because the tool operates on a user's live browser state.

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
The skill executes external local programs (npx/playwright-cli and later tesseract) without clearly disclosing this capability in the skill interface. That expands the trust boundary from simple data collection to arbitrary local tool execution, which is significant in an agent skill because users may not expect local processes to be spawned.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The README explicitly promotes collecting 'any' Xueqiu user's full posts, comments, images, and OCR text into local SQLite/JSON/Markdown archives, but provides only a brief compliance note and no concrete privacy, retention, consent, or data-handling safeguards. In this skill context, that makes large-scale third-party content harvesting easier and increases the risk of privacy violations, unauthorized profiling, and mishandling of personal or sensitive investment-related data.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The trigger phrases include broad terms such as “xueqiu”, “下载雪球” ("download Xueqiu"), and generic sync/collect wording that may overlap with ordinary conversation. Overbroad activation increases the chance that the skill runs unintentionally, causing unauthorized scraping, a browser launch with a real profile, and local data writes without the user explicitly requesting those actions.
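
To illustrate the difference between a broad keyword trigger and an explicit one, here is a minimal Python sketch. The `/xueqiu-collect` command name, the numeric user-ID argument, and both function names are hypothetical and are not taken from the skill; they only show the shape of a stricter match.

```python
import re

# Hypothetical explicit command; the skill's real trigger syntax is not shown in this report.
EXPLICIT_TRIGGER = re.compile(r"^\s*/xueqiu-collect\s+(?P<user_id>\d+)\s*$")


def broad_trigger(message: str) -> bool:
    """Keyword match of the kind the finding warns about: fires on any mention."""
    return "xueqiu" in message.lower() or "下载雪球" in message


def explicit_trigger(message: str):
    """Return the target user ID only when the whole message is the explicit command."""
    match = EXPLICIT_TRIGGER.match(message)
    return match.group("user_id") if match else None
```

With this sketch, a casual sentence that merely mentions Xueqiu activates `broad_trigger` but not `explicit_trigger`, which only responds to a message consisting of the command and a target ID.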

Missing User Warnings

High
Confidence
95% confidence
Finding
The skill instructs use of a real logged-in browser profile, downloads full post content and images, performs OCR, and stores/exports the results, but it does not clearly warn about privacy, account-session exposure, retention, or third-party terms/compliance implications. In this context, missing disclosure is risky because the skill handles authenticated access and persistent collection of potentially sensitive user-generated content at scale.

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The '--fix' mode performs package installation without any user confirmation, which can unexpectedly modify the host Python environment. In an agent skill context, automatic dependency installation is more dangerous because users may trigger it indirectly and inherit supply-chain or environment-integrity risks.
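
One possible mitigation is to ask before each install. The sketch below is an assumption about how such a gate could look, not the skill's code: `install_with_confirmation` and its `ask` and `run` parameters are hypothetical names, and only the `pip` invocation mirrors the flagged snippet.

```python
import subprocess
import sys


def install_with_confirmation(packages, ask=input, run=subprocess.run):
    """Invoke pip for a package only after an explicit 'y' answer.

    Returns the packages that were installed successfully.
    `ask` and `run` are injectable so the gate can be exercised without
    touching the real environment.
    """
    installed = []
    for pkg in packages:
        answer = ask(f"Install {pkg} into {sys.executable}? [y/N] ")
        if answer.strip().lower() != "y":
            print(f"Skipped {pkg}")
            continue
        result = run(
            [sys.executable, "-m", "pip", "install", pkg],
            capture_output=True, text=True, timeout=120,
        )
        if result.returncode == 0:
            installed.append(pkg)
        else:
            print(f"pip failed for {pkg}: {result.stderr}")
    return installed
```

Defaulting to "no" means an indirect or accidental trigger leaves the host Python environment unchanged.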

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The script's declared dependency on a real logged-in Edge profile indicates it will access account-scoped browser state, but the skill metadata does not provide an explicit privacy warning commensurate with that access. In a data-collection skill, this makes the behavior more dangerous because it can silently process content and session context tied to the user's personal account.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The script reads LOCALAPPDATA-derived Edge profile paths and then launches a browser against that profile without a clear safety prompt. This is risky because it touches sensitive local account artifacts and can cause authenticated browsing actions under the user's identity.
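
As a rough sketch of the safety check this finding asks for, the Python below derives the default Edge user-data directory from `LOCALAPPDATA` (on Windows this is normally `Microsoft\Edge\User Data` beneath it) and refuses to use it without an explicit opt-in. The function names and the `allow_live_profile` flag are hypothetical; the skill's actual discovery logic is not shown in this report.

```python
import os
from pathlib import PureWindowsPath


def default_edge_user_data(env=None):
    """Return the default Edge user-data directory, or None if LOCALAPPDATA is unset."""
    env = os.environ if env is None else env
    local = env.get("LOCALAPPDATA")
    if not local:
        return None
    return PureWindowsPath(local) / "Microsoft" / "Edge" / "User Data"


def choose_profile_dir(requested, env=None, allow_live_profile=False):
    """Reject the live Edge profile (or anything inside it) unless the caller opted in."""
    requested = PureWindowsPath(requested)
    live = default_edge_user_data(env)
    if live is not None and not allow_live_profile:
        if requested == live or live in requested.parents:
            raise PermissionError(
                f"{requested} is inside the live Edge profile; "
                "use a dedicated profile directory or opt in explicitly"
            )
    return requested
```

A dedicated directory outside the live profile passes the check, while the live profile or any subfolder of it requires `allow_live_profile=True`.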

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The code downloads remote images, stores them locally, and later OCR-processes them without an explicit warning about network requests and filesystem writes. In this skill context, that increases privacy and storage risk because collected media may contain sensitive information and persists outside the original browsing session.
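
One way to bound how long collected media persists is a simple retention sweep over the export folder. This is a minimal sketch under stated assumptions: `purge_old_exports`, the folder layout, and the age threshold are hypothetical and not part of the skill.

```python
import time
from pathlib import Path


def purge_old_exports(export_dir, max_age_days, now=None):
    """Delete files under export_dir whose modification time is older than max_age_days.

    Returns the list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(export_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running a sweep like this on a schedule against the private export folder keeps downloaded images, OCR text, and logs from accumulating indefinitely.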

VirusTotal

61/61 vendors flagged this skill as clean.
