Xiaohongshu Crawler

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real Xiaohongshu crawler, but it stores reusable login cookies in plaintext and includes anti-detection/proxy behavior that users should review carefully before installing.

Install only if you understand it may use your logged-in Xiaohongshu session and save reusable cookies locally. Use a separate low-risk account, do not commit or share config.json, avoid third-party proxies with authenticated cookies, keep crawl volumes small, and delete stored cookies/cache when finished.
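The last step of that advice, deleting stored cookies and cache when finished, can be sketched as follows. This is a minimal sketch, not the skill's own cleanup routine: the `cookies` key name and the cache directory location are assumptions, since the report does not show the layout of config.json or where the cache lives.

```python
import json
import shutil
from pathlib import Path


def scrub_session(config_path: Path, cache_dir: Path, cookie_key: str = "cookies") -> bool:
    """Remove stored cookies from the config file and delete the cache directory.

    `cookie_key` and `cache_dir` are hypothetical; adjust them to the real layout.
    Returns True if a cookie entry was found and removed.
    """
    removed = False
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if cookie_key in config:
            del config[cookie_key]
            removed = True
            config_path.write_text(
                json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
            )
    # ignore_errors=True makes this a no-op when the directory is absent.
    shutil.rmtree(cache_dir, ignore_errors=True)
    return removed
```

Removing the cookies from the file does not end the session on the server side; logging out of the account in the browser is still needed to revoke it.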

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (17)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The skill is presented as a crawler for public Xiaohongshu content, but the documented behavior goes beyond simple scraping by collecting authenticated session cookies, persisting scraped data and screenshots locally, and describing anti-detection/anti-crawling evasion techniques such as webdriver hiding, simulated human behavior, UA rotation, and proxy rotation. This mismatch is dangerous because it conceals privacy-sensitive credential handling and platform-evasion capabilities that could be used for unauthorized data collection, account misuse, or terms-of-service circumvention.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: This module explicitly implements anti-detection measures for crawling, including randomized delays, human-behavior simulation, user-agent rotation, and proxy rotation. In the context of a public-content crawler, these features materially increase the tool’s ability to evade platform restrictions and anti-abuse controls, which goes beyond ordinary scraping functionality and creates misuse potential.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: Proxy rotation 'to avoid IP bans' is a direct anti-enforcement capability. That makes the crawler more capable of bypassing service protections and sustaining access after blocking, which is especially risky because the stated purpose is merely crawling public Xiaohongshu content rather than sanctioned testing or defensive research.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: This code intentionally implements anti-detection behavior for automated browsing by spoofing browser fingerprinting signals and obtaining proxies before creating the browsing context. In a crawler skill, this materially increases abuse potential by helping automation evade platform controls, rate limits, or anti-bot defenses, which goes beyond normal browser automation.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The script deliberately modifies browser fingerprinting signals such as navigator.webdriver, plugins, and languages to evade bot detection. In a crawler skill, this exceeds basic browsing functionality and materially increases the ability to access content while bypassing platform defenses, which can facilitate unauthorized scraping or account/session misuse when combined with injected cookies.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This script explicitly extracts authenticated Xiaohongshu session cookies from a logged-in browser context and persists them to config.json for later reuse. That exceeds a tool described primarily as a public-content crawler and creates a reusable credential store that could enable account/session hijacking if the file is exposed, committed, or reused beyond the user's intent.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The file provides a standalone workflow whose sole purpose is to capture and store login session cookies, including likely authentication tokens such as web_session, id_token, login_token, and a1. In the context of a crawler advertised for public-content scraping, this capability is security-sensitive and unnecessary on its face, making accidental credential theft, misuse, or overcollection more likely.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The script deliberately alters browser fingerprint signals such as navigator.webdriver, plugins, and languages to evade bot detection. In a crawler that already injects authenticated cookies, this increases the ability to masquerade as a real logged-in user and bypass platform defenses, which goes beyond ordinary automation and raises abuse and account-risk concerns.

Missing User Warnings

Medium

Confidence: 89%
Finding: The code reads cookie values from configuration and injects them directly into browser sessions, enabling authenticated scraping with whatever session material is supplied. This is risky because it normalizes use of potentially sensitive session tokens without validation, consent checks, scope restrictions, or audit visibility, and can lead to account misuse or leakage if the config is mishandled.
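One way to add the missing validation and consent check is to gate cookie injection behind an allow-list and an explicit confirmation. The sketch below is illustrative only: `approve_cookies` and its `confirm` callback are hypothetical names, and the allow-list reuses the cookie names cited elsewhere in this report rather than a verified list.

```python
from typing import Callable, Dict

# Cookie names taken from the findings in this report; treat the set as an assumption.
ALLOWED_COOKIE_NAMES = frozenset({"web_session", "id_token", "login_token", "a1"})


def approve_cookies(
    cookies: Dict[str, str],
    confirm: Callable[[str], bool],
    allowed_names: frozenset = ALLOWED_COOKIE_NAMES,
) -> Dict[str, str]:
    """Return only allow-listed, non-empty cookies, and only if the user confirms."""
    selected = {k: v for k, v in cookies.items() if k in allowed_names and v}
    if not selected:
        return {}
    # The prompt lists cookie names only, never their values.
    prompt = "Use stored login cookies (" + ", ".join(sorted(selected)) + ") for this crawl?"
    return selected if confirm(prompt) else {}
```

In an interactive run, `confirm` could wrap `input()`; in an unattended run it could simply return False so that crawling stays unauthenticated by default.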

Missing User Warnings

Medium

Confidence: 86%
Finding: The browser context is configured to use a proxy, including credentials, with no visible warning or control in this code path. Routing authenticated crawler traffic through third-party proxies can expose request metadata, credentials, and scraped content to untrusted infrastructure and also facilitates evasion of source-based restrictions.
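A guard of the following kind would make the proxy and cookie combination an explicit choice rather than a silent default. This is a sketch under stated assumptions: the function names and the `allow_authenticated_proxy` flag are hypothetical and do not come from the scanned code.

```python
from urllib.parse import urlsplit


def describe_proxy(proxy_url: str) -> str:
    """Return the proxy as scheme://host:port with any embedded credentials removed."""
    parts = urlsplit(proxy_url)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}"


def check_proxy_use(proxy_url, has_cookies: bool, allow_authenticated_proxy: bool = False):
    """Refuse to combine login cookies with a proxy unless explicitly allowed."""
    if not proxy_url:
        return None
    if has_cookies and not allow_authenticated_proxy:
        raise PermissionError(
            "Refusing to send login cookies through proxy " + describe_proxy(proxy_url)
        )
    return proxy_url
```

The error message uses `describe_proxy` so that proxy credentials are not copied into logs or terminal output.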

Missing User Warnings

Medium

Confidence: 91%
Finding: The troubleshooting guide explicitly instructs users to re-acquire and enable login cookies to restore search functionality, but it does not warn that these cookies are authentication secrets tied to a personal account and should be handled like credentials. In the context of a crawler for a platform that requires login for search, this normalizes storing and reusing session tokens and increases the chance of account compromise, unauthorized scraping, or accidental disclosure of personal account data.

Missing User Warnings

Medium

Confidence: 95%
Finding: The guide explicitly instructs users to extract authenticated Xiaohongshu cookies from browser developer tools and store them in config.json, but it does not clearly warn that these cookies are equivalent to session credentials and may grant account access if leaked. In the context of a crawler skill that depends on logged-in scraping, this increases the risk of credential exposure through local files, logs, screenshots, source control, or sharing of configuration artifacts.

Missing User Warnings

Medium

Confidence: 93%
Finding: The function injects cookies from config directly into the browser context before visiting user-supplied note URLs. This can silently use an authenticated session to access content, and if the URL handling is ever broadened or misused, those cookies may be sent in ways the user did not explicitly authorize, creating privacy and session-security risk.

Missing User Warnings

Medium

Confidence: 93%
Finding: The main crawl flow again applies configured cookies without any user-facing disclosure or confirmation, enabling authenticated scraping by default. In the context of a crawler, this makes the tool more dangerous because it can operate under a user's logged-in identity across search and follow-on page access without explicit runtime awareness.

Missing User Warnings

Medium

Confidence: 94%
Finding: The script writes selected authentication cookies directly into config.json in plaintext, with no warning that these values are sensitive credentials and no controls around file permissions, encryption, or accidental source-control inclusion. Anyone who obtains that file may be able to replay the stored session and access the associated account until the cookies expire or are revoked.
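The file-permission gap can be narrowed by writing the config so that only the owning user can read it. The sketch below assumes a POSIX system and uses a hypothetical helper name, `write_private_json`; it does not encrypt the cookies, so the file should still be kept out of source control.

```python
import json
import os
from pathlib import Path


def write_private_json(path: Path, data: dict) -> None:
    """Write JSON so that only the owning user can read or modify the file (POSIX)."""
    # Create the file with mode 0600 from the start rather than tightening it afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)
    # An existing file keeps its old mode, so set it explicitly as well.
    os.chmod(path, 0o600)
```

On Windows, `os.chmod` only toggles the read-only flag, so access control there would need a different mechanism.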

Missing User Warnings

Medium

Confidence: 80%
Finding: The script stores fetched note content and associated user information in a local cache without any explicit notice, retention control, or sensitivity check. In a crawler that handles logged-in content, this increases privacy and compliance risk because scraped personal data may persist on disk longer than intended and could be exposed to other local users, backups, or later misuse.
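A simple retention control for such a cache is to delete files older than a chosen age. The following is a minimal sketch; the cache directory and the retention period are assumptions, as the report does not describe how the cache is organized.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_cache(cache_dir: Path, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Delete files under cache_dir last modified more than max_age_days ago.

    Returns the list of removed paths. `now` exists only to make testing easier.
    """
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed: List[Path] = []
    if not cache_dir.is_dir():
        return removed
    # Materialize the listing first so deletion does not disturb the traversal.
    for entry in list(cache_dir.rglob("*")):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry)
    return removed
```

Running this at the start of each crawl, for example with `max_age_days=7`, would bound how long scraped personal data stays on disk.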

Missing User Warnings

Medium

Confidence: 92%
Finding: The script imports raw session cookies from configuration and injects them directly into a browser context, enabling authenticated access without any runtime warning, consent flow, or safeguards around credential handling. In the context of scraping a platform that requires login for search, this creates risk of credential misuse, accidental account exposure, and collection under another user's authenticated session.

VirusTotal

VirusTotal findings are pending for this skill version.
