Security audit

Real Estate Crawler

Security checks across malware telemetry and agentic risk

Overview

This is an openly documented real-estate crawler, but it includes extensive anti-bot, CAPTCHA, proxy, cookie, and session-reuse workflows that need careful review before use.

Install only if you have explicit authorization to collect data from the target sites and understand the legal, platform-policy, privacy, and account/session risks. Avoid CAPTCHA-solving services, proxy rotation for evasion, and reuse of verified browser sessions; prefer official APIs or licensed datasets, and delete any saved cookies, sessions, screenshots, or page dumps promptly.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (32)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: import subprocess try: result = subprocess.run(cmd, shell=True, capture_output=True, text=True) if result.returncode == 0: print(f"命令执行成功") (the printed string translates to "Command executed successfully")
Confidence: 99%
Finding: result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
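
One common way to narrow this finding is to pass the command as an argument list and leave the shell out entirely. The sketch below is illustrative only: `run_command`, its timeout default, and the demo invocation are assumptions, not code taken from the audited skill.

```python
import subprocess
import sys


def run_command(argv, timeout=30.0):
    """Run argv directly (no shell), so shell metacharacters stay literal."""
    if not argv:
        raise ValueError("argv must not be empty")
    return subprocess.run(
        argv,
        shell=False,          # no shell, so ';' and '|' are not interpreted
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,          # caller inspects returncode explicitly
    )


# Hypothetical demo: the argument containing ';' is passed through as plain text.
demo = run_command(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "hello; echo injected"]
)
print(demo.stdout.strip())
```

With `shell=False`, the string `hello; echo injected` reaches the child process as a single argument instead of being split into two shell commands.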

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill exposes powerful capabilities through allowed tools and documented usage patterns that enable shell execution, browser-driven network access, and likely file output, but it does not declare corresponding permissions in a structured way. In this context, the skill is specifically designed to bypass anti-bot protections, manage cookies/sessions, and use proxies, so the undeclared capabilities materially increase the risk of unauthorized scraping, evasive automation, and unreviewed data exfiltration or local file writes.
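
To make the gap concrete, a reviewer could compare the capabilities a skill actually exercises against the permissions it declares. The sketch below assumes hypothetical capability labels (`shell:exec`, `network:browser`, `fs:write`); the report does not show the skill's manifest format, so none of these names come from the audited skill.

```python
# Hypothetical capability labels; the real manifest format is not shown in the report.
OBSERVED_CAPABILITIES = {"shell:exec", "network:browser", "fs:write"}
DECLARED_PERMISSIONS = set()  # the finding reports no structured declaration


def undeclared_capabilities(observed, declared):
    """Return capabilities that are used but never declared, sorted for stable output."""
    return sorted(set(observed) - set(declared))


missing = undeclared_capabilities(OBSERVED_CAPABILITIES, DECLARED_PERMISSIONS)
print(missing)
```

An empty result would indicate that every observed capability has a matching declaration; anything else is a candidate for this finding.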

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The configuration explicitly embeds anti-crawler evasion tactics such as simulating human behavior, using proxy IPs, restoring sessions, setting cookies, and handling CAPTCHA workflows. In a crawler skill, this goes beyond neutral scraping configuration and materially enables bypass of platform access controls and anti-abuse defenses, increasing the risk of unauthorized collection or terms-of-service violations.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This section explicitly documents anti-bot evasion techniques, including CAPTCHA bypass workflows, simulated device fingerprinting, cookie/session manipulation, and reuse of validated sessions to continue access after anti-crawler checks. In the context of a real-estate crawler skill, these instructions go beyond normal scraping guidance and materially enable circumvention of target-site access controls.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The guide recommends proxy IP rotation specifically to sustain crawling despite anti-crawler defenses and at larger scale. This facilitates evasion of IP-based blocking and undermines target-site protective controls, increasing the likelihood of abusive scraping and making attribution and containment harder.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Saving and reloading a verified browser session after CAPTCHA completion enables replay of trust state granted to a prior interactive validation. This can be used to bypass repeated anti-bot checks and, if session artifacts are exposed, may also create account/session security risks.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This documentation explicitly instructs users how to bypass CAPTCHA and anti-bot controls using mobile user agents, device fingerprinting, and human-behavior simulation. In the context of a real-estate crawler skill, these are not neutral operational notes; they directly enable evasion of access controls designed to restrict automated collection.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The file recommends proxy rotation and traffic shaping specifically to avoid anti-bot enforcement, which facilitates persistent unauthorized scraping and makes detection harder. These tactics materially increase the operational capability of an agent to continue accessing protected resources after normal controls would intervene.

Context-Inappropriate Capability

Critical

Confidence: 99%
Finding: Integrating a third-party CAPTCHA-solving service automates defeat of a site’s human-verification control and significantly lowers the barrier to large-scale abuse. It also introduces external data transfer of CAPTCHA images and credentials, compounding both access-control bypass and privacy/security risks.

Context-Inappropriate Capability

High

Confidence: 92%
Finding: The skill exposes a broad script-launching capability via dynamically assembled shell and Python invocations rather than a tightly scoped crawler implementation. In this file, that broader execution surface is especially dangerous because the dispatcher combines external script execution with shell=True, making abuse of the orchestration layer a realistic path to arbitrary code execution.
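
A tighter dispatcher would map a small set of task names to fixed scripts and build an argument list without a shell. The following is a minimal sketch under stated assumptions: the task name `crawl_listings`, the path `scripts/crawl_listings.py`, and the helper names are hypothetical and do not appear in the audited skill.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical allowlist: task name -> fixed script path.
ALLOWED_SCRIPTS = {"crawl_listings": Path("scripts/crawl_listings.py")}


def build_argv(task, args):
    """Build an argument list for an allowlisted task; reject anything else."""
    script = ALLOWED_SCRIPTS.get(task)
    if script is None:
        raise PermissionError(f"task not allowlisted: {task!r}")
    # Arguments are passed as separate list items, never joined into a shell string.
    return [sys.executable, str(script), *args]


def dispatch(task, args, timeout=300.0):
    """Run an allowlisted script without a shell and return the completed process."""
    return subprocess.run(
        build_argv(task, args),
        shell=False,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Because unknown task names are rejected before any process starts, a caller cannot use the orchestration layer to launch arbitrary scripts or inject shell syntax through arguments.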

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The script explicitly saves reusable browser session state to disk, which may include cookies or anti-bot/session tokens that can be replayed later. In a skill whose stated purpose is anti-crawler bypass, persisting this state increases the chance of unauthorized reuse, account/session hijacking, or repeated circumvention beyond a one-time browsing session.

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The script advises proxy IP rotation for large-scale scraping, which materially facilitates evasion of rate limits and anti-abuse controls. In the context of a real-estate crawler advertised with anti-bot bypass strategies, this goes beyond benign extraction and supports scaled circumvention of website protections.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The script explicitly configures a spoofed mobile device fingerprint and custom headers to appear more like a real user, which goes beyond ordinary automation and is clearly aimed at reducing bot detection. In the context of a real-estate crawler skill that advertises anti-crawler bypass strategies, this meaningfully increases the likelihood of unauthorized scraping and terms-of-service evasion.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The scripted scrolling and timed waits are presented as 'simulated human browsing behavior,' indicating deliberate attempts to evade anti-bot heuristics rather than necessary functional automation. Because the skill is specifically a multi-site crawler with anti-crawling bypass claims, this behavior is more dangerous than in a normal browser testing script.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: This script is explicitly designed to circumvent Lianjia anti-bot controls and CAPTCHA protections through cookie seeding, proxy use, browser fingerprint spoofing, simulated browsing behavior, and session reuse. In the context of a real-estate crawler skill, these are not incidental automation features but operational evasion instructions that enable unauthorized scraping and bypass of access controls.

Missing User Warnings

Medium

Confidence: 96%
Finding: The README provides operational instructions for bypassing CAPTCHA, spoofing browser fingerprints, using saved sessions, and rotating proxies directly alongside runnable commands, which materially facilitates evasion of access controls on third-party sites. Although there is a generic warning elsewhere, it is not colocated with the usage steps and does not adequately communicate legal, privacy, and service-impact risks, making the skill more dangerous in context because its stated purpose is to scrape commercial real-estate platforms while defeating anti-bot protections.

Vague Triggers

Medium

Confidence: 87%
Finding: The read_when triggers are broad and capability-driven rather than tightly scoped, so the skill may activate whenever a user asks about scraping real-estate data or bypassing anti-bot controls. In this context, unintended activation is more dangerous because the skill includes crawler behavior and anti-crawling evasion, which can cause the agent to perform sensitive or policy-violating collection actions without sufficiently explicit user intent.

Missing User Warnings

Medium

Confidence: 93%
Finding: The file promotes use of real cookies, session persistence, proxy IPs, mobile-device simulation, and CAPTCHA workarounds without any safety notice, consent requirement, or compliance boundary. That omission makes misuse easier because operators are guided toward evasion techniques without being warned about legal, ethical, or platform-policy constraints, especially in a skill explicitly designed for scraping commercial real-estate sites.
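
A compliance boundary of the kind this finding says is missing could be as simple as refusing to touch any host that lacks recorded authorization. The sketch below is an assumption-laden illustration: `AUTHORIZED_HOSTS`, the host `data.example.com`, and `require_authorization` are hypothetical names, not part of the audited skill.

```python
from urllib.parse import urlsplit

# Hypothetical: hosts for which the operator holds explicit authorization.
AUTHORIZED_HOSTS = {"data.example.com"}


def require_authorization(url, authorized_hosts=AUTHORIZED_HOSTS):
    """Return the URL's host if it is explicitly authorized; otherwise refuse."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in authorized_hosts:
        raise PermissionError(f"no recorded authorization for host: {host!r}")
    return host
```

Calling this check before any request turns the authorization requirement into an enforced precondition rather than a note the operator may never read.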

Missing User Warnings

Medium

Confidence: 92%
Finding: The document instructs operators to modify headers, cookies, geolocation, proxies, and browser state without adequate warning about privacy, session theft, account misuse, or the integrity implications of impersonation and evasion. In a skill whose purpose is web crawling, this omission makes misuse more likely and normalizes risky behavior that can compromise third-party systems and user data.

Missing User Warnings

Medium

Confidence: 90%
Finding: The example sends CAPTCHA images and an API key to an external service without warning about third-party data exposure, credential handling, retention, or legal implications. Even aside from the bypass behavior, this creates avoidable security and privacy risks for users who may transmit sensitive session-linked content to an unvetted provider.

Missing User Warnings

Medium

Confidence: 88%
Finding: The publication document describes large-scale scraping, anti-bot evasion, CAPTCHA bypass, session handling, proxy use, and data export, but does not prominently warn users about privacy, personal-data handling, retention, and downstream misuse risks. In a real-estate scraping context, exported data may include addresses, agent contact details, occupancy clues, or other sensitive information, so omission of explicit privacy guidance increases the likelihood of improper collection and disclosure.

Missing User Warnings

Medium

Confidence: 90%
Finding: Saving browser session state without clearly warning that it may contain sensitive cookies, identifiers, or anti-bot tokens creates a real risk of credential/session leakage if the file is copied, shared, or left on disk. Because this skill is specifically designed to bypass anti-scraping protections, the saved state is more likely to be security-relevant than ordinary browser cache data.

Missing User Warnings

Medium

Confidence: 86%
Finding: The script captures screenshots and exports full page text to local files without any consent prompt, retention guidance, or data-minimization controls. On real-estate platforms, page contents can include personal or proprietary listing data, so silent collection and storage creates privacy, compliance, and misuse risks.
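
Retention guidance can be backed by a small cleanup routine that deletes stored artifacts once they pass an age limit. This is a minimal sketch, assuming the artifacts live in a single directory and use the listed file suffixes; `purge_artifacts` and `ARTIFACT_SUFFIXES` are hypothetical names and the suffix set is a guess, not taken from the audited skill.

```python
import time
from pathlib import Path
from typing import List, Optional

# Assumed suffixes for session state, screenshots, and page dumps.
ARTIFACT_SUFFIXES = {".json", ".png", ".txt", ".html"}


def purge_artifacts(directory: Path, max_age_seconds: float,
                    now: Optional[float] = None) -> List[Path]:
    """Delete artifact files at least max_age_seconds old; return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(directory.iterdir()):
        # Skip subdirectories and files that are not recognised artifacts.
        if not path.is_file() or path.suffix.lower() not in ARTIFACT_SUFFIXES:
            continue
        if now - path.stat().st_mtime >= max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Running such a routine at the end of every session, with a short age limit, keeps screenshots, page text, and session files from accumulating on disk.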

Missing User Warnings

Medium

Confidence: 93%
Finding: The shell script automates a browser while injecting cookies, spoofing request headers, taking screenshots, and persisting session state, all without safeguards around authorization, sensitive data capture, or downstream use of harvested session artifacts. These behaviors can expose authenticated session material and facilitate repeated evasion against the target website.

Natural-Language Policy Violations

High

Confidence: 99%
Finding: The file openly labels itself as a CAPTCHA bypass script and instructs the operator to simulate normal-user behavior to get past defenses. That wording, combined with the concrete evasion workflow, strongly indicates deliberate anti-detection intent rather than ordinary browser automation.

VirusTotal

66/66 vendors flagged this skill as clean.
