Real Estate Spider

Security checks across malware telemetry and agentic risk

Overview

This skill is a real-estate scraper that explicitly teaches anti-bot and CAPTCHA bypass methods and saves reusable browser sessions, so it needs user review before installation.

Install only for targets where you have clear permission to collect data. Avoid using the CAPTCHA-bypass, proxy-rotation, real-cookie, and saved-session workflows against third-party sites; prefer official APIs or licensed datasets. Treat generated session files, screenshots, PDFs, and exports as sensitive because they may contain cookies, account state, or protected site content.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (17)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    print(f"执行命令: {command}")
    import subprocess
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode == 0:
        print("agent-browser脚本执行成功")
Confidence: 94%
Finding: result = subprocess.run(command, shell=True, capture_output=True, text=True)
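
The flagged call passes a command string to a shell, so any shell metacharacters in that string are interpreted. One common way to reduce this risk, sketched below, is to split the string into an argument list and run it with shell=False. The helper name run_command is hypothetical and is not part of the scanned skill; the sketch assumes POSIX-style quoting rules.

```python
import shlex
import subprocess
import sys


def run_command(command: str) -> subprocess.CompletedProcess:
    """Run a command without invoking a shell (hypothetical helper).

    shlex.split turns the string into an argument list, so characters
    such as ';' or '|' reach the program as literal arguments instead
    of being interpreted by a shell.
    """
    args = shlex.split(command)
    return subprocess.run(args, shell=False, capture_output=True, text=True)


if __name__ == "__main__":
    # Illustrative usage: invoke the current Python interpreter.
    result = run_command(f'{shlex.quote(sys.executable)} -c "print(1 + 1)"')
    if result.returncode == 0:
        print(result.stdout.strip())
```

With this shape, a trailing `; echo hi` in the command string is delivered to the program as three extra arguments rather than executed as a second command.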

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The optional CAPTCHA-solving integration sends challenge images to an external service, adding a new third-party automation channel beyond simple real-estate data collection. This is risky because CAPTCHA images may contain session-linked or site-specific content, and outsourcing challenge solving also facilitates bypass of access controls on target websites.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: This section explicitly documents use of a third-party CAPTCHA-solving service to bypass anti-bot controls, which materially exceeds ordinary data extraction and enables automated circumvention of access restrictions. In the context of a real-estate scraping skill that already discusses session reuse, cookies, and evasion tactics, this is a strong indicator of deliberate anti-abuse bypass capability.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The proxy rotation guidance is presented specifically to evade IP-based restrictions, which is an anti-detection tactic rather than a normal scraping necessity. Combined with rate-shaping and CAPTCHA workarounds elsewhere in the file, it increases the skill's ability to persist against site defenses and scale unauthorized collection.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: This script is explicitly designed to bypass Lianjia anti-bot and captcha controls using cookie seeding, browser fingerprint/header shaping, behavior simulation, proxy use, and session reuse. These are not incidental scraping helpers; they are operational evasion techniques that enable unauthorized automated access and can facilitate abuse of a protected service.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: This script explicitly configures a spoofed browser fingerprint, custom headers, cookies, human-like scrolling, and city-specific navigation to avoid anti-bot controls on real-estate sites. In the context of a scraping skill whose stated purpose includes anti-crawler bypass, these are not incidental automation steps but operational evasion features that can facilitate unauthorized or terms-violating data collection.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The later portion of the script continues the bypass workflow by checking for CAPTCHA and anti-spider markers, persisting session state, and explicitly recommending proxy IP rotation for larger-scale collection. Those features make the skill more dangerous because they help operators adapt to and continue scraping despite defensive measures, increasing the likelihood of abuse against protected platforms.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The script explicitly configures a mobile device fingerprint, custom headers, and later checks for CAPTCHA presence while advising manual handling, which goes beyond ordinary browsing and into anti-detection workflow. In the context of a real-estate scraping skill targeting sites known to deploy anti-bot controls, this materially enables evasion of website defenses and can facilitate unauthorized collection at scale.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The comments and commands present the fingerprint changes and scrolling as 'more realistic' and 'simulating human browsing,' but their practical effect is to mask automation and reduce detection. This is dangerous because it normalizes stealth behavior in an automation script and encourages operators to bypass anti-abuse mechanisms of the target service.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The script explicitly sets a spoofed desktop browser fingerprint, adds realistic headers including a Google referer, and simulates scrolling and interaction patterns before checking for anti-bot prompts. In the context of a real-estate scraping skill that advertises anti-crawling strategies, this is not ordinary browsing automation but functionality intended to evade bot detection and access controls, which can facilitate unauthorized scraping and ToS circumvention.

Missing User Warnings

Medium

Confidence: 76%
Finding: The skill provides scraping, browser automation, anti-detection tactics, and data export workflows without an up-front warning about privacy, legal, and data-handling consequences. Because the targets are real-estate platforms that may contain personal or sensitive listing-related information, the absence of explicit user warnings and consent boundaries makes accidental misuse more likely.

Vague Triggers

Medium

Confidence: 89%
Finding: The skill's activation conditions are broad and capability-driven, such as triggering whenever data scraping, anti-bot bypass, or real-estate extraction is needed. This can cause the agent to invoke a high-risk web-scraping skill in situations the user did not explicitly request, increasing the chance of unauthorized collection, ToS violations, or unsafe browsing behavior.

Natural-Language Policy Violations

Medium

Confidence: 94%
Finding: The configuration explicitly advises anti-crawler evasion techniques, including simulating human behavior, random delays, and setting cookies to bypass site defenses. Even though this is descriptive config text rather than executable logic, it operationalizes circumvention of access controls and can enable unauthorized scraping against platform protections and terms of service.

Natural-Language Policy Violations

Medium

Confidence: 95%
Finding: This text recommends proxy IPs, session management, and mobile-device simulation specifically to evade crawler detection on the target site. In the context of a scraping skill, these are not neutral tuning tips; they are instructions for bypassing anti-abuse mechanisms, which increases the likelihood of misuse and unauthorized collection.

Natural-Language Policy Violations

High

Confidence: 98%
Finding: This is the strongest finding because it advises reusing a real cookie obtained after manual verification, spoofing Referer, using proxies, and limiting access frequency to get around anti-crawler and CAPTCHA controls. Reusing real session material and falsifying request provenance crosses from generic scraping into concrete bypass of protective mechanisms, creating elevated legal, privacy, and unauthorized-access risk.

Missing User Warnings

Medium

Confidence: 91%
Finding: The script captures screenshots of pages that may include captcha challenges or account/session context and instructs users to save a verified browser session to disk. Persisting this material without safeguards can expose cookies, authenticated state, browsing artifacts, and other sensitive session data that could be reused by unauthorized parties.
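
As an illustration of the kind of safeguard this finding says is missing, the sketch below writes a file so that only the owning user can read or write it. It assumes a POSIX system; the helper name write_private_file and the file name session.json are hypothetical and do not come from the scanned skill.

```python
import os
import stat
import tempfile


def write_private_file(path: str, data: bytes) -> None:
    """Write data so that only the owning user can read or write it."""
    # Mode 0o600 applies when the file is newly created (POSIX semantics).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        # An existing file keeps its old mode, so tighten it explicitly.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "session.json")  # hypothetical name
        write_private_file(target, b"{}")
        print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

File permissions limit exposure to other local users only; they do not protect the file if it is copied, synced, or committed elsewhere.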

Natural-Language Policy Violations

High

Confidence: 98%
Finding: The file's narrative instructions are centered on defeating captcha protections and avoiding detection, while providing no authorization checks, approved scope, or compliance guardrails. In the context of a real-estate spider skill, this makes the capability more dangerous because it operationalizes scraping against defended targets rather than merely describing lawful data extraction.
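
To make "approved scope" concrete, the sketch below shows one possible form of a guardrail: a check that refuses any URL whose host is not on an explicit allowlist. The names ALLOWED_HOSTS and is_in_scope and the host api.example.com are assumptions for illustration; nothing like this exists in the scanned skill, and an allowlist does not by itself establish permission to collect data.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the operator is authorized to query.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def is_in_scope(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is explicitly approved."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # urlsplit lowercases the hostname; None means no host was present.
    host = parts.hostname or ""
    # Exact match only, so "api.example.com.evil.test" is rejected.
    return host in allowed_hosts


if __name__ == "__main__":
    for candidate in (
        "https://api.example.com/listings",
        "https://api.example.com.evil.test/",
        "http://api.example.com/",
    ):
        print(candidate, is_in_scope(candidate))
```

Exact host matching is a deliberate choice here: suffix or substring matching would accept look-alike domains.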

VirusTotal

66 of 66 vendors reported this skill as clean.
