TXT电子书清洗修复 (TXT Ebook Cleaning and Repair)

Security checks across malware telemetry and agentic risk

Overview

This ebook-cleaning skill is coherent, but it can search local TXT files, upload selected books, send text to an LLM-backed agent, and persist learned rules without strong privacy controls.

Review carefully before installing. Use fast/local rule mode for private, copyrighted, or sensitive books; avoid fuzzy file search unless you are comfortable exposing candidate filenames; and do not use AI modes unless you accept that book text may be processed by an OpenClaw subagent or configured LLM provider. Check and delete learned_mojibake_rules.json if you do not want cross-run learning.
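The last recommendation can be checked with a few lines of Python. This is a minimal sketch, not part of the skill: only the file name learned_mojibake_rules.json comes from this report, while the function names and the idea of scanning a directory you choose (for example, wherever the skill is installed) are assumptions made for illustration. It defaults to a dry run so nothing is deleted until you have reviewed the list.

```python
from pathlib import Path

# File name taken from the report; where the skill writes it is not stated,
# so the caller supplies the directory to scan (assumption).
RULES_FILENAME = "learned_mojibake_rules.json"


def find_learned_rules(root: Path) -> list[Path]:
    """Return every learned-rules file found under root, searched recursively."""
    return sorted(p for p in root.rglob(RULES_FILENAME) if p.is_file())


def delete_learned_rules(root: Path, dry_run: bool = True) -> list[Path]:
    """List the rules files under root and delete them unless dry_run is True."""
    found = find_learned_rules(root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Calling delete_learned_rules(Path("some/dir")) only reports what would be removed; pass dry_run=False once the list looks right.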

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (31)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill documentation describes file search, upload/download, local file modification, and shell execution, yet no permissions are declared. This creates a transparency and governance gap: users and enforcement systems cannot accurately understand that the skill can read local files, write outputs, and invoke command-line tools, which increases the chance of overbroad file access or unsafe execution without informed consent.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 89%
Finding: The declared purpose is text cleanup, but the behavior includes external/local LLM invocation, request/response caching, persistent rule learning, and extra report generation. These undisclosed secondary behaviors materially change the data-flow and retention model, especially because ebook contents may be sensitive and could be stored, reused, or sent to another service beyond what users expect.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The cleaner initializes an LLM client and passes book content into AI modules, which creates an external data exposure path not expected from a simple local TXT-cleaning utility. If the provider is remote, sensitive or copyrighted text may be transmitted off-host, creating privacy, compliance, and trust risks even though there is no explicit exfiltration logic in this file.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The module sends user/book paragraph content to an external LLM for ad classification, which is a real data-exposure issue when the skill is described primarily as TXT cleaning/repair. Even if functionally useful, transmitting book content off-device can violate user expectations, privacy requirements, or content-licensing boundaries if not explicitly disclosed and consented to.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: This code adds network-dependent LLM analysis capability to a text-cleaning skill, expanding the trust and attack surface beyond what users may assume from a local cleaning tool. In this context, the danger is not code execution but undisclosed remote processing of potentially copyrighted or sensitive book content.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The parser sends up to 5000 characters of user-provided book text to an external LLM service for chapter detection. In a local TXT cleaning/repair skill, this creates a real data exfiltration/privacy risk because potentially copyrighted, sensitive, or private content leaves the local environment without any disclosure or clear necessity boundary in this file.

Description-Behavior Mismatch

Low

Confidence: 84%
Finding: The normalization path transmits chapter titles to the LLM, which is a smaller data set but still user content sent off-box. While less severe than full text sampling, it remains an undisclosed external data transfer inconsistent with a purely local text-cleaning expectation.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: This module adds a network-capable LLM processing path to a skill whose primary purpose is text cleanup and repair, expanding the trust boundary beyond local processing. Because the capability is not tightly constrained here, users may unknowingly have ebook content transmitted to a third party, increasing privacy and compliance risk.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The module sends ebook text to an LLM via `self.llm.call(...)`, which means content that users may reasonably expect to be processed locally is transmitted to an external service. In a TXT-cleaning skill, this is a meaningful trust-boundary change and creates privacy and data-handling risk, especially for copyrighted, private, or sensitive text.

Context-Inappropriate Capability

Medium

Confidence: 79%
Finding: The code persistently stores learned replacement rules derived from processed text, creating a durable artifact from user inputs without clear scoping, review, or consent. While this appears intended to improve repair quality, it can retain user-derived patterns and may cause cross-document contamination or unintended persistence of sensitive fragments.

Context-Inappropriate Capability

Medium

Confidence: 75%
Finding: The skill declares a TXT-cleaning purpose, yet includes a generic LLM client that can spawn a separate local agent process. That broadens the runtime attack surface and may let untrusted ebook text be forwarded into a more capable agent context than users would reasonably expect from a document-cleanup skill.

Missing User Warnings

Medium

Confidence: 95%
Finding: The design explicitly routes user-supplied book text to external LLM providers such as OpenAI or Xiaoyi, but the document does not describe any user notice, consent flow, data minimization, or privacy controls. Because ebook content may include copyrighted, personal, or sensitive material, silent transmission to third-party APIs creates a real confidentiality and compliance risk.
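The missing consent flow and data minimization described in this finding could look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's code: guarded_llm_call, ConsentRequired, and the send callable are hypothetical names, and the 5000-character cap simply reuses the sample size mentioned elsewhere in this report.

```python
from typing import Callable


class ConsentRequired(Exception):
    """Raised when book text would leave the local process without user consent."""


def guarded_llm_call(
    text: str,
    send: Callable[[str], str],  # hypothetical stand-in for the real LLM client call
    consent_given: bool,
    max_chars: int = 5000,  # assumed cap; mirrors the 5000-character sample in the report
) -> str:
    """Forward at most max_chars of text to send, and only with explicit consent."""
    if not consent_given:
        raise ConsentRequired(
            "AI mode sends book text to an LLM provider; explicit consent is required."
        )
    # Data minimization: never forward more than the capped sample.
    return send(text[:max_chars])
```

Routing every AI-mode call through one such gate would give the skill a single place to show a notice, record the user's choice, and fall back to local rule mode when consent is withheld.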

Missing User Warnings

Medium

Confidence: 95%
Finding: The changelog documents LLM-based cleaning, caching, and API-request reduction, which strongly implies user text may be transmitted to an external or separate model process during processing, but it does not disclose that data handling risk. In a text-cleaning skill that processes entire ebook contents, lack of notice can cause users to unknowingly send sensitive or copyrighted material to an LLM, creating privacy, compliance, and data-governance exposure.

Vague Triggers

Medium

Confidence: 84%
Finding: Broad trigger phrases like generic requests to clean or repair a txt file can cause accidental invocation. In this skill, misfire is more dangerous because activation may lead to searching device files, uploading user content, and processing it with AI services, turning a benign ambiguity into unintended data handling.

Vague Triggers

Medium

Confidence: 81%
Finding: The examples show when the skill should trigger, but not when it should abstain, leaving ambiguous matching behavior. Because the workflow can search local storage and transfer files, unclear boundaries increase the risk of unintended execution on user content that was not meant for this tool.

Missing User Warnings

High

Confidence: 97%
Finding: The documented workflow uploads user files to the cloud, obtains a public URL, and re-downloads them, but gives no explicit privacy or transmission warning. This exposes potentially copyrighted, personal, or sensitive ebook contents to external infrastructure and network interception risk, especially if public URLs are accessible beyond the immediate processing context.

Missing User Warnings

High

Confidence: 98%
Finding: AI-enhanced modes analyze file content with LLM/API components, but the documentation does not clearly state that user text may be sent to an external or separate model service. This is a significant disclosure failure because book files may contain personal notes, identifiers, or sensitive material, and users cannot make an informed privacy decision without knowing their content may leave the local environment.

Vague Triggers

Medium

Confidence: 87%
Finding: The skill advertises broad trigger phrases such as '去广告' ("remove ads"), '修乱码' ("fix garbled text"), and similar generic wording that can overlap with ordinary conversation about text files or ebooks. In an agent environment, overly broad activation increases the chance of unintended invocation, which could cause the agent to search local files or process user data without sufficiently specific user intent.

Vague Triggers

Medium

Confidence: 93%
Finding: The skill explicitly supports fuzzy requests like '清理一本txt' ("clean up a txt") and instructs the agent to search the user's device for matching txt files. This ambiguity is dangerous because a casual or underspecified request can trigger local file discovery, exposing filenames and potentially leading to unintended processing of sensitive documents.
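One way to narrow this kind of file discovery is to search only a single directory the user has named, and to return a short candidate list for confirmation instead of processing a match automatically. The sketch below assumes that design; candidate_txt_files and its parameters are hypothetical and do not describe how the skill currently searches.

```python
from pathlib import Path


def candidate_txt_files(search_dir: Path, keyword: str, limit: int = 10) -> list[Path]:
    """Return up to limit .txt files directly inside search_dir whose name contains keyword.

    The search is deliberately non-recursive and limited to one directory that
    the user approved, so no other filenames on the device are enumerated.
    """
    needle = keyword.casefold()
    matches = [
        p
        for p in sorted(search_dir.glob("*.txt"))
        if p.is_file() and needle in p.stem.casefold()
    ]
    return matches[:limit]
```

The caller would show the returned paths to the user and proceed only with the one the user picks.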

Missing User Warnings

Medium

Confidence: 96%
Finding: The code sends user-supplied text through AI modules for ad detection, mojibake fixing, and chapter parsing without any user-facing warning or consent flow. In the context of ebook cleaning, users are likely to assume local processing, so silent transmission of full file contents can leak personal, proprietary, or copyrighted material to third-party services.

Missing User Warnings

Medium

Confidence: 96%
Finding: At this call site, paragraph text is sent to the LLM without any visible user warning, consent, or contextual disclosure in the module. That creates a privacy and transparency problem because arbitrary book text is exfiltrated to a third party during processing.

Missing User Warnings

Medium

Confidence: 96%
Finding: The single-paragraph path has the same issue as the batch path: it forwards raw input text to an external LLM service with no user-facing notice in this module. This is especially risky because users may expect a simple local text repair action, not remote content inspection.

Missing User Warnings

Medium

Confidence: 92%
Finding: The code performs remote LLM calls on user text samples without any user-facing warning or disclosure in this file. For a document-processing skill, silent transmission of content is dangerous because users may reasonably expect local-only handling of their books and may expose sensitive or proprietary text unintentionally.

Missing User Warnings

Low

Confidence: 82%
Finding: Even though only titles are sent, the routine still transmits user-provided content to an external service without visible disclosure. This is a lower-volume privacy issue, but it remains a trust and transparency problem in the context of ebook cleanup.

Missing User Warnings

Medium

Confidence: 95%
Finding: This call transmits user-provided text to an external LLM without any warning or disclosure in the module, which is a direct privacy and transparency issue. Users invoking a text-cleaning skill may not expect raw ebook contents to leave the local environment, making the mismatch materially risky.

VirusTotal

63/63 vendors flagged this skill as clean.
