Local AI Search

Security checks across malware telemetry and agentic risk

Overview

This local search skill is broadly coherent, but it can index large private folders, run a background Khoj service, create cron-based sync jobs, and expose or send document-derived content beyond the local machine if configured carelessly.

Install only if you are comfortable running a local Khoj RAG service over selected folders. Bind the service to 127.0.0.1, avoid anonymous mode for sensitive indexes, keep KHOJ_URL local unless you intentionally trust a remote server, and do not enable scheduled sync on broad or sensitive directories without understanding that it will keep processing files in the background. Treat cloud chat mode as potentially sending queries and relevant document excerpts to the configured LLM provider.
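The advice to keep KHOJ_URL local can be checked mechanically before any request is sent. The following is a minimal sketch, not code from the skill: `is_loopback_url` is a hypothetical helper, and it assumes KHOJ_URL holds an ordinary http(s) URL. It accepts the literal name "localhost" without resolving it and rejects every other DNS name.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP address.

    Hypothetical helper for illustration; not part of the skill or of Khoj.
    """
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any DNS name other than "localhost" is treated as non-local.
        return False
```

A caller could refuse to index or query when `is_loopback_url(os.environ["KHOJ_URL"])` is false, unless the user has explicitly opted in to a remote server.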

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (44)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: def stop(): """Stop the Khoj service""" try: result = subprocess.run( ["pkill", "-f", "khoj"], capture_output=True )
Confidence: 83%
Finding: result = subprocess.run( ["pkill", "-f", "khoj"], capture_output=True )
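Because `pkill -f` matches its pattern against each process's full command line, the call above can signal any process whose arguments happen to contain "khoj", not only the server the skill started. A narrower alternative is to stop exactly the process that was launched, using a recorded PID. The sketch below assumes a POSIX system and a hypothetical PID file written at startup; the skill itself is not known to write one, and PID reuse after a crash remains a caveat.

```python
import os
import signal
from pathlib import Path


def stop_by_pidfile(pid_file: Path) -> bool:
    """Send SIGTERM to the PID recorded in pid_file.

    Returns True if a signal was delivered, False if there was nothing to stop.
    The PID file is a hypothetical convention, not something the skill provides.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or unreadable contents
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        stopped = False  # stale PID file: the process is already gone
    else:
        stopped = True
    pid_file.unlink(missing_ok=True)
    return stopped
```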

subprocess module call

Medium

Category: Dangerous Code Execution
Content: cmd.append("--verbose") try: subprocess.run(cmd, check=True) except subprocess.CalledProcessError as e: click.echo(f"✗ Sync failed: {e}", err=True) sys.exit(1)
Confidence: 68%
Finding: subprocess.run(cmd, check=True)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: cmd.append("--status") try: subprocess.run(cmd, check=True) except subprocess.CalledProcessError as e: click.echo(f"✗ Execution failed: {e}", err=True) sys.exit(1)
Confidence: 72%
Finding: subprocess.run(cmd, check=True)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill advertises local file search but declares no permissions while its documented behavior clearly requires reading and writing local files, invoking shell commands, accessing environment variables, and making network requests. This hidden capability expansion is dangerous because users and policy systems cannot make an informed trust decision, especially when the skill also uploads content to a local indexing service and may forward query content to cloud APIs.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented behavior materially exceeds the declared purpose of a simple local search skill: it starts background services, converts and indexes documents, persists synchronization state, schedules recurring tasks, and supports chat/RAG workflows. This mismatch is dangerous because users invoking a search skill may unknowingly authorize durable collection, transformation, and ongoing processing of local data beyond a one-time search request.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The README presents the skill as a local search capability, but the setup and feature list require a cloud LLM API key and optionally configure a third-party endpoint. That mismatch can cause users to assume their queries and potentially document-derived content stay local when they may be sent off-device, creating privacy and trust risks.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The skill scope is presented as ad hoc local search, yet the documented commands include persistent background synchronization and scheduled tasks. This is risky because it changes the trust model from a one-time user action to ongoing filesystem monitoring and recurring execution, which can continue processing data after the initial request.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The configuration sends prompts and potentially retrieved local document content to an external LLM provider, which exceeds the stated purpose of a local file search skill. In a tool designed to search sensitive local files, this creates a real data exposure path to a third party and may violate user expectations about locality and privacy.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Binding the service to 0.0.0.0 exposes it on all network interfaces, making a local search service reachable from other hosts on the network. Because this service indexes and searches local files, remote access could expose sensitive document contents or metadata if authentication and network controls are absent or weak.
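The difference between the wildcard address and a loopback address can be made explicit with Python's standard `ipaddress` module. This is an illustrative sketch only: `bind_scope` is a hypothetical helper, and it expects a literal IP address (a hostname raises `ValueError`).

```python
import ipaddress


def bind_scope(host: str) -> str:
    """Classify a literal bind address by how widely it exposes a listener.

    Hypothetical helper for illustration; raises ValueError for hostnames.
    """
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        # 0.0.0.0 or :: listens on every interface of the machine.
        return "all-interfaces"
    if addr.is_loopback:
        # 127.0.0.0/8 or ::1 is reachable only from the same host.
        return "loopback-only"
    return "single-interface"
```

A launcher script could call `bind_scope` on its configured host and refuse to start, or at least warn, unless the result is "loopback-only".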

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill substantially exceeds its declared role of local file search by starting/stopping services, scheduling recurring jobs, and deleting local data. In agent settings, capability mismatch is dangerous because users may authorize a search tool without expecting process control, persistence, or destructive actions.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill invokes multiple subprocesses and shell scripts unrelated to a minimal local search interface, increasing the attack surface and host impact. In an agent context, external process execution is especially sensitive because it can bypass user expectations and platform safety boundaries.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The skill is described as performing local file search, but the implementation actually sends queries to an HTTP service and depends on an external index/search backend. This mismatch is security-relevant because users may believe their data stays on-device when in fact their queries, and potentially indexed local content, are exposed to a network service specified by configuration.

Context-Inappropriate Capability

Medium

Confidence: 82%
Finding: The code includes a chat mode that posts arbitrary user queries to /api/chat, which is broader than the declared purpose of local file search. Extra undisclosed capability increases attack surface and can cause unexpected data transmission, especially if users invoke the skill believing it only performs retrieval over local documents.

Description-Behavior Mismatch

High

Confidence: 94%
Finding: This script adds scheduled synchronization and persistence behavior to a skill whose declared purpose is local file search. Installing recurring background execution materially expands capability and risk, because it enables ongoing file access without an interactive user action and is not justified by the manifest context.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The script directly installs and updates persistent cron jobs in the user's crontab, creating durable background execution. In the context of a local search skill, this is overprivileged behavior that could continuously process or expose local content beyond what a user expects from an on-demand search feature.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The script starts a full Khoj server process instead of a narrowly scoped local-search helper, expanding the attack surface beyond the manifest's stated purpose. Running a general RAG service can expose additional APIs, indexing behavior, and network-reachable functionality that may process or retain local data unexpectedly.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The script daemonizes the service with nohup and writes persistent logs to /tmp, causing the capability to continue running outside the immediate user action. For a skill advertised as local file search, this broader persistent behavior increases the chance of unintended long-lived data exposure, background resource use, and unnoticed service availability.
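One way to reduce the exposure from logs written to /tmp is to place them in a per-user directory with owner-only permissions. The sketch below assumes a POSIX system; `open_private_log`, the directory argument, and the file name `khoj.log` are all hypothetical choices, not taken from the skill. It does not address the daemonization itself.

```python
import os
from pathlib import Path


def open_private_log(directory: Path, name: str = "khoj.log") -> int:
    """Open (creating if needed) an append-only log file readable by its owner only.

    Returns a raw file descriptor. The directory and file name are assumptions
    made for this example. Permissions of an already existing file are not changed.
    """
    # mode applies only to the final directory when it is newly created.
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    return os.open(directory / name, flags, 0o600)
```

The returned descriptor can be passed as `stdout` and `stderr` to `subprocess.Popen` when starting the service, in place of a shell redirect into /tmp.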

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The skill is described as local file search, but the code sends file contents to an HTTP API endpoint for indexing, using a configurable KHOJ_URL that may point off-host. This creates a significant data exfiltration and privacy risk because sensitive local documents can be transmitted to another service without clear user disclosure or origin restriction.

Vague Triggers

Medium

Confidence: 83%
Finding: The trigger condition includes a broad catch-all for any request involving local/computer/folder content retrieval. In an agent setting, overly broad activation can invoke the skill on ordinary user requests unexpectedly, increasing the chance of indexing or querying sensitive local files without sufficiently explicit user intent.

Missing User Warnings

High

Confidence: 97%
Finding: The documentation instructs users to configure cloud LLM APIs for processing local document search, but it does not clearly disclose that local content, excerpts, or derived queries may be transmitted to external services. For a skill framed around local file search, that omission materially increases the risk of sensitive data exposure and informed-consent failure.

Vague Triggers

Medium

Confidence: 87%
Finding: The trigger conditions are broad enough to match common everyday search phrases, increasing the chance the skill will activate unintentionally. In this context, accidental invocation is more dangerous because activation may lead to file indexing, local service startup, or cloud-assisted processing rather than a simple harmless search.

Vague Triggers

Medium

Confidence: 84%
Finding: The usage examples broaden the activation scope and make it unclear whether the skill should perform a simple lookup, a full search across the computer, or a folder-specific indexing workflow. Ambiguous activation is risky here because the documented backend has non-trivial side effects and may touch large amounts of local data.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill uploads local document contents to the Khoj API without any user-facing warning or confirmation, and the endpoint is configurable via environment variable. In a tool advertised as local file search, silent transmission of document contents is dangerous because users may unintentionally exfiltrate sensitive files to a remote service.
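A user-facing confirmation before upload could look like the sketch below. It is an assumption about how such a gate might be written, not the skill's code: `confirm_upload` is a hypothetical helper, and the `ask` parameter exists only so the prompt can be replaced in tests or non-interactive runs.

```python
from typing import Callable


def confirm_upload(
    endpoint: str,
    file_count: int,
    ask: Callable[[str], str] = input,
) -> bool:
    """Show the destination and file count, and proceed only on an explicit yes.

    Hypothetical helper for illustration. Any answer other than "y" or "yes"
    (including an empty line) is treated as a refusal.
    """
    prompt = f"Upload contents of {file_count} file(s) to {endpoint}? [y/N] "
    answer = ask(prompt).strip().lower()
    return answer in {"y", "yes"}
```

Defaulting to "no" means that an accidental Enter keypress does not transmit anything.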

Missing User Warnings

Medium

Confidence: 90%
Finding: The script modifies the user's crontab immediately when invoked with enable/disable options, without a confirmation prompt or dry-run preview. That makes persistence easy to add or remove accidentally, which is especially risky when the skill's stated purpose does not imply background task installation.
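A dry-run preview of the kind this finding asks for can be built by computing the proposed crontab as text and showing a diff before anything is installed. The sketch below is a hedged illustration: the `MARKER` tag, the function names, and the sample job lines are hypothetical, and reading or writing the real crontab is deliberately left out.

```python
import difflib

# Hypothetical tag used to recognize the entry this tool manages.
MARKER = "# local-ai-search sync"


def plan_crontab(current: str, job_line: str, enable: bool) -> str:
    """Return the crontab text that would result, without installing it."""
    # Drop any previously managed entry so enabling twice does not duplicate it.
    lines = [ln for ln in current.splitlines() if MARKER not in ln]
    if enable:
        lines.append(f"{job_line} {MARKER}")
    return "\n".join(lines) + ("\n" if lines else "")


def preview(current: str, proposed: str) -> str:
    """Return a unified diff of the change; empty when nothing would change."""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile="crontab (current)",
        tofile="crontab (proposed)",
    )
    return "".join(diff)
```

A script could print `preview(...)` and ask for confirmation, applying the proposed text only after the user agrees.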

VirusTotal

VirusTotal findings are pending for this skill version.
