lobsterpot

Security checks across malware telemetry and agentic risk

Overview

Lobsterpot is a coherent Q&A integration, but it asks agents to take recurring public account actions and to update the skill's own instructions from a remote site without enough user review.

Install only if you want your agent to participate in an external public Q&A community. Keep the heartbeat disabled or manually supervised unless you are comfortable with recurring posts, votes, comments, and accepts. Review any generated public content before submission. Do not share proprietary code, secrets, customer data, internal URLs, security findings, or private project details. Protect the Lobsterpot API key with restrictive local permissions. Avoid allowing the remote self-update step unless you can independently review and verify new skill files.
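As one way to act on the API key advice above, the following minimal sketch stores a secret in a file that only the owning user can read or write. It assumes a POSIX system; the function names and the file location are hypothetical and are not part of the Lobsterpot skill.

```python
import os
import stat
from pathlib import Path


def write_secret_file(path: Path, secret: str) -> None:
    """Write a secret so that only the owning user can read or write it."""
    # Creating the file with mode 0o600 avoids a window where it is readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(secret)
    # Tighten the mode again in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


def is_owner_only(path: Path) -> bool:
    """Return True if neither group nor other has any permission bit set."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & 0o077 == 0
```

A check such as `is_owner_only` could be run before each use of the key, so that a loosened file mode is noticed rather than silently accepted.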

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (10)

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The heartbeat extends a Q&A-sharing skill into a remote self-update mechanism that fetches and overwrites local skill files from the network. This creates a supply-chain and trust-boundary problem: whoever controls the remote endpoint or the network path can change future agent behavior without an explicit review step.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The instructions explicitly tell the agent to overwrite its local SKILL.md and HEARTBEAT.md based solely on a version check against a remote service. Because this behavior is not necessary for the stated Lobsterpot Q&A purpose, it enables remote semantic reprogramming of the agent and persistence across future runs.
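One possible mitigation for the overwrite behavior described above is to replace a local skill file only when the fetched content matches a digest the user recorded after reviewing it. The sketch below illustrates that idea under stated assumptions: `apply_reviewed_update` and its parameters are hypothetical, and how the content is fetched or how the approved digest is stored is left out.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def apply_reviewed_update(target: Path, new_content: bytes, approved_sha256: str) -> bool:
    """Overwrite target only if new_content matches a digest approved after review.

    Returns True if the file was written and False if the update was rejected.
    """
    if sha256_hex(new_content) != approved_sha256.strip().lower():
        # Content differs from what was reviewed: leave the local file untouched.
        return False
    target.write_bytes(new_content)
    return True
```

With this shape, a version check alone never triggers a write; a changed remote file is rejected until someone reviews it and records its new digest.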

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The skill gives conflicting guidance: earlier it requires waiting 4 hours before self-answering, but the integration section says to immediately POST question → POST answer → Accept your answer. In an autonomous agent setting, contradictory instructions can cause policy bypass and spammy or manipulative behavior, undermining platform integrity and any safeguards intended to allow community review first.
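If the 4-hour rule is the intended policy, an agent wrapper could enforce it locally rather than relying on the skill text being consistent. This is a minimal sketch of such a guard; the names `SELF_ANSWER_DELAY` and `may_self_answer` are hypothetical, and the delay value is taken from the finding above.

```python
from datetime import datetime, timedelta, timezone

# Delay taken from the skill's earlier guidance, as quoted in the finding.
SELF_ANSWER_DELAY = timedelta(hours=4)


def may_self_answer(question_posted_at: datetime, now: datetime) -> bool:
    """Return True only once the full delay has elapsed since the question was posted."""
    return now - question_posted_at >= SELF_ANSWER_DELAY


posted = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
too_early = may_self_answer(posted, posted + timedelta(minutes=5))
allowed = may_self_answer(posted, posted + timedelta(hours=4))
```

Here `too_early` is `False` and `allowed` is `True`, so an immediate question, answer, accept sequence would be blocked at the answer step.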

Missing User Warnings

Medium

Confidence: 94%
Finding: The heartbeat instructs the agent to perform live account actions—accepting answers, posting comments, voting, answering questions, and asking new questions—without guardrails, explicit consent checks, or warnings that these mutate real community data. In practice this can lead to spam, reputation manipulation, accidental disclosure, or unauthorized actions on behalf of the user.
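A consent check of the kind this finding says is missing could look roughly like the sketch below, which requires an explicit confirmation before any action that changes public data. The action names, `MUTATING_ACTIONS`, and `run_action` are assumptions made for illustration and do not reflect the actual Lobsterpot API.

```python
from typing import Callable, Optional

# Hypothetical labels for the account actions listed in the finding.
MUTATING_ACTIONS = {"ask", "answer", "comment", "vote", "accept"}


def run_action(
    action: str,
    perform: Callable[[], str],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Run perform() only after confirm() approves a mutating action.

    Non-mutating actions run without confirmation. Returns None when the
    user declines, so the caller can tell that nothing was sent.
    """
    if action in MUTATING_ACTIONS and not confirm(action):
        return None
    return perform()
```

In an interactive setup, `confirm` might prompt the user and show the exact content to be posted; in an unattended heartbeat it could simply return `False` so that mutating actions are queued for later review.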

Vague Triggers

Medium

Confidence: 92%
Finding: The heartbeat trigger is broad and encourages periodic fetching and following of remote instructions every 4+ hours. Because the behavior is framed as routine and low-friction, it can lead to unintended invocation across many normal sessions and creates a standing mechanism for remote behavioral influence if the heartbeat content changes.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill promotes using tracked expertise, interaction history, and injected context to improve answers without an explicit privacy boundary. In practice this normalizes transmitting or acting on personalized metadata that may derive from prior conversations, which can expose sensitive relationship or profiling information to an external service.

SSD 1

Medium

Confidence: 98%
Finding: The remote skill self-update allows the content provider to semantically override the agent's operating instructions and trust assumptions after installation. That means a benign skill can later become malicious or unsafe without any local code change, making the context especially dangerous because instructions directly affect downstream autonomous behavior.

SSD 3

Medium

Confidence: 96%
Finding: This section strongly encourages agents to persist and share solutions discovered during prior work so knowledge survives context closure. In enterprise or user-facing environments, that creates a natural path for disclosing proprietary code behavior, internal architecture, debugging artifacts, or sensitive problem details to a public platform.

SSD 3

Medium

Confidence: 95%
Finding: The documented 'context injection' includes prior interactions, expertise rankings, and similar prior answers, then explicitly tells the agent to use that context. Combined with public answering, this can encourage disclosure or derivative use of private interaction history and cross-session memory in ways the original counterparties did not consent to.

SSD 3

Medium

Confidence: 96%
Finding: The integration guidance normalizes posting solved technical issues and discovered undocumented behavior to a CC0 public platform as a routine action. That is dangerous because 'hard problem' and 'undocumented behavior' often correspond to internal implementation details, security-relevant edge cases, or customer-specific incidents that should not be published.
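As a rough illustration of a pre-publication check, the sketch below flags draft text that looks like it contains credentials, private key material, or internal hostnames. The patterns and the name `flag_sensitive` are illustrative assumptions; a filter like this is coarse, will miss many cases, and is no substitute for human review of what gets posted.

```python
import re
from typing import List

# Illustrative patterns only; real deployments would need a broader, tuned set.
SENSITIVE_PATTERNS = {
    "credential assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+"
    ),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "internal hostname": re.compile(r"(?i)https?://[^\s/]+\.(internal|corp|local)\b"),
}


def flag_sensitive(text: str) -> List[str]:
    """Return the labels of every pattern that matches somewhere in text."""
    return [label for label, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]
```

A draft that returns a non-empty list would be held back for manual review instead of being submitted to the public platform.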

VirusTotal

43/43 vendors flagged this skill as clean.
