last30days-glim

Security checks across malware telemetry and agentic risk

Overview

This looks like a real social-research skill, but it needs Review because it can use local GitHub credentials and external AI/search services in ways that may expose sensitive repo or topic data.

Install only if you are comfortable sending research topics and retrieved public content to Glim and related external services. Use a restricted environment: avoid running it where gh is logged into private repositories, prefer a narrowly scoped GitHub token or no GitHub auth, and do not run scripts/evaluate.py with real credentials or untrusted revisions. Treat --save-dir outputs as full raw data files, not sanitized summaries.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import

Findings (24)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    def create_worktree(rev: str) -> Path:
        worktree_dir = Path(tempfile.mkdtemp(prefix="last30days-eval-"))
        subprocess.run(
            ["git", "worktree", "add", "--detach", str(worktree_dir), rev],
            cwd=REPO_ROOT,
            check=True,
Confidence: 79%
Finding:
    subprocess.run(
        ["git", "worktree", "add", "--detach", str(worktree_dir), rev],
        cwd=REPO_ROOT,
        check=True,
        capture_output=True,
        text=True,
    )
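One way to narrow the flagged call is to validate `rev` before it reaches git, so a value beginning with `-` cannot be parsed as an option. The sketch below is illustrative only: the helper name `safe_rev` and the allowed character set are assumptions, not part of the skill, and the check does nothing about the separate risk of executing code from the checked-out revision.

```python
import re

# Hypothetical allowlist: plain SHAs, branch names, and tag names.
# The first character must be alphanumeric, so "-"-prefixed values are rejected.
_REV_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]*")


def safe_rev(rev: str) -> str:
    """Return rev unchanged if it looks like a plain ref, else raise ValueError."""
    if not _REV_PATTERN.fullmatch(rev) or ".." in rev:
        raise ValueError(f"refusing suspicious revision: {rev!r}")
    return rev
```

Under these assumptions, `create_worktree` would pass `safe_rev(rev)` instead of `rev`. Expressions such as `HEAD~1` are rejected by this deliberately strict pattern and would need to be resolved to a commit hash first.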

Lp3

Medium

Category: MCP Least Privilege
Confidence: 81%
Finding: The skill declares extensive capabilities such as environment access, file access, network use, and shell execution, but does not explicitly declare permissions. That creates a transparency and governance gap: users and host agents cannot accurately evaluate or constrain what the skill may do before invocation.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 88%
Finding: The declared description presents the skill as a research brief generator, but the documented behavior includes broader functions like benchmark tooling, git worktree manipulation, subprocess testing, and persistence. This mismatch can mislead users or orchestration systems into authorizing a skill for a narrower purpose than it actually performs.

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The documented guard says bare generic terms should not be used because matching is a case-insensitive substring over the entire topic. Including the standalone pattern "topic" violates that guarantee and can cause unrelated user queries containing that common word to be misclassified as `ai_coding_agent`, which can steer downstream collection toward the wrong communities and degrade or bias research results.
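The misclassification can be reproduced with a simplified stand-in for the matcher. Only the `"topic"` pattern and the `ai_coding_agent` category come from the finding; the rest of the pattern table, the function names, and the `gaming` category are hypothetical and chosen for illustration.

```python
# Hypothetical pattern table. Only "topic" and "ai_coding_agent" come from the finding.
CATEGORY_PATTERNS = [
    ("ai_coding_agent", ["coding agent", "topic"]),
    ("gaming", ["speedrun"]),
]


def detect_category_sketch(topic, table=CATEGORY_PATTERNS):
    """Case-insensitive substring match over the whole topic; first match wins."""
    lowered = topic.lower()
    for category, patterns in table:
        if any(pattern in lowered for pattern in patterns):
            return category
    return None


def find_generic_patterns(table, generic_terms):
    """List (category, pattern) pairs whose pattern is a bare generic term."""
    return [
        (category, pattern)
        for category, patterns in table
        for pattern in patterns
        if pattern in generic_terms
    ]
```

With this table, `detect_category_sketch("Hot topic in speedrun communities")` returns `"ai_coding_agent"` rather than `"gaming"`, because the generic pattern sits in an earlier category. A lint such as `find_generic_patterns(CATEGORY_PATTERNS, {"topic"})` could enforce the documented guard in a test.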

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: This module implements person profiling, own-repo intelligence, README harvesting, release mining, and repository health summaries that go beyond the advertised scope of recent discussion research. Scope expansion matters because it increases data collection and analysis capabilities without clear user expectation, enabling broader surveillance-style profiling of people and projects.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: Project-mode returns repository metadata, README excerpts, releases, and issue summaries even when those outputs are not tied to 'what people said in the last 30 days.' That creates a mismatch between declared purpose and actual collection behavior, which can expose more repository content than users reasonably expect from a discussion-research skill.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: Falling back to the GitHub CLI lets the skill obtain credentials from the host environment through an additional execution path that is not necessary for basic discussion research. In an agent setting, unnecessary capability to invoke local tools increases risk because it broadens access to ambient authority and secrets on the machine.
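A possible mitigation, sketched under assumptions, is to make the CLI fallback opt-in so that ambient `gh` credentials are never used by default. The function name `resolve_github_token` and the `allow_gh_cli` flag are hypothetical; `gh auth token` is the GitHub CLI command that prints the stored token.

```python
import os
import subprocess


def resolve_github_token(env=None, allow_gh_cli=False):
    """Return a token from GITHUB_TOKEN, or from the gh CLI only when opted in."""
    source = os.environ if env is None else env
    token = source.get("GITHUB_TOKEN", "").strip()
    if token:
        return token
    if not allow_gh_cli:
        # Default: do not touch the local gh login state.
        return None
    try:
        result = subprocess.run(
            ["gh", "auth", "token"],
            capture_output=True,
            text=True,
            check=False,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # gh is not installed or did not answer in time.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

Callers that receive `None` would fall back to unauthenticated requests, which keeps private-repository access out of reach unless the operator explicitly enables it.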

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The test suite hard-codes a refusal gate for demographic shopping prompts such as age, gender, or relationship-based gift queries, which does not align with the advertised purpose of a last-30-days discussion research skill. This creates a hidden behavioral policy layer that can cause denial of legitimate user requests and indicates the skill may be repurposed or constrained in ways not disclosed by the manifest.

Vague Triggers

Medium

Confidence: 76%
Finding: The manifest uses many broad natural-language triggers like 'what's new with' and 'what are people saying about', which are likely to match normal conversation unintentionally. That increases the chance of unexpected skill activation, causing external network calls and use of configured credentials without clear user intent.

Missing User Warnings

Medium

Confidence: 97%
Finding: The evaluator forwards a broad set of sensitive credentials, including API keys and social-platform tokens, into subprocess environments before running repository code from baseline and candidate revisions. Because those revisions are effectively untrusted code, they can read and exfiltrate all inherited secrets, making this a strong credential-exposure issue.
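The exposure described here comes from child processes inheriting the parent's environment. A minimal sketch of the usual countermeasure is to build an allowlisted environment and pass it through the `env` parameter of `subprocess.run`. The helper name `minimal_env` and the allowlist contents are assumptions, not taken from the evaluator.

```python
import os

# Hypothetical allowlist: variables a child process plausibly needs to start.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG", "TMPDIR")


def minimal_env(parent_env=None, extra_keys=()):
    """Copy only allowlisted variables so API keys and tokens are not inherited."""
    source = os.environ if parent_env is None else parent_env
    allowed = set(SAFE_ENV_KEYS) | set(extra_keys)
    return {key: value for key, value in source.items() if key in allowed}
```

A call such as `subprocess.run(cmd, env=minimal_env(), check=True)` would then run a revision without the forwarded secrets. Any key a revision truly needs can be added through `extra_keys`, which makes each forwarded credential an explicit decision.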

Missing User Warnings

Medium

Confidence: 86%
Finding: The script sends topics and candidate item metadata, including titles, URLs, and dates, to the external Gemini API for relevance judging. This is a real privacy and data-governance issue because user-supplied research topics or scraped content may be sensitive, and the code provides no visible consent gate, redaction, or clear notice.

Missing User Warnings

Low

Confidence: 82%
Finding: This code executes external programs and repository revisions as part of evaluation without any safety boundary or warning. In this skill's context, that is more dangerous because the tool is intended for broad research workflows and may be run by users who do not expect that comparing revisions will execute potentially unsafe code from historical or arbitrary refs.

Missing User Warnings

Medium

Confidence: 82%
Finding: When users choose --save-dir, the code always writes the full report to disk, including all items, all sources, and transcripts, even if stdout is rendered in a compact form. That can unintentionally persist sensitive or high-volume research data to local storage where other users, backups, sync tools, or later processes may access it, especially because the code does not present an explicit warning at save time.
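One way to reduce this exposure, sketched for POSIX systems, is to warn at save time and create the file with owner-only permissions. The function name `save_report_private` and the warning text are hypothetical and do not reflect how the skill writes its output.

```python
import os
import sys
from pathlib import Path


def save_report_private(save_dir, name, text):
    """Write text to save_dir/name, readable and writable by the owner only."""
    directory = Path(save_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    # Tell the user that the full raw report, not the compact view, is persisted.
    print(f"warning: writing full raw report to {path}", file=sys.stderr)
    # Mode 0o600 applies when the file is created; an existing file keeps its mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

Restrictive permissions do not stop backups or sync tools running as the same user, so the save-time warning remains the more important part of the sketch.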

Missing User Warnings

Low

Confidence: 98%
Finding: Because `detect_category` uses simple substring matching with first-match-wins, the broad token "topic" can silently force many inputs into the coding-agent bucket without any user-facing indication. In this skill, that means a user asking for recent discussion on almost any subject could be routed to coding-related peer subreddits, producing misleading summaries rather than directly compromising code execution or secrets.

Missing User Warnings

Low

Confidence: 75%
Finding: The code silently reads GitHub credentials from GITHUB_TOKEN or the local gh login state without clear disclosure at the call sites. While common in tooling, in an agent skill this can surprise operators and users by causing authenticated access under ambient credentials they did not realize would be used.

Missing User Warnings

Low

Confidence: 72%
Finding: The module transmits topic queries and repo identifiers to GitHub's API without an execution-time notice. For a research skill that may receive sensitive investigation topics, silent third-party transmission can create privacy and policy issues even if the destination is expected.

Missing User Warnings

Medium

Confidence: 89%
Finding: The reranking path sends the user topic plus scraped titles, snippets, and related public content to an external LLM provider via generate_json. Even though the prompt includes an untrusted-content warning to resist prompt injection, there is no evidence in this file of user notice, consent, minimization, or provider-side privacy controls, so sensitive user queries or third-party content may be disclosed externally.

Missing User Warnings

Medium

Confidence: 92%
Finding: The fun-scoring path sends the topic, candidate snippets, and extracted comment text to an external provider, increasing privacy exposure because comments may contain personal data, handles, or sensitive contextual details. This is more risky in a social-discussion research skill because the normal workflow aggregates large amounts of user-generated content, making silent third-party transmission more likely and broader in scope.

Missing User Warnings

Medium

Confidence: 93%
Finding: The function sends the raw user-supplied topic to an external search backend (`grounding.web_search`) across multiple queries without any consent check, masking, or in-band user warning at this layer. Because this skill is explicitly designed to research arbitrary user topics across third-party services, users may enter sensitive plans, company names, incident details, health topics, or other confidential terms that are then disclosed to the external provider.
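The missing consent check could take the form of a small gate evaluated before any query leaves the machine. The sketch below is an assumption about what such a gate might look like: `SendDecision`, `check_external_send`, and the operator-supplied `sensitive_terms` list are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendDecision:
    allowed: bool
    reason: str


def check_external_send(topic, consent, sensitive_terms=()):
    """Decide whether a raw topic may be sent to an external search backend."""
    if not consent:
        return SendDecision(False, "no explicit consent for external search")
    lowered = topic.lower()
    # Operator-supplied terms (hypothetical), e.g. internal project or incident names.
    hits = [term for term in sensitive_terms if term.lower() in lowered]
    if hits:
        return SendDecision(False, "topic contains blocked terms: " + ", ".join(hits))
    return SendDecision(True, "ok")
```

The caller would run the search only when `allowed` is true and surface `reason` to the user otherwise. A term list cannot catch every sensitive topic, so the explicit consent flag carries most of the weight.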

Natural-Language Policy Violations

Medium

Confidence: 93%
Finding: These tests explicitly enforce demographic-based refusals through natural-language assertions, confirming that the behavior is intentional and expected rather than incidental. In the context of a research skill, this can silently bias or block inputs based on demographic phrasing, reducing integrity and potentially enabling unfair or confusing request handling.

Ssd 1

Medium

Confidence: 79%
Finding: The skill explicitly states that its local voice contract overrides global preferences and instructs the model to treat some tool safety cues as overridden. Even though framed as formatting guidance, this kind of instruction-priority language can erode higher-level safeguard adherence and increase prompt-injection susceptibility in agent environments.

Ssd 1

Medium

Confidence: 93%
Finding: The stderr message explicitly addresses the hosting model and tells it to treat itself as the planner and to follow skill-level process instructions ('LAW 7', 'Step 0.75'). In an agent setting, that is cross-boundary control-plane guidance from untrusted skill code to the surrounding orchestrator/model, which can bias the host to override its normal delegation and policy behavior. In this skill context, the danger is elevated because the tool is intentionally LLM-driven and likely to run inside another agent that may consume or react to tool stderr/log output.

Ssd 3

Medium

Confidence: 88%
Finding: These tests explicitly require verbatim pass-through of a footer block, which can institutionalize a dangerous rendering contract: downstream agents may learn to echo content without applying safety review or output filtering. In a skill that aggregates untrusted internet content and agent-produced text, a pass-through guarantee increases the chance that prompt-injection instructions, sensitive text, or deceptive content are relayed directly to users.

Ssd 4

Medium

Confidence: 84%
Finding: The envelope and boundary scheme defines a staged workflow telling a model what to treat as synthesis material versus what to pass through exactly, which is a classic prompt-channel control pattern. When applied to untrusted or mixed-trust content, this can create a reliable bypass where attacker-controlled text is wrapped in trusted markers and forwarded with reduced scrutiny, especially in an agent ecosystem that already handles internet-derived evidence.

VirusTotal

65/65 vendors flagged this skill as clean.
