warehouse-meta

Security checks across malware telemetry and agentic risk

Overview

This skill is purpose-aligned for warehouse metadata governance, but it can read sensitive warehouse samples and automatically write LLM-generated comments back to Hive at scale without a separate approval step.

Install only if you intend to let the skill inspect warehouse metadata and sample values. Run it first in MCP/read-only mode or set writeback.hive_comment=false, restrict --db and --table to a test scope, review generated comments before applying them, and avoid sending regulated or secret-bearing samples/view definitions to external LLM endpoints unless your data policy allows it.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (5)

Vague Triggers

Medium

Confidence: 83%
Finding: The trigger phrases are broad enough that ordinary discussion of metadata quality, NL2SQL accuracy, or onboarding confusion could activate the skill unintentionally. Because activation can lead to schema enumeration, sample-data collection, and possible Hive comment writeback, accidental triggering expands the chance of unintended data access or modification.

Missing User Warnings

Medium

Confidence: 94%
Finding: The MCP workflow explicitly collects column schemas, non-null sample values, and view definitions, then persists them to a local JSON file, without any prominent warning about secrets, personal data, or regulated fields that may appear in samples or SQL logic. In a data warehouse, even small samples and view definitions can expose PII, credentials, business logic, or other sensitive metadata, so this setting raises the risk rather than lowering it.
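
One way to reduce the exposure described in this finding is to redact obviously sensitive values before samples are written to disk. The following is a minimal sketch and not the skill's own code: the patterns, the `redact_samples` helper, and the shape of the `samples` dictionary are all assumptions made for illustration, and a real deployment would rely on its own data-classification rules.

```python
import json
import re

# Hypothetical patterns for illustration only; they will miss many real cases.
SENSITIVE_PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email-like strings
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like numbers
    re.compile(r"(?i)\b(?:password|secret|token|api[_-]?key)\s*[=:]\s*\S+"),  # inline credentials
]


def redact(value: str) -> str:
    """Replace every match of a sensitive pattern with a fixed marker."""
    for pattern in SENSITIVE_PATTERNS:
        value = pattern.sub("[REDACTED]", value)
    return value


def redact_samples(columns: dict[str, list[str]]) -> dict[str, list[str]]:
    """Redact each sample value, keeping the column -> samples layout."""
    return {name: [redact(str(v)) for v in values] for name, values in columns.items()}


# Assumed sample layout: column name mapped to a list of sample values.
samples = {
    "contact": ["alice@example.com", "n/a"],
    "note": ["api_key=abc123", "ok"],
}
safe = redact_samples(samples)
print(json.dumps(safe, indent=2))
```

Pattern-based redaction is only a backstop. Excluding known-sensitive columns from sampling altogether is the stronger control.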

Missing User Warnings

Medium

Confidence: 92%
Finding: The architecture explicitly describes automatic write-back of generated comments to Hive when confidence thresholds are met, but it does not mention explicit user confirmation, authorization checks, dry-run mode, or rollback/audit safeguards. Because this skill operates against external metadata systems, incorrect or overbroad automation could silently alter production catalog metadata, degrading trust, confusing downstream users, and potentially causing governance or compliance issues at scale.
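
The safeguards this finding lists as missing (explicit confirmation and a dry-run mode) could take the form of a gate placed in front of the write-back step. This is a hedged sketch under stated assumptions: `ProposedComment`, `plan_writeback`, the 0.9 threshold, and the `approved` set are hypothetical names and values, not part of the skill. The sketch only decides which comments would be applied and does not talk to Hive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedComment:
    """Hypothetical record for one generated comment."""
    table: str
    column: str
    comment: str
    confidence: float


def plan_writeback(proposals, threshold=0.9, dry_run=True, approved=frozenset()):
    """Split proposals into (to_apply, held).

    A proposal is applied only when dry_run is off, its confidence meets
    the threshold, AND a reviewer has approved its (table, column) key.
    Everything else is held for review.
    """
    to_apply, held = [], []
    for p in proposals:
        key = (p.table, p.column)
        if not dry_run and p.confidence >= threshold and key in approved:
            to_apply.append(p)
        else:
            held.append(p)
    return to_apply, held


proposals = [
    ProposedComment("orders", "amount", "Order total in USD", 0.95),
    ProposedComment("orders", "status", "Order lifecycle state", 0.97),
    ProposedComment("orders", "flag", "Unknown flag", 0.40),
]

# Default is dry-run: nothing is applied, everything is reported for review.
dry_apply, dry_held = plan_writeback(proposals)

# A real run applies only what a reviewer explicitly approved.
apply_now, held_back = plan_writeback(
    proposals, dry_run=False, approved={("orders", "amount")}
)
```

Defaulting `dry_run` to `True` means a caller has to opt in to modification, so a high confidence score alone never triggers a write.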

Missing User Warnings

Low

Confidence: 84%
Finding: The document states that all inference results are persisted to SQLite and that DML expressions are also persisted, without documenting retention limits, minimization, access controls, or sensitivity handling. Persisting metadata-derived artifacts can expose schema intelligence, business logic, and potentially sensitive descriptions longer than intended, especially if the local SQLite store is shared, backed up, or insufficiently protected.

Missing User Warnings

Medium

Confidence: 97%
Finding: Approved comments are written back to Hive metadata automatically during scanning, without a distinct user confirmation step per change or dry-run approval workflow. Because comments are generated by LLM-based logic from schema, samples, and external model responses, incorrect or poisoned output can silently modify production metadata at scale and degrade downstream analytics or governance systems.

VirusTotal

VirusTotal findings are pending for this skill version.
