warehouse-meta

Security checks across malware telemetry and agentic risk

Overview

This skill's capabilities are consistent with its stated purpose, warehouse metadata governance. However, it can read sensitive warehouse samples and automatically write LLM-generated comments back to Hive at scale without a separate approval step.

Install only if you intend to let the skill inspect warehouse metadata and sample values. Run it first in MCP/read-only mode or set writeback.hive_comment=false, restrict --db and --table to a test scope, review generated comments before applying them, and avoid sending regulated or secret-bearing samples/view definitions to external LLM endpoints unless your data policy allows it.
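The first-run precautions above can be expressed as a small preflight check. This is a minimal sketch, not part of the skill: `RunRequest`, its fields, and `preflight_problems` are hypothetical names, and only `writeback.hive_comment`, `--db`, and `--table` come from the report.

```python
from dataclasses import dataclass


@dataclass
class RunRequest:
    """Hypothetical description of one run; not the skill's real schema."""
    db: str                           # value intended for --db
    table: str                        # value intended for --table
    hive_comment_writeback: bool      # mirrors writeback.hive_comment
    llm_endpoint_is_external: bool
    allow_external_llm: bool = False  # set only if your data policy allows it


def preflight_problems(req, allowed_scope):
    """Return reasons why this run should not start; empty list means OK."""
    problems = []
    if req.hive_comment_writeback:
        problems.append("writeback.hive_comment must be false for a first run")
    if (req.db, req.table) not in allowed_scope:
        problems.append(f"{req.db}.{req.table} is outside the approved test scope")
    if req.llm_endpoint_is_external and not req.allow_external_llm:
        problems.append("external LLM endpoint not cleared by data policy")
    return problems
```

A caller would pass a set of approved `(db, table)` pairs and refuse to launch the skill while the returned list is non-empty.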

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (5)

Vague Triggers

Severity: Medium
Confidence: 83%
Finding
The trigger phrases are broad enough that ordinary discussion of metadata quality, NL2SQL accuracy, or onboarding confusion could activate the skill unintentionally. Because activation can lead to schema enumeration, sample-data collection, and possible Hive comment writeback, accidental triggering expands the chance of unintended data access or modification.

Missing User Warnings

Severity: Medium
Confidence: 94%
Finding
The MCP workflow explicitly collects column schemas, non-null sample values, and view definitions, then persists them to a local JSON file, without any prominent warning about secrets, personal data, or regulated fields that may appear in samples or SQL logic. In a data warehouse setting, even small samples and view definitions can expose PII, credentials, business logic, or other sensitive metadata, making this context more dangerous rather than less.
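One way to reduce this exposure is to mask risky sample values before they are written to the JSON file. The sketch below is illustrative only: the record layout (`name`, `samples`), the regex patterns, and the function names are assumptions, not the skill's actual format, and the name patterns deliberately over-match.

```python
import json
import re

# Assumed patterns; a real deployment would rely on its own classifier
# or data-catalog sensitivity tags instead of these simple heuristics.
SENSITIVE_NAME = re.compile(
    r"(pass(word)?|secret|token|api_?key|ssn|email|phone)", re.IGNORECASE
)
EMAIL_VALUE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
REDACTED = "[REDACTED]"


def redact_samples(columns):
    """Return copies of column records with risky sample values masked."""
    cleaned = []
    for col in columns:
        name_is_sensitive = bool(SENSITIVE_NAME.search(col["name"]))
        samples = [
            REDACTED
            if name_is_sensitive or EMAIL_VALUE.fullmatch(str(value))
            else value
            for value in col.get("samples", [])
        ]
        # Build a new dict so the caller's records are left unchanged.
        cleaned.append({**col, "samples": samples})
    return cleaned


def dump_redacted(columns):
    """Serialize only the redacted view, never the raw samples."""
    return json.dumps(redact_samples(columns), indent=2)
```

Heuristics like these miss many cases (free-text fields, credentials embedded in view SQL), so they complement, rather than replace, restricting the scan to non-sensitive tables.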

Missing User Warnings

Severity: Medium
Confidence: 92%
Finding
The architecture explicitly describes automatic write-back of generated comments to Hive when confidence thresholds are met, but it does not mention explicit user confirmation, authorization checks, dry-run mode, or rollback/audit safeguards. Because this skill operates against external metadata systems, incorrect or overbroad automation could silently alter production catalog metadata, degrading trust, confusing downstream users, and potentially causing governance or compliance issues at scale.
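The missing safeguards could look roughly like the gate below, which defaults to dry-run, requires an explicit approval set, and records the previous comment for rollback. This is a sketch under stated assumptions: `ProposedComment`, `apply_comments`, and the caller-supplied `write_fn` are hypothetical and do not reflect the skill's real interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedComment:
    table: str
    column: str
    old_comment: str
    new_comment: str
    confidence: float


def apply_comments(proposals, write_fn, approved, dry_run=True, min_confidence=0.9):
    """Write only approved, high-confidence comments; return an audit trail.

    write_fn(table, column, comment) is a hypothetical caller-supplied
    function that performs the real Hive update. approved is a set of
    (table, column) pairs a human has signed off on.
    """
    audit = []
    for p in proposals:
        if p.confidence < min_confidence:
            action = "skipped_low_confidence"
        elif (p.table, p.column) not in approved:
            action = "skipped_not_approved"
        elif dry_run:
            action = "would_write"
        else:
            write_fn(p.table, p.column, p.new_comment)
            action = "written"
        # The old comment is recorded so a change can be reverted later.
        audit.append({
            "table": p.table,
            "column": p.column,
            "old": p.old_comment,
            "new": p.new_comment,
            "action": action,
        })
    return audit
```

Because `dry_run` defaults to `True`, nothing reaches the catalog unless the caller opts in, and the returned audit list doubles as a review report.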

Missing User Warnings

Severity: Low
Confidence: 84%
Finding
The document states that all inference results are persisted to SQLite and that DML expressions are also persisted, without documenting retention limits, minimization, access controls, or sensitivity handling. Persisting metadata-derived artifacts can expose schema intelligence, business logic, and potentially sensitive descriptions longer than intended, especially if the local SQLite store is shared, backed up, or insufficiently protected.
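A retention limit for the local store could be enforced with a periodic purge. The following is a minimal sketch using Python's standard `sqlite3` module; the table name `inference_results` and its `created_at` column (epoch seconds) are assumptions, since the report does not describe the skill's actual SQLite schema.

```python
import sqlite3
import time
from typing import Optional


def purge_old_results(
    conn: sqlite3.Connection, max_age_days: int, now: Optional[float] = None
) -> int:
    """Delete rows older than max_age_days and return how many were removed.

    Assumes a hypothetical table inference_results with a created_at
    column holding epoch seconds.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    # Using the connection as a context manager commits on success
    # and rolls back if the statement raises.
    with conn:
        cur = conn.execute(
            "DELETE FROM inference_results WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount
```

Retention addresses only one of the gaps listed; access to the database file and its backups would still need to be restricted separately.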

Missing User Warnings

Severity: Medium
Confidence: 97%
Finding
Approved comments are written back to Hive metadata automatically during scanning, without a distinct user confirmation step per change or dry-run approval workflow. Because comments are generated by LLM-based logic from schema, samples, and external model responses, incorrect or poisoned output can silently modify production metadata at scale and degrade downstream analytics or governance systems.

VirusTotal

VirusTotal findings are pending for this skill version.
