Distributed Failure Analyzer

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only diagnostic skill for distributed systems. The scanner findings appear to be false positives triggered by technical reference text in the skill itself.

Install only if you are comfortable letting the agent inspect the project files, configs, logs, or incident notes you provide. Use a sanitized or narrowed workspace if those materials contain secrets, credentials, customer data, or sensitive production telemetry.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Memory Poisoning: Persistent Context Injection, Context Window Stuffing, Memory Manipulation
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (2)

Unrestricted Tool Access

Medium

Category: Excessive Agency
Content:
2. **Inter-node clock skew**: for systems using timestamp-based conflict resolution, directly measure clock differences between nodes. Alert when skew approaches the granularity of your timestamps.
3. **Node removal on excessive drift**: any node whose clock drifts too far from the others should be declared dead and removed from the cluster. The node's incorrect timestamps can corrupt data or cause incorrect ordering.
**Tools:** `ntpq -p` (NTP status), `chronyc tracking` (chrony), Prometheus `node_timex_offset_seconds` metric, AWS CloudWatch `ClockErrorBound` for Spanner-equivalent services.
Confidence: 85%
Finding: **Tools:**
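
The clock-skew monitoring and drift-removal rules quoted in this finding can be illustrated with a minimal Python sketch. It assumes per-node clock offsets (in seconds) have already been collected, for example from a source such as `chronyc tracking` or `node_timex_offset_seconds`. The function names, node names, and the 5 ms threshold are hypothetical and do not come from the skill.

```python
from statistics import median


def max_pairwise_skew(offsets_s):
    """Largest clock difference between any two nodes, in seconds."""
    values = list(offsets_s.values())
    return max(values) - min(values)


def find_drifting_nodes(offsets_s, max_skew_s):
    """Return nodes whose offset differs from the cluster median by more than max_skew_s."""
    reference = median(offsets_s.values())
    return sorted(
        node
        for node, offset in offsets_s.items()
        if abs(offset - reference) > max_skew_s
    )


# Hypothetical measurements: offset of each node's clock from a shared reference.
offsets = {"node-a": 0.0004, "node-b": -0.0002, "node-c": 0.0450}

# Assumed threshold of 5 ms; in practice it should be tied to timestamp granularity.
drifting = find_drifting_nodes(offsets, max_skew_s=0.005)
print("max pairwise skew (s):", max_pairwise_skew(offsets))
print("candidates for removal:", drifting)
```

Comparing against the median rather than the mean keeps a single badly drifting node from shifting the reference it is judged against.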

Memory Manipulation

High

Category: Memory Poisoning
Content: LWW silently discards writes when a node with a lagging clock tries to overwrite values written by a node with a fast clock. Clock skew between nodes under 3ms can cause this. The application receives no error. The data is simply gone.
**"The node is dead: it stopped responding."** The node may be in a stop-the-world GC pause. It will resume, discover that it was declared dead, and attempt to continue its previous role. Without fencing tokens, this zombie behavior can corrupt state.
**"We need Byzantine fault tolerance because we can't trust all nodes."** In a datacenter where your organization controls all nodes, Byzantine fault tolerance is almost certainly not needed and its cost (algorithmic complexity, performance overhead) is not justified. Standard authentication and checksums handle the realistic "lying" cases.
Confidence: 90%
Finding: corrupt state
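
The two failure modes quoted in this finding can be shown in a short Python sketch: a last-write-wins (LWW) register that drops a write carrying an older timestamp, and a store that uses fencing tokens to reject a resumed "zombie" node. The class names, timestamps, and token values are hypothetical illustrations, not code from the skill.

```python
class LWWRegister:
    """Keeps only the value with the highest timestamp (last write wins)."""

    def __init__(self):
        self.value = None
        self.timestamp = float("-inf")

    def write(self, value, timestamp):
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return True
        # The write is dropped. A real LWW store would typically report
        # nothing to the client; the boolean exists only for this demo.
        return False


class FencedStore:
    """Rejects writes that carry a fencing token older than one already seen."""

    def __init__(self):
        self.value = None
        self.highest_token = 0

    def write(self, value, token):
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token}")
        self.highest_token = token
        self.value = value


# The fast-clock node writes first; the lagging node writes later in real
# time but stamps its write about 2 ms earlier, so the write is discarded.
register = LWWRegister()
register.write("x=1", timestamp=100.003)
accepted = register.write("x=2", timestamp=100.001)
print(register.value, accepted)

# A paused leader holding token 33 resumes after token 34 has been issued.
store = FencedStore()
store.write("from new leader", token=34)
try:
    store.write("from zombie", token=33)
    zombie_rejected = False
except PermissionError:
    zombie_rejected = True
print(store.value, zombie_rejected)
```

The fencing check lives in the storage layer because the paused node cannot be relied on to notice that its lease has expired.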

VirusTotal

64/64 vendors flagged this skill as clean.
