Distributed Failure Analyzer

Security checks across malware telemetry and agentic risk

Overview

This is an instruction-only distributed-systems diagnostic skill, and the scanner concerns appear to be false positives from technical reference text.

Install only if you are comfortable letting the agent inspect the project files, configs, logs, or incident notes you provide. Use a sanitized or narrowed workspace if those materials contain secrets, credentials, customer data, or sensitive production telemetry.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Memory Poisoning: Persistent Context Injection, Context Window Stuffing, Memory Manipulation
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (2)

Unrestricted Tool Access

Medium
Category
Excessive Agency
Content
2. **Inter-node clock skew**: for systems using timestamp-based conflict resolution, directly measure clock differences between nodes. Alert when skew approaches the granularity of your timestamps.
3. **Node removal on excessive drift**: any node whose clock drifts too far from the others should be declared dead and removed from the cluster. The node's incorrect timestamps can corrupt data or cause incorrect ordering.

**Tools:** `ntpq -p` (NTP status), `chronyc tracking` (chrony), Prometheus `node_timex_offset_seconds` metric, AWS CloudWatch `ClockErrorBound` for Spanner-equivalent services.
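The skew check and the drift-based removal rule described above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function names, the node names, the 10 ms threshold, and the choice of the cluster median as the reference point are all assumptions, and the offsets would in practice come from a source such as `chronyc tracking` or `node_timex_offset_seconds`.

```python
from statistics import median


def max_pairwise_skew(offsets_s):
    """Largest clock difference between any two nodes, in seconds.

    offsets_s maps a (hypothetical) node name to its measured clock
    offset from a shared reference, in seconds.
    """
    values = list(offsets_s.values())
    return max(values) - min(values)


def nodes_to_remove(offsets_s, max_drift_s):
    """Nodes whose offset deviates from the cluster median by more than max_drift_s.

    Using the median as the reference is an assumption of this sketch;
    it keeps a single badly drifting node from shifting the baseline.
    """
    mid = median(offsets_s.values())
    return sorted(
        node for node, off in offsets_s.items() if abs(off - mid) > max_drift_s
    )


# Hypothetical measurements: node-c has drifted about 45 ms.
offsets = {"node-a": 0.0004, "node-b": -0.0002, "node-c": 0.0450}

skew = max_pairwise_skew(offsets)
drifting = nodes_to_remove(offsets, max_drift_s=0.010)
```

With these sample values, `drifting` contains only `node-c`, and `skew` is dominated by that node's offset.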

---
Confidence
85% confidence
Finding
Tools:*

Memory Manipulation

High
Category
Memory Poisoning
Content
LWW silently discards writes when a node with a lagging clock tries to overwrite a value written by a node with a fast clock: the later write carries the smaller timestamp, so it loses. Clock skew of under 3 ms between nodes can cause this. The application receives no error. The data is simply gone.
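A small sketch can make the lost write concrete. The `LWWRegister` class, the node roles, and the millisecond timestamps below are hypothetical and exist only for illustration; real stores apply the same comparison during replication rather than through a single object.

```python
class LWWRegister:
    """Last-write-wins register: keeps the value with the highest timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp_ms = float("-inf")

    def write(self, value, timestamp_ms):
        """Apply the write only if its timestamp is newer than the stored one.

        Returns whether the write was kept. Real LWW stores usually do not
        surface this to the client, which is why the loss is silent.
        """
        if timestamp_ms > self.timestamp_ms:
            self.value = value
            self.timestamp_ms = timestamp_ms
            return True
        return False


reg = LWWRegister()

# Node A's clock runs 3 ms fast. It writes at true time 100000 ms.
kept_a = reg.write("x=1", 100_000 + 3)

# Node B's clock is accurate. It writes 1 ms later in real time,
# but its timestamp (100001) is lower than A's (100003).
kept_b = reg.write("x=2", 100_001)
```

Although node B wrote later in real time, `kept_b` is false and the register still holds `x=1`.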

**"The node is dead — it stopped responding."**
The node may be in a stop-the-world GC pause. It will resume, discover that it was declared dead, and attempt to continue its previous role. Without fencing tokens, this zombie behavior can corrupt state.
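One common way to apply fencing tokens is for the storage layer to reject any write that carries a token older than the highest one it has seen. The sketch below assumes a hypothetical `FencedStore` class and assumes that a lock or lease service hands out monotonically increasing tokens (33, then 34); neither comes from the scanned skill.

```python
class FencedStore:
    """Storage that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.data = {}
        self.highest_token = 0

    def write(self, key, value, token):
        # A token lower than one already seen means the writer's lease
        # was superseded while it was paused or partitioned.
        if token < self.highest_token:
            raise PermissionError(
                f"stale fencing token {token} < {self.highest_token}"
            )
        self.highest_token = token
        self.data[key] = value


store = FencedStore()

# The old leader holds token 33 and writes normally.
store.write("leader-state", "from old leader", token=33)

# The old leader stalls in a GC pause; a new leader is elected with token 34.
store.write("leader-state", "from new leader", token=34)

# The old leader resumes, unaware it was declared dead, and retries with token 33.
try:
    store.write("leader-state", "from zombie", token=33)
    rejected = False
except PermissionError:
    rejected = True
```

The zombie's write is rejected, so the state written by the new leader survives.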

**"We need Byzantine fault tolerance because we can't trust all nodes."**
In a datacenter where your organization controls all nodes, Byzantine fault tolerance is almost certainly not needed and its cost (algorithmic complexity, performance overhead) is not justified. Standard authentication and checksums handle the realistic "lying" cases.
Confidence
90% confidence
Finding
corrupt state

VirusTotal

64/64 vendors flagged this skill as clean.
