Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Agent Security Harness

Security test AI agent systems against protocol-level attacks. Use when: (1) testing MCP servers for tool poisoning, capability escalation, or protocol downg...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 47 · 0 current installs · 0 all-time installs
byMichael 'Mike' K. Saleme@msaleme
MIT-0
Security Scan
VirusTotalVirusTotal
Suspicious
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (agent security test harness) match the declared requirements: Python, pip, and a CLI binary named agent-security. Optional enterprise API keys referenced in the docs (PLATFORM_API_KEY, ALPACA_PAPER_API_KEY) are consistent with optional enterprise/trading adapter tests.
Instruction Scope
SKILL.md instructs invoking the installed CLI and bundled test modules (e.g., agent-security test mcp, python -m testing.mock_mcp_server). Instructions focus on sending crafted inputs to target endpoints and analyzing responses; it explicitly warns not to run against production and to use scoped test credentials. There is no guidance to read unrelated system files, exfiltrate secrets, or post results to unexpected external endpoints.
Install Mechanism
Installation guidance points to PyPI and GitHub releases (pip install agent-security-harness==3.6.0) and suggests provenance verification. Using PyPI for a Python CLI is appropriate; no arbitrary download URLs or extract-from-unknown-host steps are present in SKILL.md.
Credentials
No required environment variables are declared. The SKILL.md documents optional variables for enterprise/trading adapters and gives appropriate advice to use scoped test credentials and not production keys. The optional env vars are proportional to the described integrations.
Persistence & Privilege
The skill is instruction-only, has always:false, and does not request persistent system presence or modification of other skills or global agent config. Autonomous invocation is allowed (platform default) but not combined with other concerning privileges.
Assessment
This appears to be a legitimate Python CLI security harness. Before installing or running: 1) only run tests against authorized staging/test systems and scoped test accounts (do not run against production payment endpoints), 2) verify the PyPI package and GitHub release signatures/tags if possible, 3) inspect the package source code if you need stronger assurance, 4) run inside an isolated environment (VM/container) and monitor network activity when performing adversarial tests, and 5) supply only scoped test API keys (never production credentials).

Like a lobster shell, security has layers — review code before you run it.

Current versionv3.6.0
Download zip
a2avk973vxxn0f44tam8k1128vsq4983ht8yagent-securityvk973vxxn0f44tam8k1128vsq4983ht8yjailbreakvk973vxxn0f44tam8k1128vsq4983ht8yl402vk973vxxn0f44tam8k1128vsq4983ht8ylatestvk97fx0bzb4d651fv6aa8vd2f3983gcsbmcpvk973vxxn0f44tam8k1128vsq4983ht8ynistvk973vxxn0f44tam8k1128vsq4983ht8yover-refusalvk973vxxn0f44tam8k1128vsq4983ht8yowaspvk973vxxn0f44tam8k1128vsq4983ht8yprovenancevk973vxxn0f44tam8k1128vsq4983ht8yred-teamvk973vxxn0f44tam8k1128vsq4983ht8ysecurityvk973vxxn0f44tam8k1128vsq4983ht8ytestingvk973vxxn0f44tam8k1128vsq4983ht8yx402vk973vxxn0f44tam8k1128vsq4983ht8y

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🛡️ Clawdis
Binspython3, pip, agent-security

SKILL.md

Agent Security Harness

363 security tests for AI agent systems. 4 wire protocols (MCP, A2A, L402, x402), 20 enterprise platforms, GTG-1002 APT simulation, false positive rate testing, supply chain provenance, jailbreak resistance. Zero external dependencies for core protocol modules.

Current version: v3.6.0 | PyPI | GitHub | Apache 2.0

Safety

Non-destructive by default. All 363 tests send crafted inputs and analyze responses. No tests modify target state, delete data, or execute write operations.

Do NOT run against production systems without explicit authorization. Use isolated staging/test environments and test accounts, especially for payment endpoints (L402, x402).

Payment tests (L402/x402): Send crafted payment challenges and analyze responses. They do NOT execute real transactions, transfer funds, or interact with live payment networks.

Required Environment

Python 3.10+ and pip are required.

Environment variables:

VariableRequiredPurpose
(none by default)-Most tests only need a target URL passed via CLI --url flag
PLATFORM_API_KEYOnly for enterprise adapter testsPlatform-specific API key (SAP, Salesforce, Workday, etc.) - use scoped test credentials only
ALPACA_PAPER_API_KEYNoOnly for trading-related integration tests

No environment variables are required for standard protocol testing (MCP, A2A, L402, x402, over-refusal, provenance, jailbreak). The target URL is passed as a CLI argument, not an environment variable.

Credential guidance: If you use enterprise adapter tests that require API keys, store credentials securely using environment variables or .env files. Never commit API keys to version control. Only provide scoped test credentials, never production keys.

Install

# Install from PyPI (pinned version recommended)
pip install agent-security-harness==3.6.0

# Verify installation
agent-security version
# Expected output: 3.6.0

Source verification:

Dependencies: Core protocol modules (MCP, A2A, L402, x402, over-refusal, provenance, jailbreak) use Python stdlib only (zero external dependencies). Application-layer suite requires requests and geopy.

Quick Reference

# List all harnesses and tests
agent-security list
agent-security list mcp

# Test an MCP server (requires only a URL)
agent-security test mcp --transport http --url http://localhost:8080/mcp

# Test an A2A agent
agent-security test a2a --url https://agent.example.com

# Test L402 payment endpoint (Lightning) - non-destructive
agent-security test l402 --url https://l402-endpoint.com

# Test x402 payment endpoint (Coinbase/USDC) - non-destructive
agent-security test x402 --url https://x402-endpoint.com

# Test x402 with specific paid endpoint path
agent-security test x402 --url https://apibase.pro --paid-path /api/v1/tools/geo.geocode/call

# Test false positive rate (over-refusal)
agent-security test over-refusal --url http://localhost:8080/mcp

# Test supply chain provenance and attestation
agent-security test provenance --url http://localhost:8080/mcp

# Test jailbreak resistance
agent-security test jailbreak --url http://localhost:8080/mcp

# Test capability profile boundaries
agent-security test capability-profile --url https://agent.example.com

# Test harmful output safeguards
agent-security test harmful-output --url https://agent.example.com

# Test CBRN content prevention
agent-security test cbrn --url https://agent.example.com

# Test incident response readiness
agent-security test incident-response --url https://agent.example.com

# Statistical confidence intervals (NIST AI 800-2 aligned)
agent-security test mcp --url http://localhost:8080/mcp --trials 10

# Rate-limit for production endpoints (milliseconds between tests)
agent-security test a2a --url https://agent.example.com --delay 1000

# Try without a server (bundled mock MCP server)
python -m testing.mock_mcp_server  # Terminal 1: starts on port 8402
agent-security test mcp --transport http --url http://localhost:8402/mcp  # Terminal 2

Harness Modules (18 modules, 363 tests)

CommandTestsWhat It Tests
test mcp11MCP wire-protocol (JSON-RPC 2.0): tool poisoning, capability escalation, protocol downgrade, resource traversal, sampling hijack, context displacement
test a2a12A2A protocol: Agent Card spoofing, task injection, push notification redirect, skill injection, context isolation
test l40215L402 payments: macaroon tampering, preimage replay, caveat escalation, invoice validation
test x40243x402 payments: recipient manipulation, session theft, facilitator trust, cross-chain confusion, spending limits, health checks. Includes Agent Autonomy Risk Score (0-100)
test enterprise33Tier 1 enterprise: SAP, Salesforce, Workday, Oracle, ServiceNow, Microsoft, Google, Amazon, OpenClaw
test extended-enterprise38Tier 2 enterprise: IBM Maximo, Snowflake, Databricks, Pega, UiPath, Atlassian, Zendesk, IFS, Infor, HubSpot, Appian
test framework24Framework adapters: LangChain, CrewAI, AutoGen, OpenAI Agents SDK, Bedrock
test identity18NIST NCCoE Agent Identity: identification, authentication, authorization, auditing, data flow, standards compliance
test gtg100219GTG-1002 APT simulation: 6 campaign phases + hallucination detection
test advanced10Advanced patterns: polymorphic injection, stateful escalation, multi-domain chains, jailbreak persistence
test over-refusal25False positive rate: legitimate requests across all protocols that should NOT be blocked. Measures FPR with Wilson CI
test provenance19Supply chain: fake provenance, spoofed attestation, marketplace integrity, CVE-2026-25253 attack patterns
test jailbreak27Jailbreak resistance: DAN variants, token smuggling, authority impersonation, context manipulation, persistence
test return-channel8Return channel poisoning: output injection, ANSI escape, context overflow, encoded smuggling, structured data poisoning
test capability-profile11Executor capability boundary validation, profile escalation prevention
test harmful-output11Toxicity, bias, scope violations, deception (AIUC-1 C003/C004)
test cbrn9Chemical/biological/radiological/nuclear content safeguards (AIUC-1 F002)
test incident-response9Alert triggering, kill switch, log completeness, recovery (AIUC-1 E001-E003)

Output Format

All harnesses produce JSON reports with:

  • Pass/fail per test with test ID and OWASP ASI mapping
  • Full request/response transcripts for audit
  • Elapsed time per test
  • Wilson score confidence intervals (with --trials N)
  • x402 harness adds: CSG mapping, financial impact estimation, Agent Autonomy Risk Score

When to Use Each Harness

  • Building an MCP server? Run test mcp before deploying
  • Exposing an A2A agent? Run test a2a to check Agent Card and task security
  • Adding agent payments? Run test l402 (Lightning) or test x402 (USDC) before going live
  • Deploying on enterprise platforms? Run test enterprise with your platform name
  • Red-teaming an agent system? Run test gtg1002 for full APT campaign simulation
  • Need compliance evidence? Use --trials 10 for NIST AI 800-2 aligned statistical reports
  • Preparing for AIUC-1 certification? Run all harnesses for B001/C010/D004 evidence
  • Checking false positive rate? Run test over-refusal to verify security controls don't break legitimate use
  • Validating supply chain integrity? Run test provenance (especially relevant after CVE-2026-25253)
  • Testing jailbreak resistance? Run test jailbreak for DAN variants and encoding evasion
  • Checking agent capability boundaries? Run test capability-profile to verify escalation prevention
  • Validating safety controls? Run test harmful-output and test cbrn for content safeguards
  • Testing incident response? Run test incident-response for kill switch and recovery validation

Research

This harness is part of a published research program on autonomous AI agent governance:

Source & Provenance

Files

1 total
Select a file
Select a file to preview.

Comments

Loading comments…