Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Agent Security Harness

v3.8.1

Security test AI agent systems against protocol-level attacks. Use when: (1) testing MCP servers for tool poisoning, capability escalation, or protocol downg...

by Michael 'Mike' K. Saleme (@msaleme)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name/description (protocol-level agent security testing) aligns with what the runtime instructions require: a Python environment and the agent-security CLI. Optional enterprise API keys are appropriate for adapters to systems like SAP/Salesforce. There are no unrelated credentials or surprising binaries requested.
Instruction Scope
SKILL.md instructs the agent to run the agent-security CLI against target URLs and to use scoped test credentials for enterprise adapters; this stays inside the stated purpose. Minor scope issues: it recommends verifying releases with git (but git is not listed as a required binary), and it mentions storing credentials in .env files (guidance only) and opt-in telemetry/Discord reporting — these are documented but users should confirm what telemetry sends before enabling.
Install Mechanism
There is no registry install spec (instruction-only), but the doc instructs users to pip install from PyPI and points to a GitHub repo and tagged releases. Installing from PyPI is typical for a Python CLI; this is moderate-risk relative to built-in-only skills but expected for this purpose. Links point to GitHub/PyPI (well-known hosts).
Credentials
No environment variables are required by default. The optional PLATFORM_API_KEY for enterprise adapters is proportional to adapter testing. The only minor oddity: ALPACA_PAPER_API_KEY is listed but annotated ambiguously ('No'), which appears to be documentation noise rather than a demand for unrelated credentials.
Persistence & Privilege
The skill is not marked always:true and does not request persistent system-wide privileges. It documents optional telemetry/CI/Discord integrations, which are opt-in; nothing in the manifest indicates the skill would autonomously install or persist beyond the normal CLI/tool usage.
Assessment
This appears to be a legitimate security test harness. Before installing or running: (1) run it only in isolated/staging environments and with scoped test credentials as the README warns; (2) if you enable telemetry, read what telemetry collects and opt out if you will be testing sensitive targets; (3) verify the PyPI/GitHub release signatures or tags if provenance matters (the doc suggests using git log, so ensure git is available locally before following those steps); (4) treat the optional enterprise API keys as sensitive — use test accounts and never provide production keys. The docs have small inconsistencies (e.g., a confusing ALPACA flag and recommending git without declaring it), but nothing that contradicts the declared purpose.

Like a lobster shell, security has layers — review code before you run it.

Tags: a2a, agent-security, jailbreak, l402, latest, mcp, nist, over-refusal, owasp, provenance, red-team, security, testing, x402

Runtime requirements

🛡️ Clawdis
Bins: python3, pip, agent-security

SKILL.md

Agent Security Harness

332 security tests across 24 modules for AI agent systems. 4 wire protocols (MCP, A2A, L402, x402), 20+ enterprise platforms, GTG-1002 APT simulation, false positive rate testing, supply chain provenance, jailbreak resistance, AIUC-1 certification prep. Zero external dependencies for core protocol modules.

Current version: v3.8.1 | PyPI | GitHub | Apache 2.0

New in v3.8.1: MCP Server (expose harness as MCP tools for any AI agent), Attestation Registry (opt-in, Ed25519 signed), Telemetry (opt-in, GDPR compliant), GitHub Action for CI/CD, Free MCP Security Scan, AIUC-1 Certification Prep, Monthly Security Report pipeline, Discord Scan Bot. Validated at 97.9% pass rate (HRAO-E, 146 tests, Wilson 95% CI [0.943, 0.994]). 22 rounds of critical evaluation, 10/10 final score.

Safety

Non-destructive by default. All 332 tests send crafted inputs and analyze responses. No tests modify target state, delete data, or execute write operations.

Do NOT run against production systems without explicit authorization. Use isolated staging/test environments and test accounts, especially for payment endpoints (L402, x402).

Payment tests (L402/x402): Send crafted payment challenges and analyze responses. They do NOT execute real transactions, transfer funds, or interact with live payment networks.
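
The advice to stay on isolated staging or test environments can be backed by a small pre-flight check before any target URL is handed to the CLI. The sketch below is not part of the harness: `ALLOWED_HOSTS` and `is_allowed_target` are hypothetical names, and the hosts listed are placeholders to replace with your own.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of staging/test hosts (assumption: replace with your own).
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1", "staging.example.com"})


def is_allowed_target(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is explicitly allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # hostname is lowercased by urlparse and excludes the port.
    return parsed.hostname in allowed_hosts


print(is_allowed_target("http://localhost:8080/mcp"))           # True
print(is_allowed_target("https://api.production.example.com"))  # False
```

An allowlist is used rather than a blocklist so that any host nobody has reviewed is rejected by default.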

Required Environment

Python 3.10+ and pip are required.

Environment variables:

Variable | Required | Purpose
(none by default) | - | Most tests only need a target URL passed via the CLI --url flag
PLATFORM_API_KEY | Only for enterprise adapter tests | Platform-specific API key (SAP, Salesforce, Workday, etc.) - use scoped test credentials only
ALPACA_PAPER_API_KEY | No | Only for trading-related integration tests

No environment variables are required for standard protocol testing (MCP, A2A, L402, x402, over-refusal, provenance, jailbreak). The target URL is passed as a CLI argument, not an environment variable.

Credential guidance: If you use enterprise adapter tests that require API keys, store credentials securely using environment variables or .env files. Never commit API keys to version control. Only provide scoped test credentials, never production keys.
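
One way to follow this guidance in a wrapper script is to read the key from the environment, fail early when it is missing, and mask it in any log output. This is a minimal sketch, not harness code: `get_platform_api_key` and `mask` are hypothetical helpers, and only the variable name `PLATFORM_API_KEY` comes from the table above.

```python
import os


def get_platform_api_key(env=None) -> str:
    """Read PLATFORM_API_KEY from the environment, failing loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get("PLATFORM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "PLATFORM_API_KEY is not set; export a scoped test credential first"
        )
    return key


def mask(secret: str) -> str:
    """Hide all but the last four characters; fully mask very short secrets."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]


# Example with an explicit mapping, so nothing real is read or printed.
demo_env = {"PLATFORM_API_KEY": "test-key-1234"}
print(mask(get_platform_api_key(demo_env)))
```

Passing the environment as a parameter keeps the helper easy to test without touching real credentials.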

Install

# Install from PyPI (pinned version recommended)
pip install agent-security-harness==3.8.1

# Verify installation
agent-security version
# Expected output: 3.8.1
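
The same pinned-version check can be done from Python, for example in a CI step, using the standard library's `importlib.metadata`. This is a sketch under the assumption that the installed distribution name matches the PyPI name used above (`agent-security-harness`); `installed_version` and `PINNED` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = "3.8.1"  # version pinned in the install command above


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("agent-security-harness")
if found is None:
    print("agent-security-harness is not installed")
elif found != PINNED:
    print(f"version mismatch: installed {found}, expected {PINNED}")
else:
    print(f"agent-security-harness {found} matches the pin")
```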

Source verification:

Dependencies: Core protocol modules (MCP, A2A, L402, x402, over-refusal, provenance, jailbreak) use Python stdlib only (zero external dependencies). Application-layer suite requires requests and geopy.

Quick Reference

# List all harnesses and tests
agent-security list
agent-security list mcp

# Test an MCP server (requires only a URL)
agent-security test mcp --transport http --url http://localhost:8080/mcp

# Test an A2A agent
agent-security test a2a --url https://agent.example.com

# Test L402 payment endpoint (Lightning) - non-destructive
agent-security test l402 --url https://l402-endpoint.com

# Test x402 payment endpoint (Coinbase/USDC) - non-destructive
agent-security test x402 --url https://x402-endpoint.com

# Test x402 with specific paid endpoint path
agent-security test x402 --url https://apibase.pro --paid-path /api/v1/tools/geo.geocode/call

# Test false positive rate (over-refusal)
agent-security test over-refusal --url http://localhost:8080/mcp

# Test supply chain provenance and attestation
agent-security test provenance --url http://localhost:8080/mcp

# Test jailbreak resistance
agent-security test jailbreak --url http://localhost:8080/mcp

# Test capability profile boundaries
agent-security test capability-profile --url https://agent.example.com

# Test harmful output safeguards
agent-security test harmful-output --url https://agent.example.com

# Test CBRN content prevention
agent-security test cbrn --url https://agent.example.com

# Test incident response readiness
agent-security test incident-response --url https://agent.example.com

# Statistical confidence intervals (NIST AI 800-2 aligned)
agent-security test mcp --url http://localhost:8080/mcp --trials 10

# Rate-limit for production endpoints (milliseconds between tests)
agent-security test a2a --url https://agent.example.com --delay 1000

# Try without a server (bundled mock MCP server)
python -m testing.mock_mcp_server  # Terminal 1: starts on port 8402
agent-security test mcp --transport http --url http://localhost:8402/mcp  # Terminal 2
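
When these commands are driven from a script, building the argument list explicitly avoids shell-quoting mistakes. The helper below is a hypothetical wrapper, not part of the harness; it uses only the flags shown above (`--transport`, `--url`, `--trials`, `--delay`), and it assumes those flags can be combined in one invocation, which the examples do not confirm.

```python
import shlex


def build_test_command(harness, url, transport=None, trials=None, delay_ms=None):
    """Build an argv list for `agent-security test` using only documented flags."""
    cmd = ["agent-security", "test", harness]
    if transport is not None:
        cmd += ["--transport", transport]
    cmd += ["--url", url]
    if trials is not None:
        cmd += ["--trials", str(trials)]
    if delay_ms is not None:
        cmd += ["--delay", str(delay_ms)]
    return cmd


cmd = build_test_command("mcp", "http://localhost:8402/mcp", transport="http")
print(shlex.join(cmd))
# agent-security test mcp --transport http --url http://localhost:8402/mcp
```

The resulting list can be passed to `subprocess.run(cmd)` without `shell=True`, so the target URL is never interpreted by a shell.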

MCP Server Mode

Use the harness as an MCP tool that any AI agent can call:

# stdio (for Cursor, Claude Desktop)
python -m mcp_server

# HTTP
python -m mcp_server --transport http --port 8400

Tools: scan_mcp_server (quick scan), full_security_audit (332 tests), aiuc1_readiness, get_test_catalog, validate_attestation.
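
MCP messages are JSON-RPC 2.0, so a client discovers the tools listed above by sending a `tools/list` request. The sketch below only builds and validates the message envelope; it does not open a connection, and a real client would first complete the MCP initialize handshake. `jsonrpc_request` and `is_valid_response` are illustrative helper names, not harness APIs.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def is_valid_response(msg, request_id):
    """Check the envelope: version, matching id, and exactly one of result/error."""
    return (
        msg.get("jsonrpc") == "2.0"
        and msg.get("id") == request_id
        and (("result" in msg) != ("error" in msg))
    )


request = jsonrpc_request("tools/list")
print(json.dumps(request))
```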

Harness Modules (24 modules, 332 tests)

Command | Tests | What It Tests
test mcp | 13 | MCP wire-protocol (JSON-RPC 2.0): tool poisoning, capability escalation, protocol downgrade, resource traversal, sampling hijack, context displacement
test a2a | 12 | A2A protocol: Agent Card spoofing, task injection, push notification redirect, skill injection, context isolation
test l402 | 14 | L402 payments: macaroon tampering, preimage replay, caveat escalation, invoice validation
test x402 | 25 | x402 payments: recipient manipulation, session theft, facilitator trust, cross-chain confusion, spending limits, health checks. Includes Agent Autonomy Risk Score (0-100)
test enterprise | 31 | Tier 1 enterprise: SAP, Salesforce, Workday, Oracle, ServiceNow, Microsoft, Google, Amazon, OpenClaw
test extended-enterprise | 27 | Tier 2 enterprise: IBM Maximo, Snowflake, Databricks, Pega, UiPath, Atlassian, Zendesk, IFS, Infor, HubSpot, Appian
test framework | 11 | Framework adapters: LangChain, CrewAI, AutoGen, OpenAI Agents SDK, Bedrock
test identity | 18 | NIST NCCoE Agent Identity: identification, authentication, authorization, auditing, data flow, standards compliance
test gtg1002 | 17 | GTG-1002 APT simulation: 6 campaign phases + hallucination detection
test advanced | 10 | Advanced patterns: polymorphic injection, stateful escalation, multi-domain chains, jailbreak persistence
test over-refusal | 25 | False positive rate: legitimate requests across all protocols that should NOT be blocked. Measures FPR with Wilson CI
test provenance | 15 | Supply chain: fake provenance, spoofed attestation, marketplace integrity, CVE-2026-25253 attack patterns
test jailbreak | 25 | Jailbreak resistance: DAN variants, token smuggling, authority impersonation, context manipulation, persistence
test return-channel | 8 | Return channel poisoning: output injection, ANSI escape, context overflow, encoded smuggling, structured data poisoning
test capability-profile | 10 | Executor capability boundary validation, profile escalation prevention
test harmful-output | 10 | Toxicity, bias, scope violations, deception (AIUC-1 C003/C004)
test cbrn | 8 | Chemical/biological/radiological/nuclear content safeguards (AIUC-1 F002)
test incident-response | 8 | Alert triggering, kill switch, log completeness, recovery (AIUC-1 E001-E003)
test aiuc1 | 12 | AIUC-1 compliance: all 24 certification requirements mapped
test cloud | 25 | Cloud agent platforms: AWS Bedrock, Azure AI, GCP Vertex, Anthropic, OpenAI
test cve-2026 | 8 | CVE-2026-25253 reproduction: supply chain tool poisoning at scale
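
To make one of the categories above concrete, the return-channel row mentions ANSI escape sequences in tool output. The sketch below shows one defensive way a consumer might detect and strip them before text reaches a terminal or model context. It illustrates the attack class only and is not the harness's own detection logic; the function names are hypothetical.

```python
import re

# CSI sequences (ECMA-48): ESC [ + parameter bytes + intermediate bytes + final byte.
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# OSC sequences: ESC ] ... terminated by BEL or by ESC \ (string terminator).
OSC = re.compile(r"\x1b\].*?(?:\x07|\x1b\\)", re.DOTALL)


def find_escape_sequences(text: str) -> list:
    """Return every CSI and OSC sequence found in the text."""
    return CSI.findall(text) + OSC.findall(text)


def strip_escape_sequences(text: str) -> str:
    """Remove OSC and CSI sequences, leaving the visible characters."""
    return CSI.sub("", OSC.sub("", text))


sample = "status: ok\x1b[2K\x1b[31m overwritten\x1b[0m"
print(len(find_escape_sequences(sample)))
print(strip_escape_sequences(sample))
```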

CI/CD Integration (v3.8+)

# GitHub Action - drop into any workflow
- uses: msaleme/red-team-blue-team-agent-fabric@v3.8
  with:
    target_url: http://localhost:8080/mcp

# Free quick scan (5 tests, A-F grade)
python scripts/free_scan.py --url http://server:port/mcp --format markdown

# AIUC-1 certification readiness report
python scripts/aiuc1_prep.py --url http://server:port --simulate

# Monthly security report across multiple targets
python scripts/monthly_security_report.py

Output Format

All harnesses produce JSON reports with:

  • Pass/fail per test with test ID and OWASP ASI mapping
  • Full request/response transcripts for audit
  • Elapsed time per test
  • Wilson score confidence intervals (with --trials N)
  • x402 harness adds: CSG mapping, financial impact estimation, Agent Autonomy Risk Score
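
For readers unfamiliar with the Wilson score interval mentioned in the list above, the following is a minimal sketch of the standard formula for a pass rate over repeated trials. It is not taken from the harness source, so rounding or counting conventions in the actual reports may differ; `wilson_interval` is an illustrative name and `z=1.96` assumes a 95% confidence level.

```python
from math import sqrt


def wilson_interval(passes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")
    p = passes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(9, 10)
print(f"9/10 passes: [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains informative when every trial passes, which matters for small trial counts such as `--trials 10`.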

When to Use Each Harness

  • Building an MCP server? Run test mcp before deploying
  • Exposing an A2A agent? Run test a2a to check Agent Card and task security
  • Adding agent payments? Run test l402 (Lightning) or test x402 (USDC) before going live
  • Deploying on enterprise platforms? Run test enterprise with your platform name
  • Red-teaming an agent system? Run test gtg1002 for full APT campaign simulation
  • Need compliance evidence? Use --trials 10 for NIST AI 800-2 aligned statistical reports
  • Preparing for AIUC-1 certification? Run all harnesses for B001/C010/D004 evidence
  • Checking false positive rate? Run test over-refusal to verify security controls don't break legitimate use
  • Validating supply chain integrity? Run test provenance (especially relevant after CVE-2026-25253)
  • Testing jailbreak resistance? Run test jailbreak for DAN variants and encoding evasion
  • Checking agent capability boundaries? Run test capability-profile to verify escalation prevention
  • Validating safety controls? Run test harmful-output and test cbrn for content safeguards
  • Testing incident response? Run test incident-response for kill switch and recovery validation

Research

This harness is part of a published research program on autonomous AI agent governance:

Source & Provenance
