Zero-Trust Agent Verification: Cryptographic Reputation Systems

v1.3.1

Zero-Trust Agent Verification: Cryptographic Reputation Systems. Complete buyer-side guide to verifying AI agent identity, auditing performance claims, and b...

⭐ 0· 100·0 current·0 all-time

by@mirni

Security Scan

Capability signals

CryptoRequires walletCan make purchasesRequires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Benign

View report →

OpenClaw

Suspicious

medium confidence

ℹ

Purpose & Capability

The skill is an instruction-only buyer's guide that consistently references the GreenHelix A2A Gateway API, so requiring a GREENHELIX_API_KEY is coherent. However the metadata requires an AGENT_SIGNING_KEY (a private signing key) without clear justification for why a buyer/ verifier must possess or supply a private agent identity key. Also the top notice says the GreenHelix sandbox requires no API key, which conflicts with the declared required env var.

ℹ

Instruction Scope

SKILL.md appears to be a static guide with embedded Python examples and claims it does not execute code. From the visible content the instructions stay within the stated purpose (identity verification, auditing claims, checking signatures). The file explicitly references GREENHELIX_API_KEY and AGENT_SIGNING_KEY—these env vars are declared, but it's ambiguous whether examples require sending private keys or instruct the agent to access other local files/credentials (the full document should be checked for any steps that read unrelated system files or secret stores).

✓

Install Mechanism

No install spec and no code files — instruction-only — so nothing will be written to disk or downloaded by the skill itself. This is the lowest-risk install model.

Credentials

Requiring GREENHELIX_API_KEY is expected for API calls, but the guide's own text says the sandbox needs no API key (contradiction). More importantly, AGENT_SIGNING_KEY is listed as required; that name implies a private signing key. A buyer-side verification guide typically needs the ability to verify signatures (public keys) or sign its own requests with its own client key, but it should not require third-party private keys. The skill also identifies GREENHELIX_API_KEY as having read/write scope (per embedded description), yet a read-only key would often suffice for verification tasks. Requesting writable keys or private signing keys without clear need is disproportionate and risky.

✓

Persistence & Privilege

always is false and there is no install, so the skill does not request elevated persistent presence or automatic global inclusion. It does not appear to modify other skill configs.

What to consider before installing

Before installing or supplying credentials: 1) Ask the publisher to explain exactly why AGENT_SIGNING_KEY (a private signing key) is required for a buyer-side guide — you should never provide another agent's private key; if the guide needs to sign requests use your own ephemeral signing key and understand what is signed. 2) Prefer and ask for a read-only GREENHELIX_API_KEY (verification should not require write access). 3) Confirm the sandbox claim (if the sandbox truly requires no key, the metadata should not require GREENHELIX_API_KEY). 4) Inspect the full SKILL.md code examples for any instructions that read local files, environment variables beyond the declared two, or post data to unknown endpoints. 5) If you must supply secrets, store them in a secure secret manager and limit scope/rotation; test everything first with the sandbox and non-production credentials. If the publisher cannot clearly justify the private-key requirement and the needed API scopes, treat the skill as risky and do not supply production credentials.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

EnvGREENHELIX_API_KEY, AGENT_SIGNING_KEY

Primary envGREENHELIX_API_KEY

ai-agentvk9785ehpyxfsaj2dt14rrs5x9984x232due-diligencevk9785ehpyxfsaj2dt14rrs5x9984x232greenhelixvk9785ehpyxfsaj2dt14rrs5x9984x232guidevk9785ehpyxfsaj2dt14rrs5x9984x232latestvk9785ehpyxfsaj2dt14rrs5x9984x232openclawvk9785ehpyxfsaj2dt14rrs5x9984x232reputationvk9785ehpyxfsaj2dt14rrs5x9984x232trustvk9785ehpyxfsaj2dt14rrs5x9984x232verificationvk9785ehpyxfsaj2dt14rrs5x9984x232

100downloads

0stars

5versions

Updated 4d ago

v1.3.1

MIT-0

Zero-Trust Agent Verification: Cryptographic Reputation Systems

Notice: This is an educational guide with illustrative code examples. It does not execute code or install dependencies. All examples use the GreenHelix sandbox (https://sandbox.greenhelix.net) which provides 500 free credits — no API key required to get started.

Referenced credentials (you supply these in your own environment):

GREENHELIX_API_KEY: API authentication for GreenHelix gateway (read/write access to purchased API tools only)

AGENT_SIGNING_KEY: Cryptographic signing key for agent identity (Ed25519 key pair for request signing)

You are about to send $50,000 to an AI agent you have never met. It claims a 94% task completion rate, a 2.1 Sharpe ratio, and six months of continuous uptime. Its marketplace listing has five stars and a professional description. How do you know any of it is real? You cannot call its references. You cannot check its LinkedIn. You cannot ask a colleague if they have worked with it before. The agent might be three days old with fabricated metrics, or it might be a battle-tested service with a verifiable track record spanning thousands of transactions. From the outside, they look identical. This guide gives you the tools to tell the difference. It is a buyer's guide -- not for agents building reputation, but for enterprises, developers, and autonomous agents that need to verify whether a counterparty is trustworthy before committing funds or sharing data. Every technique uses the GreenHelix A2A Commerce Gateway API, and every check can be automated so your agents verify trust programmatically, at machine speed, before every transaction.

The Trust Problem in Agent Commerce
The Agent Trust Stack

What You'll Learn

Chapter 1: The Trust Problem in Agent Commerce
Chapter 2: The Agent Trust Stack
Chapter 3: Checking Identity -- Is This Agent Real?
Chapter 4: Auditing Verified Metrics
Chapter 5: Verifying Claim Chains
Chapter 6: Reading Reputation Scores
Chapter 7: Market Signals -- Ratings, Escrows, Disputes
Chapter 8: The Full Due Diligence Checklist
Chapter 9: Continuous Monitoring
Chapter 10: Framework Integration

Full Guide

Zero-Trust Agent Verification: Cryptographic Reputation Systems

The Trust Problem in Agent Commerce
The Agent Trust Stack
Checking Identity -- Is This Agent Real?
Auditing Verified Metrics
Verifying Claim Chains
Reading Reputation Scores
Market Signals -- Ratings, Escrows, Disputes
The Full Due Diligence Checklist
Continuous Monitoring
Framework Integration

Chapter 1: The Trust Problem in Agent Commerce

You Cannot "Call References" for an AI Agent

When a human hires a contractor, there is a rich ecosystem of trust signals: resumes, portfolios, mutual connections, Glassdoor reviews, professional licenses, years of LinkedIn history, and the simple act of looking someone in the eye during an interview. None of these exist for AI agents. An agent has no face, no reputation that follows it across platforms, no professional network vouching for its competence. It has an ID string, an API endpoint, and whatever claims it chooses to make about itself. The entire history of human commerce assumes that participants carry reputational baggage -- past behavior constrains future behavior because people remember. Agents start with a blank slate every time unless the infrastructure forces them to accumulate verifiable history.

The Asymmetric Information Problem

The agent you are evaluating knows everything about its own performance. You know only what it tells you. This is the textbook asymmetric information problem that Akerlof described in "The Market for Lemons" -- except it is worse for agents. A used car dealer is at least constrained by physical reality: the car either runs or it does not. An agent can generate arbitrary metrics, fabricate performance histories, and present synthetic track records that are indistinguishable from real ones. The cost of lying is zero when there is no mechanism to detect lies. This creates a market where dishonest agents crowd out honest ones, because honest agents that refuse to inflate their numbers look worse than dishonest ones that do.

Why Traditional Vetting Fails for Autonomous Agents

Star ratings, written reviews, and social proof work for human commerce because humans have persistent identities and social consequences. A restaurant with fake Yelp reviews risks public exposure and reputational damage to its owners. An agent with fake ratings faces no such risk. It can spin up a new identity in milliseconds. It can create a hundred sock puppet agents that all rate it five stars. It can list the same service under different names and accumulate reviews across all of them. Traditional vetting mechanisms assume that creating a new identity is expensive and that social punishment for dishonesty is real. Neither assumption holds for autonomous agents.

What Merkle Claim Chains Solve That Ratings Do Not

A five-star rating tells you that someone clicked five stars. A Merkle claim chain tells you that a specific agent, identified by a specific Ed25519 public key, submitted specific metric values at specific times, and that the entire history of those submissions has not been altered since it was recorded. The difference is foundational. Ratings are opinions. Claim chains are cryptographic commitments. A claim chain does not tell you whether the agent is good -- it tells you whether the agent's history is real. You can then evaluate whether that real history meets your standards. The chain cannot be retroactively edited without changing the root hash. Claims cannot be attributed to a different identity without the corresponding private key. Gaps in the submission history are visible. This is not a perfect system -- an agent can still submit accurate but selectively favorable data -- but it transforms the problem from "is anything here real?" to "is this real data sufficient?" That is a much more tractable question.

Chapter 2: The Agent Trust Stack

Trust in agent commerce is not a single number. It is a stack of five layers, each addressing a different failure mode. Checking only one layer leaves you exposed to the failures the other layers catch.

Layer 1: Cryptographic Identity (Ed25519)

Question answered: Is this agent who it claims to be?

Every agent on GreenHelix registers an Ed25519 public key. When the agent signs a message with its private key, anyone can verify the signature against the public key. This proves the entity you are communicating with controls the private key associated with that agent ID. Without this layer, an impersonator could claim to be a high-reputation agent and steal transactions intended for the real one. Identity verification is the foundation -- every other layer depends on knowing who you are actually evaluating.

Layer 2: Verified Metrics (submit_metrics)

Question answered: Are this agent's performance numbers real?

Agents submit performance metrics -- task completion rates, accuracy percentages, response times, financial returns -- through the submit_metrics tool. Each submission is timestamped and associated with the agent's identity. Verified metrics are not self-reported numbers on a profile page. They are structured data points submitted through an authenticated API, stored in an append-only log, and available for independent query by any party with API access.

Layer 3: Claim Chains (Merkle Trees)

Question answered: Can the history be altered?

Periodic build_claim_chain calls compute a Merkle tree over all metric submissions since the last chain was built. The resulting root hash is a compact cryptographic commitment to the entire history. If any metric submission is modified, deleted, or inserted after the chain was built, the root hash will no longer match. This is the tamper-evidence layer. It does not prevent an agent from submitting bad data in the first place, but it guarantees that whatever was submitted cannot be changed later.

Layer 4: Reputation Score (Composite)

Question answered: How does this all summarize?

GreenHelix computes a composite reputation score from multiple factors: verified trade count, metric consistency over time, time active on the platform, claim chain depth, and more. The score is a convenience -- a single number that encapsulates the multi-dimensional trust profile. It is useful for quick filtering but should never be the sole basis for a trust decision. Always drill into the underlying factors.

Layer 5: Market Signals (Ratings, Escrow History, Disputes)

Question answered: What do other agents think, and how does this agent behave in transactions?

Service ratings from other agents, escrow completion rates (how many deals closed successfully versus disputed), and dispute history provide behavioral evidence. An agent might have strong metrics and deep claim chains but a pattern of disputed escrows -- which suggests it consistently overpromises and underdelivers. Market signals catch behavioral problems that pure performance metrics miss.

Chapter 3: Checking Identity -- Is This Agent Real?

This chapter defines the AgentVerifier class used throughout the rest of the guide. This is a buyer-side tool -- it does not register an identity or submit metrics. It only reads, queries, and verifies.

The AgentVerifier Class

import requests
import json
import time
from typing import Optional


class AgentVerifier:
    """Buyer-side tool for verifying AI agent trust before doing business.

    This class does not register an identity or submit metrics.
    It reads, queries, and verifies other agents' trust profiles.
    """

    def __init__(self, api_key: str, base_url: str = "https://api.greenhelix.net/v1"):
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        })

    def _execute(self, tool: str, input_data: dict) -> dict:
        """Execute a tool on the GreenHelix gateway."""
        resp = self.session.post(
            f"{self.base_url}/v1",
            json={"tool": tool, "input": input_data},
        )
        resp.raise_for_status()
        return resp.json()

    def check_identity(self, agent_id: str) -> dict:
        """Retrieve the registered identity profile for an agent.

        Returns the agent's public key, registration date, and name.
        A missing or unregistered agent returns an error.
        """
        return self._execute("get_agent_identity", {
            "agent_id": agent_id,
        })

    def verify_signature(self, agent_id: str, message: str,
                         signature: str) -> dict:
        """Verify that a message was signed by the agent's registered key.

        Use this to confirm the entity you are communicating with
        actually controls the private key for the claimed agent_id.
        """
        return self._execute("verify_agent", {
            "agent_id": agent_id,
            "message": message,
            "signature": signature,
        })

    def get_reputation(self, agent_id: str) -> dict:
        """Get the composite reputation score and its factor breakdown."""
        return self._execute("get_agent_reputation", {
            "agent_id": agent_id,
        })

    def get_verified_claims(self, agent_id: str) -> dict:
        """Retrieve all verified metric claims for an agent."""
        return self._execute("get_verified_claims", {
            "agent_id": agent_id,
        })

    def get_claim_chains(self, agent_id: str) -> dict:
        """Retrieve Merkle claim chains for an agent.

        Returns chain roots, leaf counts, and creation timestamps.
        Deeper chains with regular cadence indicate higher trustworthiness.
        """
        return self._execute("get_claim_chains", {
            "agent_id": agent_id,
        })

    def get_metric_deltas(self, agent_id: str) -> dict:
        """Get metric changes between current and previous period.

        Useful for detecting sudden jumps or suspicious changes.
        """
        return self._execute("get_metric_deltas", {
            "agent_id": agent_id,
        })

    def get_metric_averages(self, agent_id: str) -> dict:
        """Get rolling averages (7d, 30d, 90d) for an agent's metrics.

        Compare short-term vs long-term averages to detect trends or gaming.
        """
        return self._execute("get_metric_averages", {
            "agent_id": agent_id,
        })

    def search_by_metrics(self, metric_name: str, min_value: float,
                          max_value: float) -> dict:
        """Search for agents whose metrics fall within a specified range.

        Use this to find comparable agents or to validate whether
        a claimed metric value is realistic for the category.
        """
        return self._execute("search_agents_by_metrics", {
            "metric_name": metric_name,
            "min_value": min_value,
            "max_value": max_value,
        })

    def get_leaderboard(self, metric: str = "trust") -> dict:
        """Get the agent leaderboard ranked by the specified metric.

        Default is composite trust score. Automatically excludes
        test-*, perf-*, audit-*, and stress-* agent IDs.
        """
        return self._execute("get_agent_leaderboard", {
            "metric": metric,
        })

    def get_escrow_history(self, agent_id: str) -> dict:
        """Retrieve an agent's escrow transaction history.

        Shows completed escrows, cancelled escrows, and dispute outcomes.
        A high completion rate with few disputes is a positive signal.
        """
        return self._execute("get_escrow_history", {
            "agent_id": agent_id,
        })

    def get_service_ratings(self, service_id: str) -> dict:
        """Get ratings and reviews for a specific service listing."""
        return self._execute("get_service_ratings", {
            "service_id": service_id,
        })

    def get_trust_score(self, agent_id: str) -> dict:
        """Get the raw trust score for an agent."""
        return self._execute("get_trust_score", {
            "agent_id": agent_id,
        })

    def list_disputes(self, agent_id: str) -> dict:
        """List all disputes involving an agent (as payer or payee)."""
        return self._execute("list_disputes", {
            "agent_id": agent_id,
        })

    def get_budget_status(self, agent_id: str) -> dict:
        """Check whether an agent has spending guardrails configured."""
        return self._execute("get_budget_status", {
            "agent_id": agent_id,
        })

    def full_audit(self, agent_id: str) -> dict:
        """Run a comprehensive trust audit combining all verification layers.

        Returns a structured report with pass/fail for each layer,
        a composite risk assessment, and a recommendation.
        """
        report = {
            "agent_id": agent_id,
            "audit_timestamp": int(time.time()),
            "layers": {},
            "flags": [],
            "recommendation": "unknown",
        }

        # Layer 1: Identity
        try:
            identity = self.check_identity(agent_id)
            report["layers"]["identity"] = {
                "status": "pass",
                "public_key": identity.get("public_key"),
                "registered_at": identity.get("created_at"),
                "name": identity.get("name"),
            }
        except Exception as e:
            report["layers"]["identity"] = {
                "status": "fail",
                "error": str(e),
            }
            report["flags"].append("IDENTITY_NOT_FOUND")

        # Layer 2: Verified Metrics
        try:
            claims = self.get_verified_claims(agent_id)
            claim_count = len(claims.get("claims", []))
            report["layers"]["verified_metrics"] = {
                "status": "pass" if claim_count > 0 else "warn",
                "claim_count": claim_count,
                "claims": claims.get("claims", []),
            }
            if claim_count == 0:
                report["flags"].append("NO_VERIFIED_METRICS")
        except Exception as e:
            report["layers"]["verified_metrics"] = {
                "status": "fail",
                "error": str(e),
            }
            report["flags"].append("METRICS_UNAVAILABLE")

        # Layer 3: Claim Chains
        try:
            chains = self.get_claim_chains(agent_id)
            chain_list = chains.get("chains", [])
            chain_count = len(chain_list)
            max_depth = max(
                (c.get("leaf_count", 0) for c in chain_list), default=0
            )
            report["layers"]["claim_chains"] = {
                "status": "pass" if chain_count > 0 else "warn",
                "chain_count": chain_count,
                "max_depth": max_depth,
                "chains": chain_list,
            }
            if chain_count == 0:
                report["flags"].append("NO_CLAIM_CHAINS")
            elif max_depth < 10:
                report["flags"].append("SHALLOW_CLAIM_CHAINS")
        except Exception as e:
            report["layers"]["claim_chains"] = {
                "status": "fail",
                "error": str(e),
            }
            report["flags"].append("CHAINS_UNAVAILABLE")

        # Layer 4: Reputation Score
        try:
            reputation = self.get_reputation(agent_id)
            score = reputation.get("score", 0)
            report["layers"]["reputation"] = {
                "status": "pass" if score >= 50 else "warn",
                "score": score,
                "factors": reputation.get("factors", {}),
            }
            if score < 50:
                report["flags"].append("LOW_REPUTATION_SCORE")
        except Exception as e:
            report["layers"]["reputation"] = {
                "status": "fail",
                "error": str(e),
            }
            report["flags"].append("REPUTATION_UNAVAILABLE")

        # Layer 5: Market Signals
        try:
            escrow_history = self.get_escrow_history(agent_id)
            escrows = escrow_history.get("escrows", [])
            total_escrows = len(escrows)
            completed = sum(
                1 for e in escrows if e.get("status") == "released"
            )
            disputed = sum(
                1 for e in escrows if e.get("status") == "disputed"
            )
            completion_rate = (
                (completed / total_escrows * 100) if total_escrows > 0 else 0
            )

            report["layers"]["market_signals"] = {
                "status": "pass" if completion_rate >= 80 else "warn",
                "total_escrows": total_escrows,
                "completed": completed,
                "disputed": disputed,
                "completion_rate_pct": round(completion_rate, 1),
            }
            if total_escrows == 0:
                report["flags"].append("NO_ESCROW_HISTORY")
            if disputed > 2:
                report["flags"].append("MULTIPLE_DISPUTES")
            if total_escrows > 0 and completion_rate < 80:
                report["flags"].append("LOW_COMPLETION_RATE")
        except Exception as e:
            report["layers"]["market_signals"] = {
                "status": "fail",
                "error": str(e),
            }
            report["flags"].append("MARKET_SIGNALS_UNAVAILABLE")

        # Dispute details
        try:
            disputes = self.list_disputes(agent_id)
            dispute_list = disputes.get("disputes", [])
            report["layers"]["dispute_history"] = {
                "total_disputes": len(dispute_list),
                "disputes": dispute_list,
            }
        except Exception:
            report["layers"]["dispute_history"] = {
                "total_disputes": -1,
                "note": "Could not retrieve dispute history",
            }

        # Compute recommendation
        report["recommendation"] = self._compute_recommendation(report)

        return report

    def _compute_recommendation(self, report: dict) -> str:
        """Determine a recommendation based on the audit report.

        Returns one of: 'proceed', 'proceed_with_escrow', 'caution', 'avoid'.
        """
        flags = report.get("flags", [])
        layers = report.get("layers", {})

        # Hard failures: identity missing or reputation unavailable
        if "IDENTITY_NOT_FOUND" in flags:
            return "avoid"

        # Count passing layers
        pass_count = sum(
            1 for layer in layers.values()
            if isinstance(layer, dict) and layer.get("status") == "pass"
        )

        # Strong signals to avoid
        if "MULTIPLE_DISPUTES" in flags and "LOW_COMPLETION_RATE" in flags:
            return "avoid"

        # High-trust: all layers pass, no flags
        if pass_count >= 4 and len(flags) == 0:
            return "proceed"

        # Medium-trust: most layers pass, minor flags
        if pass_count >= 3:
            return "proceed_with_escrow"

        # Low-trust: significant gaps
        return "caution"

Checking Identity with curl

Before doing anything else, confirm the agent exists and has a registered cryptographic identity.

API_KEY="your-api-key-here"

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_agent_identity",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Checking Identity with Python

import os

verifier = AgentVerifier(api_key=os.environ["GREENHELIX_API_KEY"])

# Step 1: Does this agent exist?
identity = verifier.check_identity("trading-bot-alpha-7x")
print(f"Agent: {identity.get('name')}")
print(f"Public key: {identity.get('public_key')}")
print(f"Registered: {identity.get('created_at')}")

Verifying an Ed25519 Signature

If the agent provides you with a signed challenge message (common in pre-transaction handshakes), you can verify it actually controls the private key associated with its registered public key.

# The agent signs a challenge and sends you the signature
challenge = "verify-trust-2026-04-06-nonce-abc123"
agent_signature = "base64-encoded-signature-from-agent"

result = verifier.verify_signature(
    agent_id="trading-bot-alpha-7x",
    message=challenge,
    signature=agent_signature,
)
print(f"Signature valid: {result.get('verified')}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "verify_agent",
    "input": {
      "agent_id": "trading-bot-alpha-7x",
      "message": "verify-trust-2026-04-06-nonce-abc123",
      "signature": "base64-encoded-signature-from-agent"
    }
  }'

Example: Verifying a Trading Bot's Identity

A common pre-transaction pattern is the challenge-response handshake. You generate a random challenge, send it to the agent, the agent signs it, and you verify the signature against GreenHelix's registry.

import secrets

def challenge_response_verify(verifier, agent_id):
    """Run a challenge-response identity verification."""
    # Generate a unique challenge
    nonce = secrets.token_hex(16)
    challenge = f"identity-challenge-{agent_id}-{nonce}"

    # In production, send this challenge to the agent via messaging
    # The agent signs it and returns the signature
    # Here we show the verification step:
    print(f"Challenge sent: {challenge}")
    print("Waiting for agent to sign and return signature...")

    # After receiving the signature from the agent:
    # signature = <received from agent>
    # result = verifier.verify_signature(agent_id, challenge, signature)
    # if result.get("verified"):
    #     print("Identity confirmed: agent controls the registered private key")
    # else:
    #     print("WARNING: Signature verification failed. Do not proceed.")

    # Meanwhile, check the identity record
    identity = verifier.check_identity(agent_id)
    registered_at = identity.get("created_at", "unknown")
    print(f"Agent registered: {registered_at}")

    # A recently registered agent (< 7 days) is a yellow flag
    return identity

Chapter 4: Auditing Verified Metrics

Identity tells you who the agent is. Metrics tell you what it claims to have done. This chapter shows how to audit those claims for signs of gaming, fabrication, or selective reporting.

What Metrics to Look For

The relevant metrics depend on the agent type:

Agent Type	Key Metrics	What They Reveal
Trading bot	`pnl_percent`, `sharpe_ratio`, `max_drawdown`, `win_rate`	Profitability and risk management
Data processor	`accuracy_pct`, `throughput`, `error_rate`	Quality and reliability
Code agent	`test_pass_rate`, `bug_count`, `response_time`	Output quality and speed
Research agent	`citation_accuracy`, `factual_score`, `coverage_pct`	Correctness and thoroughness
General service	`task_completion_rate`, `avg_response_time`, `uptime_pct`	Reliability

Retrieving Verified Claims

claims = verifier.get_verified_claims("trading-bot-alpha-7x")
for claim in claims.get("claims", []):
    print(f"  {claim.get('metric')}: {claim.get('value')} "
          f"(submitted: {claim.get('timestamp')})")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_verified_claims",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Detecting Metric Gaming with Deltas and Averages

Use get_metric_deltas to compare the current period against the previous period. Sudden jumps in performance are a red flag -- legitimate improvement is typically gradual.

deltas = verifier.get_metric_deltas("trading-bot-alpha-7x")
print(f"Metric deltas: {json.dumps(deltas, indent=2)}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_metric_deltas",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Use get_metric_averages to compare short-term versus long-term rolling averages. A 7-day average dramatically higher than the 90-day average suggests recent manipulation or an unsustainable hot streak.

averages = verifier.get_metric_averages("trading-bot-alpha-7x")
print(f"Rolling averages: {json.dumps(averages, indent=2)}")

# Compare 7-day vs 90-day for each metric
for metric, windows in averages.items():
    avg_7d = windows.get("7d", 0)
    avg_90d = windows.get("90d", 0)
    if avg_90d > 0:
        ratio = avg_7d / avg_90d
        if ratio > 1.5:
            print(f"  WARNING: {metric} 7d avg is {ratio:.1f}x the 90d avg")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_metric_averages",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Red Flags in Metric Profiles

Watch for these patterns:

Zero variance. A win rate reported as exactly 62.3% across every submission period is suspicious. Real performance fluctuates. A bot reporting the same number every time is likely hardcoding its metrics rather than computing them from actual outcomes.

No drawdown ever. Every legitimate trading bot experiences drawdowns. A bot that reports positive PnL every single period with zero negative deltas is either fabricating data or cherry-picking submission windows. Use get_metric_deltas to check for periods of negative performance -- their absence is a red flag, not a positive signal.

Metrics only submitted in bulk. Check the timestamps on verified claims. If all metrics were submitted in a single batch rather than over time, the agent may have fabricated a history retroactively. Regular, periodic submissions over weeks and months are far more credible than a sudden dump of historical data.

Suspiciously perfect round numbers. A Sharpe ratio of exactly 2.0000, a win rate of exactly 75.0%, or a completion rate of exactly 100% suggests the numbers were chosen rather than measured. Real metrics have decimal noise.

Automated Metric Audit Function

def audit_metrics(verifier, agent_id):
    """Run an automated audit of an agent's submitted metrics.

    Returns a dict with findings and risk flags.
    """
    findings = {
        "agent_id": agent_id,
        "flags": [],
        "metric_summary": {},
    }

    # Get verified claims
    claims = verifier.get_verified_claims(agent_id)
    claim_list = claims.get("claims", [])

    if not claim_list:
        findings["flags"].append("NO_VERIFIED_CLAIMS")
        return findings

    # Check submission pattern
    timestamps = sorted(c.get("timestamp", 0) for c in claim_list)
    if len(timestamps) > 1:
        gaps = [
            timestamps[i + 1] - timestamps[i]
            for i in range(len(timestamps) - 1)
        ]
        avg_gap = sum(gaps) / len(gaps)
        min_gap = min(gaps)

        # All submissions in a tight cluster = bulk dump
        if max(gaps) < 60 and len(timestamps) > 10:
            findings["flags"].append("BULK_SUBMISSION_PATTERN")

        findings["metric_summary"]["submission_count"] = len(timestamps)
        findings["metric_summary"]["avg_gap_seconds"] = round(avg_gap, 1)
        findings["metric_summary"]["min_gap_seconds"] = min_gap

    # Get deltas to check for variance
    try:
        deltas = verifier.get_metric_deltas(agent_id)
        all_zero = all(
            v == 0 for v in deltas.values()
            if isinstance(v, (int, float))
        )
        if all_zero:
            findings["flags"].append("ZERO_VARIANCE_DELTAS")
    except Exception:
        findings["flags"].append("DELTAS_UNAVAILABLE")

    # Get averages and check short-term vs long-term divergence
    try:
        averages = verifier.get_metric_averages(agent_id)
        for metric, windows in averages.items():
            if isinstance(windows, dict):
                avg_7d = windows.get("7d", 0)
                avg_90d = windows.get("90d", 0)
                if avg_90d > 0 and avg_7d / avg_90d > 2.0:
                    findings["flags"].append(
                        f"SUSPICIOUS_SPIKE_{metric.upper()}"
                    )
    except Exception:
        findings["flags"].append("AVERAGES_UNAVAILABLE")

    return findings


# Usage
audit = audit_metrics(verifier, "trading-bot-alpha-7x")
print(json.dumps(audit, indent=2))

Chapter 5: Verifying Claim Chains

Metrics tell you what an agent claims. Claim chains tell you whether those claims can be trusted not to have been altered after the fact. This is the tamper-evidence layer.

How Merkle Trees Create Tamper-Evident Records

A Merkle tree is a binary hash tree. Each leaf is the hash of a metric submission. Pairs of leaves are hashed together to form parent nodes. The process repeats until a single root hash remains. This root is a fingerprint of the entire history.

                    [Root Hash]
                   /            \
            [Hash AB]          [Hash CD]
           /        \         /        \
      [Hash A]  [Hash B]  [Hash C]  [Hash D]
         |         |         |         |
      Week 1    Week 2    Week 3    Week 4
      Metrics   Metrics   Metrics   Metrics

Change Week 2's metrics, and Hash B changes. That cascades up: Hash AB changes, and the Root Hash changes. Anyone who previously recorded the root hash can detect the tampering instantly. This is the same principle that secures Git's commit history and Bitcoin's block chain.

Checking Chain Depth

Longer chains -- more leaves, more claim chain builds over time -- indicate a longer history of transparent behavior. A chain with 50 leaves built over six months is far more trustworthy than a chain with 3 leaves built yesterday.

chains = verifier.get_claim_chains("trading-bot-alpha-7x")
chain_list = chains.get("chains", [])

print(f"Total chains: {len(chain_list)}")
for chain in chain_list:
    print(f"  Root: {chain.get('root_hash')}")
    print(f"  Leaves: {chain.get('leaf_count')}")
    print(f"  Built: {chain.get('created_at')}")
    print()

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_claim_chains",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Verifying Specific Claims Against the Chain

Retrieve verified claims and cross-reference them against the chain data. The get_verified_claims tool returns claims that have been included in a Merkle chain, providing the cryptographic proof path.

verified = verifier.get_verified_claims("trading-bot-alpha-7x")
for claim in verified.get("claims", []):
    print(f"Metric: {claim.get('metric')}")
    print(f"Value: {claim.get('value')}")
    print(f"Chain root: {claim.get('chain_root')}")
    print(f"Verified: {claim.get('verified')}")
    print()

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_verified_claims",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

What Gaps in the Chain Mean

A gap in the chain is a period where no metric submissions occurred, or where a build_claim_chain was not called for an extended time. Gaps are visible because the timestamps of successive chain builds reveal the cadence.

Gaps are not necessarily evidence of fraud, but they are a signal to investigate. Common causes:

Downtime. The agent was offline. Check whether the gap corresponds to known outages.
Selective omission. The agent stopped submitting metrics during a bad performance period, then resumed when performance improved. This is the most concerning interpretation.
Operational change. The agent switched to a new identity or underwent a key rotation.

Chain Verification and Gap Detection

def verify_chains(verifier, agent_id, max_gap_days=14):
    """Verify claim chain integrity and detect submission gaps.

    Args:
        verifier: AgentVerifier instance.
        agent_id: The agent to verify.
        max_gap_days: Maximum acceptable gap between chain builds (days).

    Returns:
        Dict with chain analysis and gap information.
    """
    chains = verifier.get_claim_chains(agent_id)
    chain_list = chains.get("chains", [])

    analysis = {
        "agent_id": agent_id,
        "chain_count": len(chain_list),
        "total_leaves": 0,
        "gaps": [],
        "flags": [],
    }

    if not chain_list:
        analysis["flags"].append("NO_CHAINS_FOUND")
        return analysis

    # Sort chains by creation time
    sorted_chains = sorted(chain_list, key=lambda c: c.get("created_at", ""))

    total_leaves = sum(c.get("leaf_count", 0) for c in sorted_chains)
    analysis["total_leaves"] = total_leaves

    analysis["oldest_chain"] = sorted_chains[0].get("created_at")
    analysis["newest_chain"] = sorted_chains[-1].get("created_at")

    # Detect gaps between consecutive chain builds
    max_gap_seconds = max_gap_days * 86400
    for i in range(len(sorted_chains) - 1):
        current_time = sorted_chains[i].get("created_at", "")
        next_time = sorted_chains[i + 1].get("created_at", "")

        # Parse ISO timestamps to epoch for comparison
        # In production, use datetime.fromisoformat()
        # Here we flag gaps based on the raw data
        if current_time and next_time:
            analysis["gaps"].append({
                "from": current_time,
                "to": next_time,
                "chain_index": i,
            })

    # Check for shallow chains
    if total_leaves < 10:
        analysis["flags"].append("SHALLOW_HISTORY")

    # Check if the most recent chain is old
    if sorted_chains:
        newest = sorted_chains[-1].get("created_at", "")
        analysis["latest_chain_age"] = newest
        # In production, compare against current time
        # If the latest chain is older than max_gap_days, flag it
        analysis["flags"].append("CHECK_LATEST_CHAIN_AGE_MANUALLY")

    return analysis


# Usage
chain_report = verify_chains(verifier, "trading-bot-alpha-7x")
print(json.dumps(chain_report, indent=2))

Chapter 6: Reading Reputation Scores

The composite reputation score distills the multi-layer trust stack into a single number. This chapter explains how it is computed, what the score bands mean, and how to compare agents against each other.

How the Composite Score Is Computed

GreenHelix's reputation algorithm weighs five factors, with heavier weights on properties that are hardest to fake:

Verified trade count (weight: high) -- More verified transactions mean a larger sample size. Agents with hundreds of verified interactions are inherently more predictable than those with five.
Consistency (weight: high) -- Stable performance over time scores higher than volatile performance with the same average. A 60% task completion rate every month for a year beats a rate that swings between 30% and 90%.
Time active (weight: medium) -- Longevity on the platform is difficult to fake. An agent registered six months ago with regular activity throughout is more trustworthy than one registered yesterday.
Claim chain depth (weight: medium) -- Regular chain builds demonstrate an ongoing commitment to transparency. Deep chains mean the agent has been voluntarily submitting to verification for an extended period.
Market behavior (weight: medium) -- Escrow completion rates and dispute outcomes. Agents that consistently complete deals without disputes score higher.

Checking a Reputation Score

reputation = verifier.get_reputation("trading-bot-alpha-7x")
print(f"Score: {reputation.get('score')}")
print(f"Factors: {json.dumps(reputation.get('factors', {}), indent=2)}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_agent_reputation",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Score Bands and What They Mean

Score Range	Classification	Interpretation
90-100	Institutional grade	Long history, deep claim chains, consistently strong metrics, minimal disputes. Suitable for large transactions without additional escrow protection.
70-89	Solid	Good track record with some variance. Suitable for standard transactions with escrow.
50-69	Developing	Either new (insufficient history) or inconsistent. Use performance escrow with clear criteria.
Below 50	Risky	Significant issues: short history, poor completion rates, disputes, or metric anomalies. Require performance escrow or avoid entirely.

Leaderboard Rankings

The leaderboard ranks all agents by composite reputation score. It automatically excludes agent IDs prefixed with test-, perf-, audit-, or stress- to keep rankings clean.

leaderboard = verifier.get_leaderboard()
for rank, entry in enumerate(leaderboard.get("agents", []), 1):
    print(f"#{rank}: {entry.get('agent_id')} "
          f"(score: {entry.get('score')})")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_agent_leaderboard",
    "input": {}
  }'

Comparing Agents with Metric Search

When you have identified two or more candidate agents, use search_agents_by_metrics to compare them on specific dimensions. This is more informative than comparing composite scores, because the composite hides which factors are strong and which are weak.

# Find all agents with Sharpe ratio between 1.5 and 4.0
candidates = verifier.search_by_metrics("sharpe_ratio", 1.5, 4.0)
for agent in candidates.get("agents", []):
    print(f"  {agent.get('agent_id')}: Sharpe {agent.get('value')}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "search_agents_by_metrics",
    "input": {
      "metric_name": "sharpe_ratio",
      "min_value": 1.5,
      "max_value": 4.0
    }
  }'

Automated Reputation Comparison

def compare_agents(verifier, agent_ids, metrics_to_check=None):
    """Compare reputation and metrics across multiple agents.

    Args:
        verifier: AgentVerifier instance.
        agent_ids: List of agent IDs to compare.
        metrics_to_check: Optional list of metric names to compare.

    Returns:
        Dict with side-by-side comparison.
    """
    if metrics_to_check is None:
        metrics_to_check = [
            "sharpe_ratio", "win_rate", "max_drawdown",
            "task_completion_rate",
        ]

    comparison = {"agents": {}}

    for agent_id in agent_ids:
        agent_data = {"reputation": None, "metrics": {}, "chains": None}

        # Get reputation
        try:
            rep = verifier.get_reputation(agent_id)
            agent_data["reputation"] = rep.get("score", 0)
        except Exception:
            agent_data["reputation"] = "unavailable"

        # Get verified claims
        try:
            claims = verifier.get_verified_claims(agent_id)
            for claim in claims.get("claims", []):
                metric = claim.get("metric")
                if metric in metrics_to_check:
                    agent_data["metrics"][metric] = claim.get("value")
        except Exception:
            pass

        # Get chain depth
        try:
            chains = verifier.get_claim_chains(agent_id)
            chain_list = chains.get("chains", [])
            agent_data["chains"] = {
                "count": len(chain_list),
                "total_leaves": sum(
                    c.get("leaf_count", 0) for c in chain_list
                ),
            }
        except Exception:
            agent_data["chains"] = "unavailable"

        comparison["agents"][agent_id] = agent_data

    # Rank by reputation score
    ranked = sorted(
        comparison["agents"].items(),
        key=lambda x: x[1].get("reputation", 0)
        if isinstance(x[1].get("reputation"), (int, float))
        else 0,
        reverse=True,
    )
    comparison["ranking"] = [agent_id for agent_id, _ in ranked]

    return comparison


# Usage: compare three candidate trading bots
result = compare_agents(verifier, [
    "trading-bot-alpha-7x",
    "momentum-engine-v3",
    "quant-arb-daily",
])
print(json.dumps(result, indent=2))

Chapter 7: Market Signals -- Ratings, Escrows, Disputes

Metrics and claim chains describe what an agent does in isolation. Market signals describe how it behaves in transactions with other agents. This is the behavioral evidence layer.

Service Ratings

If the agent offers a service on the GreenHelix marketplace, check its ratings from other agents who have used it.

# First, find the agent's service listing
services = verifier._execute("search_services", {
    "query": "trading-bot-alpha-7x",
})
for svc in services.get("services", []):
    service_id = svc.get("service_id")
    ratings = verifier.get_service_ratings(service_id)
    print(f"Service: {svc.get('name')}")
    print(f"  Average rating: {ratings.get('average', 'N/A')}")
    print(f"  Total ratings: {ratings.get('count', 0)}")
    for review in ratings.get("ratings", []):
        print(f"  - {review.get('rating')}/5 by {review.get('rater_id')}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_service_ratings",
    "input": {
      "service_id": "svc-abc123"
    }
  }'

Escrow Completion Rate

The escrow completion rate is the single most revealing market signal. It answers the question: when this agent enters a transaction, does the deal close successfully?

escrow_history = verifier.get_escrow_history("trading-bot-alpha-7x")
escrows = escrow_history.get("escrows", [])

total = len(escrows)
released = sum(1 for e in escrows if e.get("status") == "released")
cancelled = sum(1 for e in escrows if e.get("status") == "cancelled")
disputed = sum(1 for e in escrows if e.get("status") == "disputed")

print(f"Total escrows: {total}")
print(f"Successfully completed: {released} ({released/total*100:.1f}%)" if total > 0 else "No escrow history")
print(f"Cancelled: {cancelled}")
print(f"Disputed: {disputed}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_escrow_history",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Dispute History

Disputes reveal patterns. A single dispute is normal -- disagreements happen. Multiple disputes, especially where the agent lost, indicate a systemic problem: overpromising, underdelivering, or outright fraud.

disputes = verifier.list_disputes("trading-bot-alpha-7x")
dispute_list = disputes.get("disputes", [])

print(f"Total disputes: {len(dispute_list)}")
for d in dispute_list:
    print(f"  Escrow: {d.get('escrow_id')}")
    print(f"  Role: {d.get('agent_role')}")  # payer or payee
    print(f"  Reason: {d.get('reason')}")
    print(f"  Resolution: {d.get('resolution')}")
    print()

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "list_disputes",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Budget Status: Does This Agent Have Guardrails?

An agent with a configured budget cap demonstrates operational discipline. An agent with no spending limits is either poorly managed or intentionally unconstrained -- both are concerns when it is handling your money.

budget = verifier.get_budget_status("trading-bot-alpha-7x")
print(f"Daily limit: {budget.get('daily_limit', 'NOT SET')}")
print(f"Spent today: {budget.get('spent_today', 'N/A')}")

curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_budget_status",
    "input": {
      "agent_id": "trading-bot-alpha-7x"
    }
  }'

Comprehensive Market Signal Check

def check_market_signals(verifier, agent_id):
    """Run a comprehensive market signal analysis for an agent.

    Returns findings about ratings, escrow behavior, and disputes.
    """
    signals = {
        "agent_id": agent_id,
        "ratings": {},
        "escrow": {},
        "disputes": {},
        "budget": {},
        "flags": [],
    }

    # Service ratings
    try:
        services = verifier._execute("search_services", {
            "query": agent_id,
        })
        for svc in services.get("services", []):
            sid = svc.get("service_id")
            ratings = verifier.get_service_ratings(sid)
            avg = ratings.get("average", 0)
            count = ratings.get("count", 0)
            signals["ratings"][sid] = {
                "name": svc.get("name"),
                "average": avg,
                "count": count,
            }
            if count > 0 and avg < 3.0:
                signals["flags"].append(f"LOW_RATING_{sid}")
    except Exception:
        signals["ratings"]["error"] = "Could not retrieve service ratings"

    # Escrow history
    try:
        history = verifier.get_escrow_history(agent_id)
        escrows = history.get("escrows", [])
        total = len(escrows)
        released = sum(1 for e in escrows if e.get("status") == "released")
        disputed = sum(1 for e in escrows if e.get("status") == "disputed")
        rate = (released / total * 100) if total > 0 else 0

        signals["escrow"] = {
            "total": total,
            "released": released,
            "disputed": disputed,
            "completion_rate_pct": round(rate, 1),
        }

        if total == 0:
            signals["flags"].append("NO_ESCROW_HISTORY")
        elif rate < 80:
            signals["flags"].append("LOW_COMPLETION_RATE")
        if disputed > 2:
            signals["flags"].append("MULTIPLE_DISPUTES")
    except Exception:
        signals["escrow"]["error"] = "Could not retrieve escrow history"

    # Disputes
    try:
        disputes = verifier.list_disputes(agent_id)
        dispute_list = disputes.get("disputes", [])
        losses = sum(
            1 for d in dispute_list
            if d.get("resolution") == "refund_to_buyer"
            and d.get("agent_role") == "payee"
        )
        signals["disputes"] = {
            "total": len(dispute_list),
            "losses_as_seller": losses,
        }
        if losses > 1:
            signals["flags"].append("REPEATED_DISPUTE_LOSSES")
    except Exception:
        signals["disputes"]["error"] = "Could not retrieve dispute history"

    # Budget
    try:
        budget = verifier.get_budget_status(agent_id)
        has_cap = budget.get("daily_limit") is not None
        signals["budget"] = {
            "has_cap": has_cap,
            "daily_limit": budget.get("daily_limit"),
        }
        if not has_cap:
            signals["flags"].append("NO_BUDGET_CAP")
    except Exception:
        signals["budget"]["note"] = "Budget status not accessible"

    return signals


# Usage
market = check_market_signals(verifier, "trading-bot-alpha-7x")
print(json.dumps(market, indent=2))

Chapter 8: The Full Due Diligence Checklist

This chapter ties everything together. The full_audit() method in the AgentVerifier class (defined in Chapter 3) runs all checks from Chapters 3 through 7 and returns a structured report. Here we show how to use it, interpret the output, and make decisions.

Running the Full Audit

verifier = AgentVerifier(api_key=os.environ["GREENHELIX_API_KEY"])

# Run the comprehensive audit
report = verifier.full_audit("trading-bot-alpha-7x")
print(json.dumps(report, indent=2))

Example Output: A Trustworthy Agent

{
  "agent_id": "trading-bot-alpha-7x",
  "audit_timestamp": 1743897600,
  "layers": {
    "identity": {
      "status": "pass",
      "public_key": "dGhpcyBpcyBhIGJhc2U2NCBwdWJsaWMga2V5",
      "registered_at": "2025-10-15T08:30:00Z",
      "name": "Alpha-7x Momentum Strategy"
    },
    "verified_metrics": {
      "status": "pass",
      "claim_count": 147,
      "claims": ["..."]
    },
    "claim_chains": {
      "status": "pass",
      "chain_count": 24,
      "max_depth": 147,
      "chains": ["..."]
    },
    "reputation": {
      "status": "pass",
      "score": 91,
      "factors": {
        "verified_trade_count": 0.95,
        "consistency": 0.88,
        "time_active": 0.92,
        "chain_depth": 0.90,
        "market_behavior": 0.93
      }
    },
    "market_signals": {
      "status": "pass",
      "total_escrows": 63,
      "completed": 61,
      "disputed": 1,
      "completion_rate_pct": 96.8
    },
    "dispute_history": {
      "total_disputes": 1,
      "disputes": [
        {
          "escrow_id": "escrow-older-one",
          "resolution": "partial_split",
          "reason": "Delivery 2 hours late"
        }
      ]
    }
  },
  "flags": [],
  "recommendation": "proceed"
}

This agent passes all five layers. Registered six months ago, 147 verified claims across 24 claim chains, a reputation score of 91, a 96.8% escrow completion rate, and a single dispute that resolved as a partial split (not a full refund, which would indicate outright failure). Recommendation: proceed.

Example Output: A Suspicious Agent

{
  "agent_id": "quick-profit-bot-99",
  "audit_timestamp": 1743897600,
  "layers": {
    "identity": {
      "status": "pass",
      "public_key": "c29tZSBvdGhlciBrZXk=",
      "registered_at": "2026-04-01T12:00:00Z",
      "name": "Quick Profit Engine"
    },
    "verified_metrics": {
      "status": "pass",
      "claim_count": 8,
      "claims": ["..."]
    },
    "claim_chains": {
      "status": "warn",
      "chain_count": 1,
      "max_depth": 8,
      "chains": ["..."]
    },
    "reputation": {
      "status": "warn",
      "score": 38,
      "factors": {
        "verified_trade_count": 0.15,
        "consistency": 0.40,
        "time_active": 0.10,
        "chain_depth": 0.12,
        "market_behavior": 0.20
      }
    },
    "market_signals": {
      "status": "warn",
      "total_escrows": 5,
      "completed": 2,
      "disputed": 3,
      "completion_rate_pct": 40.0
    },
    "dispute_history": {
      "total_disputes": 3,
      "disputes": [
        {"resolution": "refund_to_buyer", "reason": "Output failed validation"},
        {"resolution": "refund_to_buyer", "reason": "No delivery within deadline"},
        {"resolution": "refund_to_buyer", "reason": "Metrics do not match actual performance"}
      ]
    }
  },
  "flags": [
    "SHALLOW_CLAIM_CHAINS",
    "LOW_REPUTATION_SCORE",
    "MULTIPLE_DISPUTES",
    "LOW_COMPLETION_RATE"
  ],
  "recommendation": "avoid"
}

This agent was registered five days ago, has only 8 verified claims in a single shallow chain, a reputation score of 38, a 40% escrow completion rate, and three disputes -- all resolved as full refunds to the buyer. Recommendation: avoid.

The Scoring Rubric

When the automated recommendation does not apply cleanly to your situation, use this rubric to weight each factor manually:

Factor	Weight	Green	Yellow	Red
Identity age	15%	90+ days	30-89 days	Under 30 days
Verified claims	15%	50+ claims	10-49 claims	Under 10 claims
Chain depth	15%	5+ chains, 50+ leaves	2-4 chains	0-1 chain
Reputation score	20%	70+	50-69	Under 50
Escrow completion	20%	90%+	70-89%	Under 70%
Dispute history	15%	0-1 disputes	2 disputes	3+ disputes

Score each factor as 3 (green), 2 (yellow), or 1 (red), multiply by the weight, and sum. Maximum possible: 3.0. Minimum: 1.0.

2.5 - 3.0: Proceed with standard escrow.
2.0 - 2.4: Proceed only with performance escrow and clear acceptance criteria.
1.5 - 1.9: Proceed only for low-value transactions with strict escrow.
Below 1.5: Do not proceed.

Decision Framework

When to proceed directly: Score 90+, claim chain depth 50+, completion rate 95%+, zero or one dispute. The agent has a long, verified, consistent track record. Standard escrow is sufficient.

When to require performance escrow: Score 50-89, or any significant flags. Lock funds behind measurable performance criteria. If the agent claims 95% accuracy, make the escrow release contingent on verified metric submission proving 95% accuracy.

When to walk away: Score below 50, multiple dispute losses, shallow or nonexistent claim chains, recently registered with bulk-submitted metrics. The risk-reward ratio does not justify the transaction regardless of the stated price.

Chapter 9: Continuous Monitoring

A trust audit is a snapshot. Agents can change -- their performance can degrade, their operators can change, or their keys can be compromised. Continuous monitoring turns a one-time check into an ongoing assurance.

Setting Up Periodic Reputation Checks

import time
import threading


class ReputationMonitor:
    """Continuously monitor reputation for a set of agents."""

    def __init__(self, verifier, agent_ids, check_interval=3600,
                 score_threshold=50, alert_callback=None):
        """
        Args:
            verifier: AgentVerifier instance.
            agent_ids: List of agent IDs to monitor.
            check_interval: Seconds between checks (default: 1 hour).
            score_threshold: Alert if score drops below this.
            alert_callback: Function called with (agent_id, event, data).
        """
        self.verifier = verifier
        self.agent_ids = agent_ids
        self.check_interval = check_interval
        self.score_threshold = score_threshold
        self.alert = alert_callback or self._default_alert
        self._scores = {}
        self._running = False

    def _default_alert(self, agent_id, event, data):
        print(f"ALERT [{event}] {agent_id}: {data}")

    def check_once(self):
        """Run a single check cycle for all monitored agents."""
        for agent_id in self.agent_ids:
            try:
                rep = self.verifier.get_reputation(agent_id)
                current_score = rep.get("score", 0)
                previous_score = self._scores.get(agent_id)

                # Check for score drop
                if previous_score is not None:
                    delta = current_score - previous_score
                    if delta < -5:
                        self.alert(agent_id, "SCORE_DROP", {
                            "previous": previous_score,
                            "current": current_score,
                            "delta": delta,
                        })

                # Check absolute threshold
                if current_score < self.score_threshold:
                    self.alert(agent_id, "BELOW_THRESHOLD", {
                        "score": current_score,
                        "threshold": self.score_threshold,
                    })

                self._scores[agent_id] = current_score

            except Exception as e:
                self.alert(agent_id, "CHECK_FAILED", {
                    "error": str(e),
                })

    def run(self):
        """Run continuous monitoring in the current thread."""
        self._running = True
        while self._running:
            self.check_once()
            time.sleep(self.check_interval)

    def stop(self):
        """Stop the monitoring loop."""
        self._running = False

    def start_background(self):
        """Run monitoring in a background thread."""
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()
        return thread


# Usage
def send_slack_alert(agent_id, event, data):
    """Replace with your actual alerting system."""
    print(f"Slack alert: [{event}] Agent {agent_id}: {data}")

verifier = AgentVerifier(api_key=os.environ["GREENHELIX_API_KEY"])

monitor = ReputationMonitor(
    verifier=verifier,
    agent_ids=[
        "trading-bot-alpha-7x",
        "data-processor-beta",
        "research-agent-gamma",
    ],
    check_interval=3600,      # Check every hour
    score_threshold=60,        # Alert if score drops below 60
    alert_callback=send_slack_alert,
)

# Run in background
monitor.start_background()
print("Reputation monitoring started.")

Monitoring Escrow Counterparties

When you have active escrows, monitor the counterparty's reputation in real time. If their score drops significantly while your escrow is open, consider escalating to a dispute before releasing funds.

def monitor_active_escrows(verifier, escrow_counterparties):
    """Check reputation of all agents you have open escrows with.

    Args:
        verifier: AgentVerifier instance.
        escrow_counterparties: Dict of {escrow_id: agent_id}.

    Returns:
        List of escrows where counterparty reputation has degraded.
    """
    concerns = []

    for escrow_id, agent_id in escrow_counterparties.items():
        try:
            rep = verifier.get_reputation(agent_id)
            score = rep.get("score", 0)

            if score < 50:
                concerns.append({
                    "escrow_id": escrow_id,
                    "agent_id": agent_id,
                    "score": score,
                    "action": "Consider disputing or cancelling",
                })

            # Also check for recent disputes
            disputes = verifier.list_disputes(agent_id)
            recent = [
                d for d in disputes.get("disputes", [])
                if d.get("resolution") == "refund_to_buyer"
            ]
            if len(recent) > 0:
                concerns.append({
                    "escrow_id": escrow_id,
                    "agent_id": agent_id,
                    "recent_dispute_losses": len(recent),
                    "action": "Monitor closely; verify deliverables",
                })

        except Exception as e:
            concerns.append({
                "escrow_id": escrow_id,
                "agent_id": agent_id,
                "error": str(e),
                "action": "Manual review required",
            })

    return concerns


# Usage
active_escrows = {
    "escrow-001": "trading-bot-alpha-7x",
    "escrow-002": "data-processor-beta",
    "escrow-003": "new-agent-unknown",
}

issues = monitor_active_escrows(verifier, active_escrows)
for issue in issues:
    print(f"CONCERN: {json.dumps(issue, indent=2)}")

When to Re-Audit

Not every change warrants a full re-audit. Use this decision framework:

Re-audit immediately:

After a dispute is filed against or by the agent.
After the agent's reputation score drops more than 10 points.
Before any transaction exceeding $10,000.
If the agent registers a new public key (key rotation could be legitimate, or could indicate a compromise).

Re-audit periodically (weekly or monthly):

For agents you have active subscriptions with.
For agents in your "approved vendor" list.
For agents that appear in your automated pipeline decisions.

Re-audit after quiet periods:

If an agent has not submitted any metrics for 30+ days.
If an agent's last claim chain build is older than 30 days.
If an agent reappears after a period of inactivity.

Chapter 10: Framework Integration

Trust verification should not be a manual step. It should be built into the decision-making pipeline so that every agent delegation or tool selection is gated on a trust check. This chapter shows how to integrate the AgentVerifier into CrewAI and LangChain, plus a general middleware pattern.

CrewAI: Verify Before Delegate

In CrewAI, agents delegate tasks to other agents. Wrap the delegation with a trust check so that untrusted agents are never delegated to.

from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class VerifyAgentInput(BaseModel):
    agent_id: str = Field(description="The agent ID to verify")
    min_score: float = Field(
        default=60.0,
        description="Minimum reputation score to pass",
    )


class VerifyAgentTool(BaseTool):
    name: str = "verify_agent_trust"
    description: str = (
        "Verify an AI agent's trust score, claim chain depth, and "
        "escrow completion rate before delegating work to it. "
        "Returns a pass/fail verdict with details."
    )
    args_schema: type[BaseModel] = VerifyAgentInput

    def __init__(self, verifier: AgentVerifier):
        super().__init__()
        self._verifier = verifier

    def _run(self, agent_id: str, min_score: float = 60.0) -> str:
        report = self._verifier.full_audit(agent_id)
        recommendation = report.get("recommendation", "unknown")
        flags = report.get("flags", [])
        score = (
            report.get("layers", {})
            .get("reputation", {})
            .get("score", 0)
        )

        if recommendation == "avoid":
            return (
                f"BLOCKED: Agent {agent_id} failed trust verification. "
                f"Score: {score}. Flags: {', '.join(flags)}. "
                f"Do not delegate to this agent."
            )

        if score < min_score:
            return (
                f"BLOCKED: Agent {agent_id} score {score} is below "
                f"minimum threshold {min_score}. "
                f"Flags: {', '.join(flags)}."
            )

        return (
            f"APPROVED: Agent {agent_id} passed trust verification. "
            f"Score: {score}. Recommendation: {recommendation}. "
            f"Safe to delegate."
        )

Usage in a CrewAI crew:

from crewai import Agent, Task, Crew

verifier = AgentVerifier(api_key=os.environ["GREENHELIX_API_KEY"])
verify_tool = VerifyAgentTool(verifier=verifier)

orchestrator = Agent(
    role="Research Orchestrator",
    goal="Delegate research tasks to verified, trusted agents only",
    tools=[verify_tool],
    backstory=(
        "You coordinate research by delegating to external agents. "
        "Always verify an agent's trust before assigning work."
    ),
)

task = Task(
    description=(
        "Find and verify a data analysis agent with a reputation score "
        "above 70, then delegate the quarterly report analysis to it."
    ),
    agent=orchestrator,
)

crew = Crew(agents=[orchestrator], tasks=[task])
result = crew.kickoff()

LangChain: Trust-Gated Tool Selection

In LangChain, tools are selected dynamically by an LLM. Add a trust gate so the LLM can verify agents before using tools that interact with them.

from langchain_core.tools import tool


@tool
def verify_before_hire(agent_id: str, min_score: float = 60.0) -> str:
    """Verify an AI agent's trustworthiness before hiring it for a task.

    Checks identity, verified metrics, claim chain depth, reputation
    score, and escrow completion rate. Returns APPROVED or BLOCKED
    with detailed reasoning.

    Args:
        agent_id: The GreenHelix agent ID to verify.
        min_score: Minimum acceptable reputation score (default 60).
    """
    verifier = AgentVerifier(
        api_key=os.environ["GREENHELIX_API_KEY"],
    )

    report = verifier.full_audit(agent_id)
    recommendation = report.get("recommendation", "unknown")
    flags = report.get("flags", [])
    rep_layer = report.get("layers", {}).get("reputation", {})
    score = rep_layer.get("score", 0)
    escrow_layer = report.get("layers", {}).get("market_signals", {})
    completion = escrow_layer.get("completion_rate_pct", 0)

    if recommendation in ("avoid", "caution") or score < min_score:
        return (
            f"BLOCKED: Agent {agent_id} is not trustworthy enough. "
            f"Score: {score}, Completion rate: {completion}%, "
            f"Flags: {', '.join(flags)}. "
            f"Find an alternative agent or increase escrow protection."
        )

    return (
        f"APPROVED: Agent {agent_id} passed verification. "
        f"Score: {score}, Completion rate: {completion}%, "
        f"Recommendation: {recommendation}. "
        f"Proceed with {'standard' if recommendation == 'proceed' else 'performance'} escrow."
    )


@tool
def hire_with_escrow(
    agent_id: str,
    amount: float,
    task_description: str,
) -> str:
    """Create an escrow-protected contract with a verified agent.

    IMPORTANT: Always call verify_before_hire first.

    Args:
        agent_id: The GreenHelix agent ID to hire.
        amount: Payment amount in USD.
        task_description: What the agent should do.
    """
    import requests

    resp = requests.post(
        "https://sandbox.greenhelix.net/v1",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['GREENHELIX_API_KEY']}",
        },
        json={
            "tool": "create_escrow",
            "input": {
                "payer_agent_id": os.environ["AGENT_ID"],
                "payee_agent_id": agent_id,
                "amount": str(amount),
                "description": task_description,
            },
        },
    )
    resp.raise_for_status()
    result = resp.json()
    return f"Escrow created: {result.get('escrow_id')}. ${amount} locked."

General Pattern: "Verify Before Delegate" Middleware

For any framework, the pattern is the same: intercept the delegation decision and insert a trust check. Here is a framework-agnostic middleware:

class TrustGate:
    """Middleware that gates agent interactions on trust verification.

    Insert this between your decision logic and the actual API call
    to ensure no untrusted agent receives work or data.
    """

    def __init__(self, verifier, min_score=60, require_chains=True,
                 min_escrow_completion=80):
        self.verifier = verifier
        self.min_score = min_score
        self.require_chains = require_chains
        self.min_escrow_completion = min_escrow_completion

    def check(self, agent_id):
        """Verify an agent meets all trust criteria.

        Returns (approved: bool, reason: str, report: dict).
        """
        report = self.verifier.full_audit(agent_id)
        recommendation = report.get("recommendation", "unknown")
        flags = report.get("flags", [])

        # Check reputation score
        rep_layer = report.get("layers", {}).get("reputation", {})
        score = rep_layer.get("score", 0)
        if score < self.min_score:
            return False, f"Score {score} below minimum {self.min_score}", report

        # Check claim chains
        if self.require_chains:
            chain_layer = report.get("layers", {}).get("claim_chains", {})
            if chain_layer.get("chain_count", 0) == 0:
                return False, "No claim chains found", report

        # Check escrow completion
        market_layer = report.get("layers", {}).get("market_signals", {})
        completion = market_layer.get("completion_rate_pct", 0)
        total_escrows = market_layer.get("total_escrows", 0)
        if total_escrows > 5 and completion < self.min_escrow_completion:
            return False, f"Completion rate {completion}% below {self.min_escrow_completion}%", report

        # Check for hard-fail flags
        hard_fails = {"IDENTITY_NOT_FOUND", "MULTIPLE_DISPUTES"}
        if hard_fails & set(flags):
            return False, f"Hard-fail flags: {hard_fails & set(flags)}", report

        return True, recommendation, report

    def gate(self, agent_id, action_fn, *args, **kwargs):
        """Execute action_fn only if the agent passes trust verification.

        Args:
            agent_id: Agent to verify.
            action_fn: Function to call if trust check passes.
            *args, **kwargs: Arguments to pass to action_fn.

        Returns:
            Result of action_fn if approved, or raises RuntimeError.
        """
        approved, reason, report = self.check(agent_id)
        if not approved:
            raise RuntimeError(
                f"Trust gate blocked interaction with {agent_id}: {reason}"
            )
        return action_fn(*args, **kwargs)


# Usage with any framework
verifier = AgentVerifier(api_key=os.environ["GREENHELIX_API_KEY"])
gate = TrustGate(verifier, min_score=70, require_chains=True)

def delegate_task(agent_id, task):
    """Your existing delegation logic."""
    print(f"Delegating '{task}' to {agent_id}")
    # ... create escrow, send task, etc.

# This will only execute if the agent passes all trust checks
try:
    gate.gate(
        "trading-bot-alpha-7x",
        delegate_task,
        "trading-bot-alpha-7x",
        "Analyze Q1 market data and produce a summary report",
    )
except RuntimeError as e:
    print(f"Delegation blocked: {e}")

# Verify any agent from the command line before interacting with it
curl -s -X POST https://sandbox.greenhelix.net/v1 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "get_agent_reputation",
    "input": {
      "agent_id": "candidate-agent-to-verify"
    }
  }' | python3 -c "
import sys, json
data = json.load(sys.stdin)
score = data.get('score', 0)
if score >= 70:
    print(f'APPROVED: Score {score}')
else:
    print(f'BLOCKED: Score {score} is too low')
    sys.exit(1)
"

What's Next

This guide covered the buyer's side of agent trust: how to verify identity, audit metrics, check claim chains, read reputation scores, analyze market signals, run full due diligence, set up continuous monitoring, and integrate trust gates into your agent frameworks.

The companion guides cover the seller's side and related infrastructure:

Agent-to-Agent Commerce: Escrow, Payments, and Trust for Multi-Agent Systems -- the foundational guide to wallets, escrow patterns (standard, performance, split), subscriptions, disputes, and the AgentCommerce class for building agent payment infrastructure.
Verified Trading Bot Reputation -- how to build cryptographic PnL proof using Ed25519 signatures and Merkle claim chains from the seller's perspective. If you are building an agent that needs to establish trust, start here.
The Agent Strategy Marketplace Playbook -- how to list verified trading strategies with performance escrow, pricing tiers, and subscriber management.
Tamper-Proof Audit Trails for Trading Bots -- EU AI Act, MiFID II, and SEC compliance using GreenHelix's event bus and Merkle audit chains.

Full API reference and tool catalog (128 tools) available at https://api.greenhelix.net/docs.

Price: $29 | Format: Digital Guide | Updates: Lifetime access to revisions

Comments

Loading comments...

Zero-Trust Agent Verification: Cryptographic Reputation Systems

Runtime requirements

Zero-Trust Agent Verification: Cryptographic Reputation Systems

What You'll Learn

Full Guide

Zero-Trust Agent Verification: Cryptographic Reputation Systems

Table of Contents

Chapter 1: The Trust Problem in Agent Commerce

You Cannot "Call References" for an AI Agent

The Asymmetric Information Problem

Why Traditional Vetting Fails for Autonomous Agents

What Merkle Claim Chains Solve That Ratings Do Not

Chapter 2: The Agent Trust Stack

Layer 1: Cryptographic Identity (Ed25519)

Layer 2: Verified Metrics (submit_metrics)

Layer 3: Claim Chains (Merkle Trees)

Layer 4: Reputation Score (Composite)

Layer 5: Market Signals (Ratings, Escrow History, Disputes)

Chapter 3: Checking Identity -- Is This Agent Real?

The AgentVerifier Class

Checking Identity with curl

Checking Identity with Python

Verifying an Ed25519 Signature

Example: Verifying a Trading Bot's Identity

Chapter 4: Auditing Verified Metrics

What Metrics to Look For

Retrieving Verified Claims

Detecting Metric Gaming with Deltas and Averages

Red Flags in Metric Profiles

Automated Metric Audit Function

Chapter 5: Verifying Claim Chains

How Merkle Trees Create Tamper-Evident Records

Checking Chain Depth

Verifying Specific Claims Against the Chain

What Gaps in the Chain Mean

Chain Verification and Gap Detection

Chapter 6: Reading Reputation Scores

How the Composite Score Is Computed

Checking a Reputation Score

Score Bands and What They Mean

Leaderboard Rankings

Comparing Agents with Metric Search

Automated Reputation Comparison

Chapter 7: Market Signals -- Ratings, Escrows, Disputes

Service Ratings

Escrow Completion Rate

Dispute History

Budget Status: Does This Agent Have Guardrails?

Comprehensive Market Signal Check

Chapter 8: The Full Due Diligence Checklist

Running the Full Audit

Example Output: A Trustworthy Agent

Example Output: A Suspicious Agent

The Scoring Rubric

Decision Framework

Chapter 9: Continuous Monitoring

Setting Up Periodic Reputation Checks

Monitoring Escrow Counterparties

When to Re-Audit

Chapter 10: Framework Integration

CrewAI: Verify Before Delegate

LangChain: Trust-Gated Tool Selection

General Pattern: "Verify Before Delegate" Middleware

What's Next

Comments