Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Token Guard

v1.5.0

Prevents LLM API 429 errors by estimating tokens, tracking quotas, throttling requests, detecting duplicates, caching responses, and automatically falling back across models.


Install

openclaw skills install token-guard

TokenGuard — LLM API 429 Prevention Engine


Version: 1.5.0
Author: Aoineco & Co.
License: MIT
Tags: rate-limit, 429, token-management, cost-optimization, llm-guard, high-performance

Description

Prevents LLM API 429 (Rate Limit / Resource Exhausted) errors by intercepting requests before they're sent. Designed for users on free/low-cost API plans who need maximum intelligence per dollar.

Core philosophy: "Intelligence is measured not by how much you spend, but by how little you need."

Problem

When using LLM APIs (especially Google Gemini Flash with its 1M tokens-per-minute (TPM) limit):

  • Large documents (docx, PDFs) can consume the entire minute quota in one request
  • Failed requests still count toward token usage
  • Retry loops after 429 errors waste more tokens → death spiral
  • No built-in way to detect runaway/duplicate requests

Features

  • Pre-flight Token Estimation: Estimates token count before the API call (CJK-aware, no tiktoken dependency)
  • Real-time Quota Tracking: Tracks per-model, per-minute token usage with a sliding window
  • Smart Throttle: Auto-waits when quota usage exceeds 80%; blocks above 95%
  • Duplicate Detection: Blocks identical requests within a 60s window (3+ repeats = runaway)
  • Response Caching: Caches successful responses for duplicate requests
  • Auto Model Fallback: Switches to a cheaper or available model when the primary is exhausted
  • 429 Error Parser: Extracts the exact retry delay from Google/Anthropic error responses
  • Batch vs. Mistake Detection: Distinguishes intentional bulk processing from error loops
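
The pre-flight estimator is described as CJK-aware with no tiktoken dependency. A minimal sketch of one such heuristic (illustrative only, not the skill's actual code) treats each CJK character as roughly one token and other text as roughly four characters per token:

def estimate_tokens(text: str) -> int:
    """Rough, dependency-free token estimate (illustrative heuristic)."""
    def is_cjk(ch: str) -> bool:
        cp = ord(ch)
        return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
                or 0x3040 <= cp <= 0x30FF   # Hiragana / Katakana
                or 0xAC00 <= cp <= 0xD7AF)  # Hangul syllables
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = len(text) - cjk
    # ~1 token per CJK character, ~4 characters per token otherwise.
    return cjk + (other + 3) // 4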

Supported Models

Pre-configured quotas for:

  • gemini-3-flash (1M TPM)
  • gemini-3-pro (2M TPM)
  • claude-haiku (50K TPM)
  • claude-sonnet (200K TPM)
  • claude-opus (200K TPM)
  • gpt-4o (800K TPM)
  • deepseek (1M TPM)

Custom quotas can be added for any model.
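
The registration interface isn't shown in this document, so the following is a hedged sketch assuming a hypothetical quotas constructor argument; check scripts/token_guard.py for the actual signature:

from token_guard import TokenGuard

# Hypothetical constructor argument mapping model name -> TPM limit;
# the real TokenGuard API may expose this differently.
guard = TokenGuard(quotas={
    "my-local-model": 500_000,    # add a model with a 500K TPM limit
    "gemini-3-flash": 1_000_000,  # or override a pre-configured default
})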

Usage

import time

from token_guard import TokenGuard

guard = TokenGuard()

# Before every API call:
decision = guard.check(prompt_text, model="gemini-3-flash")

if decision.action == "proceed":
    response = call_your_api(prompt_text)
    guard.record_usage(decision.estimated_tokens, model="gemini-3-flash")
    guard.cache_response(prompt_text, response)

elif decision.action == "wait":
    time.sleep(decision.wait_seconds)
    # retry

elif decision.action == "fallback":
    response = call_your_api(prompt_text, model=decision.fallback_model)

elif decision.action == "block":
    print(f"Blocked: {decision.reason}")

# If you get a 429 error:
guard.record_429("gemini-3-flash", retry_delay=53.0)
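
The retry delay passed to record_429 normally has to be pulled out of the provider's error payload. A best-effort sketch of that extraction follows; the field names are assumptions based on common Google/Anthropic error shapes, not something this skill guarantees:

import re

def parse_retry_delay(error_text: str, default: float = 30.0) -> float:
    """Pull a retry delay in seconds out of a 429 error body (best effort)."""
    # Google-style RetryInfo payloads: ... "retryDelay": "53s" ...
    m = re.search(r'"retryDelay"\s*:\s*"(\d+(?:\.\d+)?)s"', error_text)
    if m:
        return float(m.group(1))
    # Loose prose form: "retry in 53s" / "retry after 53 s"
    m = re.search(r'retry(?:ing)?(?:\s+in|\s+after)?\s+(\d+(?:\.\d+)?)\s*s',
                  error_text, re.IGNORECASE)
    if m:
        return float(m.group(1))
    return default

# guard.record_429("gemini-3-flash", retry_delay=parse_retry_delay(body))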

Integration with OpenClaw

Add it to your agent's config or use it as middleware:

skills:
  - token-guard

The agent can invoke TokenGuard before any LLM API call to prevent quota exhaustion.
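
A thin wrapper is one way to wire that in. The sketch below reuses the check / record_usage / cache_response API from the Usage section; call_llm is a stand-in stub for the agent's real client, not part of this skill:

import time

from token_guard import TokenGuard

guard = TokenGuard()

def call_llm(prompt: str, model: str) -> str:
    # Stub: replace with the agent's actual API client.
    return f"[{model}] response"

def guarded_call(prompt: str, model: str = "gemini-3-flash") -> str:
    """Run an LLM call through TokenGuard's pre-flight check first."""
    decision = guard.check(prompt, model=model)
    if decision.action == "wait":
        time.sleep(decision.wait_seconds)
        decision = guard.check(prompt, model=model)  # re-check after waiting
    if decision.action == "fallback":
        model = decision.fallback_model
    elif decision.action == "block":
        raise RuntimeError(f"TokenGuard blocked request: {decision.reason}")
    response = call_llm(prompt, model=model)
    guard.record_usage(decision.estimated_tokens, model=model)
    guard.cache_response(prompt, response)
    return response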

File Structure

token-guard/
├── SKILL.md          # This file
└── scripts/
    └── token_guard.py  # Main engine (zero external dependencies)

Status Output Example

{
  "models": {
    "gemini-3-flash": {
      "tpm_limit": 1000000,
      "used_this_minute": 750000,
      "remaining": 250000,
      "usage_pct": "75.0%",
      "status": "🟢 OK"
    }
  },
  "stats": {
    "total_checks": 42,
    "tokens_saved": 128000,
    "blocks": 3,
    "fallbacks": 2
  }
}
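
The accessor that produces this snapshot isn't named here; assuming a hypothetical get_status() method on the guard, dumping it might look like:

import json

# Hypothetical accessor name; check scripts/token_guard.py for the real one.
print(json.dumps(guard.get_status(), indent=2))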

Zero Dependencies

Pure Python 3.10+. No pip install needed. No tiktoken, no external API calls. Designed for the $7 Bootstrap Protocol — every byte counts.

Version tags

latest: vk97d4xkkjnszmwzg66hp3byc2s812by8