Token Guard

v1.5.0

Prevents LLM API 429 errors by estimating tokens, tracking quotas, throttling requests, detecting duplicates, caching responses, and auto-fallback by model.

Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The name and description imply a token/429 prevention engine, and the included TokenGuard class does implement basic TPM/RPM checks and atomic state writes, which aligns with the stated purpose. However, SKILL.md advertises multiple features (duplicate detection, response caching, a 429 parser, record_usage/cache_response/record_429 methods, auto model fallback chains, etc.) that are not implemented in scripts/token_guard.py. This mismatch means the skill does not actually provide many of its advertised capabilities.
Instruction Scope
SKILL.md usage examples instruct callers to call guard.record_usage(...), guard.cache_response(...), guard.record_429(...), and other methods, but the code only exposes TokenGuard.check_quota(...) and no record/cache methods. The instructions therefore direct an agent or developer to call non-existent APIs, which will cause runtime errors or undefined behavior. The README also claims duplicate detection and caching, but the code stores no prompts or responses and implements no duplicate blocking, so the runtime scope described is inaccurate.
Install Mechanism
No install spec is provided (instruction-only skill with a single script). No external downloads or package installs are required, which minimizes install-time risk.
Credentials
The skill requests no environment variables or credentials and the code does not read environment variables, secrets, or network endpoints. It does write a local state file but does not log prompt contents or responses, so credential or prompt exfiltration is not apparent.
Persistence & Privilege
TokenGuard writes a state.json file by default into a directory computed relative to the script (base_dir is two directories above the script). This creates persistent state on disk (usage counts, request counts, window_start). That is expected for quota tracking, but note where the file will be written and whether that location is writable and appropriate. The skill declares always: false and requests no special privileges.
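The path computation the reviewer describes (a base directory two levels above the script, with state.json inside it) can be sketched as follows; the function name and layout here are illustrative assumptions, not the script's actual identifiers:

```python
from pathlib import Path

def default_state_path(script_path: str) -> Path:
    """Where TokenGuard-style state would land: two directories above the script."""
    return Path(script_path).resolve().parent.parent / "state.json"

# e.g. default_state_path("/opt/skills/token-guard/scripts/token_guard.py")
# -> /opt/skills/token-guard/state.json
```

Checking this path before installing tells you whether the location is writable in your environment.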
What to consider before installing
Summary of things to consider before installing:

  • The implementation and documentation disagree. SKILL.md promises caching, duplicate detection, record_usage/cache_response/record_429 helpers, and richer behavior; the shipped script only provides TokenGuard.check_quota(...) and saves simple usage/request counters. If you rely on the advertised APIs, they will fail. Ask the author for a matching release or updated code.
  • The script writes a state.json file (usage counters and timestamps) into the skill's base directory by default. This is normal for quota tracking, but confirm the path is acceptable and writable in your environment if you care about where files are stored.
  • There are no network calls, no environment variables read, and no obvious exfiltration of prompts or responses in the code. That reduces risk, but the mismatch between docs and code is a functional risk: an agent expecting the missing methods may error or behave unpredictably.
  • Recommended actions: (1) run the script in a sandboxed environment to verify behavior, (2) request a corrected SKILL.md or an updated script implementing the advertised features (or modify the agent to only call check_quota), and (3) inspect and monitor the created state.json while testing to ensure no sensitive data is written. If you need the advertised caching/duplicate detection, do not deploy this version until those features are implemented.
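As part of the sandboxed test recommended above, the created state file can be inspected after a run. This sketch assumes only that state.json is plain JSON (which matches the reviewer's description); the key names it flags are illustrative:

```python
import json
from pathlib import Path

def audit_state(path: str) -> list[str]:
    """Return top-level keys in state.json that look like raw prompt/response text."""
    state = json.loads(Path(path).read_text())
    return [k for k in state if k in ("prompt", "response", "messages")]
```

An empty result is consistent with the review's finding that no prompt contents are persisted.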


latest: vk97d4xkkjnszmwzg66hp3byc2s812by8
891 downloads
0 stars
3 versions
Updated 1mo ago
v1.5.0
MIT-0

TokenGuard — LLM API 429 Prevention Engine

<!-- 🌌 Aoineco-Verified | S-DNA: AOI-2026-0213-SDNA-TG01 -->

Version: 1.5.0
Author: Aoineco & Co.
License: MIT
Tags: rate-limit, 429, token-management, cost-optimization, llm-guard, high-performance

Description

Prevents LLM API 429 (Rate Limit / Resource Exhausted) errors by intercepting requests before they're sent. Designed for users on free/low-cost API plans who need maximum intelligence per dollar.

Core philosophy: "Intelligence is measured not by how much you spend, but by how little you need."

Problem

When using LLM APIs (especially Google Gemini Flash with 1M TPM limit):

  • Large documents (docx, PDFs) can consume the entire minute quota in one request
  • Failed requests still count toward token usage
  • Retry loops after 429 errors waste more tokens → death spiral
  • No built-in way to detect runaway/duplicate requests
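The retry death spiral described above is conventionally mitigated with capped exponential backoff plus jitter. A generic sketch (not part of this skill's code):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Capped exponential backoff with full jitter for 429 retries.

    Delays grow as base * 2^attempt but never exceed `cap`, and jitter
    spreads concurrent retries so they don't hammer the quota in lockstep.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The cap prevents unbounded waits, and jitter keeps a fleet of retrying clients from synchronizing into repeated quota exhaustion.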

Features

  • Pre-flight Token Estimation: estimates token count before the API call (CJK-aware, no tiktoken dependency)
  • Real-time Quota Tracking: tracks per-model, per-minute token usage with a sliding window
  • Smart Throttle: auto-waits when quota > 80%, blocks at > 95%
  • Duplicate Detection: blocks identical requests within a 60s window (3+ = runaway)
  • Response Caching: caches successful responses for duplicate requests
  • Auto Model Fallback: switches to a cheaper/available model when the primary is exhausted
  • 429 Error Parser: extracts the exact retry delay from Google/Anthropic error responses
  • Batch vs Mistake Detection: distinguishes intentional bulk processing from error loops

Supported Models

Pre-configured quotas for:

  • gemini-3-flash (1M TPM)
  • gemini-3-pro (2M TPM)
  • claude-haiku (50K TPM)
  • claude-sonnet (200K TPM)
  • claude-opus (200K TPM)
  • gpt-4o (800K TPM)
  • deepseek (1M TPM)

Custom quotas can be added for any model.
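A minimal sketch of what such a quota table and lookup could look like; the names and structure are assumptions for illustration, not the script's actual configuration:

```python
# Hypothetical quota table: model name -> tokens-per-minute (TPM) limit.
QUOTAS = {
    "gemini-3-flash": 1_000_000,
    "claude-sonnet": 200_000,
}

def remaining(model: str, used_this_minute: int) -> int:
    """Tokens left in the current minute window; unknown models get zero headroom."""
    limit = QUOTAS.get(model, 0)
    return max(0, limit - used_this_minute)
```

Defaulting unknown models to zero is the conservative choice: an unconfigured model is blocked rather than allowed to burn quota unchecked.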

Usage

import time

from token_guard import TokenGuard

guard = TokenGuard()

# Before every API call:
decision = guard.check(prompt_text, model="gemini-3-flash")

if decision.action == "proceed":
    response = call_your_api(prompt_text)
    guard.record_usage(decision.estimated_tokens, model="gemini-3-flash")
    guard.cache_response(prompt_text, response)

elif decision.action == "wait":
    time.sleep(decision.wait_seconds)
    # retry

elif decision.action == "fallback":
    response = call_your_api(prompt_text, model=decision.fallback_model)

elif decision.action == "block":
    print(f"Blocked: {decision.reason}")

# If you get a 429 error:
guard.record_429("gemini-3-flash", retry_delay=53.0)

Integration with OpenClaw

Add to your agent's config or use as a middleware:

skills:
  - token-guard

The agent can invoke TokenGuard before any LLM API call to prevent quota exhaustion.

File Structure

token-guard/
├── SKILL.md          # This file
└── scripts/
    └── token_guard.py  # Main engine (zero external dependencies)

Status Output Example

{
  "models": {
    "gemini-3-flash": {
      "tpm_limit": 1000000,
      "used_this_minute": 750000,
      "remaining": 250000,
      "usage_pct": "75.0%",
      "status": "🟢 OK"
    }
  },
  "stats": {
    "total_checks": 42,
    "tokens_saved": 128000,
    "blocks": 3,
    "fallbacks": 2
  }
}

Zero Dependencies

Pure Python 3.10+. No pip install needed. No tiktoken, no external API calls. Designed for the $7 Bootstrap Protocol — every byte counts.
