Install
openclaw skills install alltoken

Bootstrap a modular AllToken agent — chat, async image+video, model routing, OpenAI-compatible SDK. Works inside Hermes, OpenClaw, Claude Code, Codex CLI, OpenCode, or any runtime that loads SKILL.md.

This skill helps you create a modular AI agent powered by AllToken — a unified, OpenAI-compatible API with access to leading language, image, and video models behind one endpoint, plus automatic provider fallbacks and cost-effective routing.
Designed to be invoked from Hermes, OpenClaw, Claude Code, Codex CLI, or any other agent runtime that consumes skills.
┌─────────────────────────────────────────────────────┐
│ Your Application (TS/Py) │
├─────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Ink TUI │ │ HTTP API │ │ Hermes / │ │
│ │ │ │ │ │ OpenClaw │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Agent Core │ │
│ │ (hooks & lifecycle) │ │
│ └───────────┬───────────┘ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ AllToken REST API │ │
│ │ api.alltoken.ai/v1 │ │
│ └───────────────────────┘ │
└─────────────────────────────────────────────────────┘
Security: never commit your API key. Use ALLTOKEN_API_KEY from the environment.
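To sanity-check a key before writing any code, hit the models list directly. A minimal sketch (assumes Node 18+ global fetch and an ESM context for top-level await):

```ts
// List model IDs visible to your key; fails fast on a bad or revoked key.
const res = await fetch('https://api.alltoken.ai/v1/models', {
  headers: { Authorization: `Bearer ${process.env.ALLTOKEN_API_KEY}` },
});
if (!res.ok) throw new Error(`key check failed: HTTP ${res.status}`);
const { data } = await res.json();
console.log(data?.map((m: { id: string }) => m.id));
```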
Base URL: https://api.alltoken.ai/v1 (set it as base_url / baseURL in your SDK). Auth: Authorization: Bearer $ALLTOKEN_API_KEY.

Endpoints:

- POST /chat/completions — chat (streaming, tool calls, thinking, web search)
- GET /models — OpenAI-compatible model list
- POST /images/generations/async + GET /images/generations/{id} — async image generation
- POST /videos/generations + GET /videos/generations/{id} — async video generation
- GET /api-account/models / /{model_path} / /filters — full catalog with pricing and capabilities (public, no auth required)
- GET /api-account/providers (+ /{id}/stats) — providers, health, throughput (public)
- GET /api-account/rankings/all — leaderboards, benchmarks, speed rankings (public)
- GET /api-account/health/{routes,summary} — route health & availability (public)
- GET /api-account/user/{api-keys,usage,billing,balance} — web-session token only, not callable with your Bearer API key (you'll get 401 auth_error / invalid_token). Manage these in Settings → API Keys / Billing on https://alltoken.ai.

Scaffold the project:

mkdir my-alltoken-agent && cd my-alltoken-agent
npm init -y
npm pkg set type="module"
npm install openai zod eventemitter3
npm install ink react # optional: TUI only
npm install -D typescript @types/react tsx
tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"jsx": "react-jsx",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"outDir": "dist"
},
"include": ["src"]
}
package.json:
{
"scripts": {
"start": "tsx src/cli.tsx",
"start:headless": "tsx src/headless.ts",
"dev": "tsx watch src/cli.tsx"
}
}
src/
├── client.ts # AllToken client (OpenAI SDK with overridden baseURL)
├── agent.ts # Standalone agent core with hooks
├── tools.ts # Function-calling tool definitions
├── media.ts # Async image + video helpers (poll loop)
├── cli.tsx # Optional Ink TUI
└── headless.ts # Headless / scriptable example
Create src/client.ts. AllToken is OpenAI-compatible; we just override the base URL.
import OpenAI from 'openai';
export function createAllTokenClient(apiKey = process.env.ALLTOKEN_API_KEY): OpenAI {
if (!apiKey) throw new Error('ALLTOKEN_API_KEY is not set');
return new OpenAI({
apiKey,
baseURL: 'https://api.alltoken.ai/v1',
});
}
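A quick wiring check: a one-shot completion through the wrapped client (a sketch; assumes an ESM context for top-level await):

```ts
import { createAllTokenClient } from './client.js';

// One-shot, non-streaming completion against the AllToken endpoint.
const client = createAllTokenClient();
const res = await client.chat.completions.create({
  model: 'minimax-m2.7',
  messages: [{ role: 'user', content: 'Say hello in five words.' }],
});
console.log(res.choices[0]?.message?.content);
```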
Create src/agent.ts — the standalone agent. It streams via OpenAI's SSE protocol and emits typed events for any UI to consume.
import OpenAI from 'openai';
import type {
ChatCompletionMessageParam,
ChatCompletionTool,
ChatCompletionToolMessageParam,
} from 'openai/resources/chat/completions';
import { EventEmitter } from 'eventemitter3';
import { createAllTokenClient } from './client.js';
export interface Message {
role: 'user' | 'assistant' | 'system' | 'tool';
content: string;
tool_call_id?: string;
name?: string;
}
export interface AgentEvents {
'message:user': (message: Message) => void;
'message:assistant': (message: Message) => void;
'stream:start': () => void;
'stream:delta': (delta: string, accumulated: string) => void;
'stream:end': (fullText: string) => void;
'tool:call': (name: string, args: unknown, callId: string) => void;
'tool:result': (name: string, result: unknown, callId: string) => void;
'thinking:start': () => void;
'thinking:end': () => void;
'error': (error: Error) => void;
}
export interface ToolHandler {
definition: ChatCompletionTool;
execute: (args: any) => Promise<unknown> | unknown;
}
export interface AgentConfig {
apiKey?: string;
model?: string; // e.g. 'minimax-m2.7', 'gpt-5.4'
instructions?: string;
tools?: ToolHandler[];
maxSteps?: number; // tool-loop step limit
temperature?: number;
enableSearch?: boolean; // AllToken-specific web-search toggle
}
export class Agent extends EventEmitter<AgentEvents> {
private client: OpenAI;
private messages: ChatCompletionMessageParam[] = [];
private cfg: Required<Omit<AgentConfig, 'apiKey'>>;
private toolMap: Map<string, ToolHandler>;
constructor(config: AgentConfig = {}) {
super();
this.client = createAllTokenClient(config.apiKey);
this.cfg = {
model: config.model ?? 'minimax-m2.7',
instructions: config.instructions ?? 'You are a helpful assistant.',
tools: config.tools ?? [],
maxSteps: config.maxSteps ?? 5,
temperature: config.temperature ?? 0.7,
enableSearch: config.enableSearch ?? false,
};
this.toolMap = new Map(this.cfg.tools.map((t) => [t.definition.function.name, t]));
if (this.cfg.instructions) {
this.messages.push({ role: 'system', content: this.cfg.instructions });
}
}
getMessages(): ChatCompletionMessageParam[] { return [...this.messages]; }
clearHistory(): void {
this.messages = this.cfg.instructions
? [{ role: 'system', content: this.cfg.instructions }]
: [];
}
setInstructions(text: string): void {
this.cfg.instructions = text;
if (this.messages[0]?.role === 'system') this.messages[0] = { role: 'system', content: text };
else this.messages.unshift({ role: 'system', content: text });
}
addTool(t: ToolHandler): void {
this.cfg.tools.push(t);
this.toolMap.set(t.definition.function.name, t);
}
/** Send a user message, run the tool-loop, stream tokens. Returns the final assistant text. */
async send(content: string): Promise<string> {
this.messages.push({ role: 'user', content });
this.emit('message:user', { role: 'user', content });
this.emit('thinking:start');
let finalText = '';
try {
for (let step = 0; step < this.cfg.maxSteps; step++) {
const stream = await this.client.chat.completions.create({
model: this.cfg.model,
messages: this.messages,
temperature: this.cfg.temperature,
tools: this.cfg.tools.length ? this.cfg.tools.map((t) => t.definition) : undefined,
stream: true,
// AllToken extension: opt-in web search (model-dependent)
...(this.cfg.enableSearch ? ({ enable_search: true } as any) : {}),
});
this.emit('stream:start');
let text = '';
const toolCalls: Record<number, { id?: string; name?: string; args: string }> = {};
let finishReason: string | undefined;
for await (const chunk of stream) {
const choice = chunk.choices[0];
if (!choice) continue;
const delta: any = choice.delta;
if (delta?.content) {
text += delta.content;
this.emit('stream:delta', delta.content, text);
}
if (delta?.tool_calls) {
for (const tc of delta.tool_calls) {
const slot = toolCalls[tc.index] ?? (toolCalls[tc.index] = { args: '' });
if (tc.id) slot.id = tc.id;
if (tc.function?.name) slot.name = tc.function.name;
if (tc.function?.arguments) slot.args += tc.function.arguments;
}
}
if (choice.finish_reason) finishReason = choice.finish_reason;
}
this.emit('stream:end', text);
// Persist the assistant turn (with tool_calls if any)
const calls = Object.values(toolCalls).filter((c) => c.id && c.name);
if (calls.length) {
this.messages.push({
role: 'assistant',
content: text || null,
tool_calls: calls.map((c) => ({
id: c.id!,
type: 'function',
function: { name: c.name!, arguments: c.args || '{}' },
})),
} as any);
// Execute tools and append results
for (const c of calls) {
const handler = this.toolMap.get(c.name!);
const parsed = safeJson(c.args);
this.emit('tool:call', c.name!, parsed, c.id!);
const result = handler
? await handler.execute(parsed)
: { error: `unknown tool: ${c.name}` };
this.emit('tool:result', c.name!, result, c.id!);
const toolMsg: ChatCompletionToolMessageParam = {
role: 'tool',
tool_call_id: c.id!,
content: typeof result === 'string' ? result : JSON.stringify(result),
};
this.messages.push(toolMsg);
}
continue; // next loop step
}
// Terminal: regular completion
this.messages.push({ role: 'assistant', content: text });
this.emit('message:assistant', { role: 'assistant', content: text });
finalText = text;
break;
}
return finalText;
} catch (err) {
const error = err instanceof Error ? err : new Error(String(err));
this.emit('error', error);
throw error;
} finally {
this.emit('thinking:end');
}
}
/** Non-streaming convenience method. */
async sendSync(content: string): Promise<string> {
this.messages.push({ role: 'user', content });
this.emit('message:user', { role: 'user', content });
const res = await this.client.chat.completions.create({
model: this.cfg.model,
messages: this.messages,
temperature: this.cfg.temperature,
});
const text = res.choices[0]?.message?.content ?? '';
this.messages.push({ role: 'assistant', content: text });
this.emit('message:assistant', { role: 'assistant', content: text });
return text;
}
}
function safeJson(s: string): unknown {
try { return JSON.parse(s || '{}'); } catch { return { _raw: s }; }
}
export function createAgent(config: AgentConfig = {}): Agent {
return new Agent(config);
}
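Before adding tools or a UI, you can smoke-test the core in isolation (a sketch that streams tokens straight to stdout):

```ts
import { createAgent } from './agent.js';

const agent = createAgent({ model: 'minimax-m2.7' });
agent.on('stream:delta', (delta) => process.stdout.write(delta));
agent.on('stream:end', () => process.stdout.write('\n'));
await agent.send('Explain SSE in one sentence.');
```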
Create src/tools.ts:
import type { ToolHandler } from './agent.js';
export const timeTool: ToolHandler = {
definition: {
type: 'function',
function: {
name: 'get_current_time',
description: 'Get the current date and time',
parameters: {
type: 'object',
properties: {
timezone: { type: 'string', description: 'IANA timezone, e.g. "UTC", "America/New_York"' },
},
},
},
},
execute: ({ timezone }: { timezone?: string }) => ({
time: new Date().toLocaleString('en-US', { timeZone: timezone || 'UTC' }),
timezone: timezone || 'UTC',
}),
};
export const calculatorTool: ToolHandler = {
definition: {
type: 'function',
function: {
name: 'calculate',
description: 'Evaluate a basic math expression',
parameters: {
type: 'object',
properties: { expression: { type: 'string' } },
required: ['expression'],
},
},
},
execute: ({ expression }: { expression: string }) => {
// Safe arithmetic evaluator — shunting-yard + RPN, no eval/Function.
const tokens = expression.match(/\d+(?:\.\d+)?|[+\-*/()]/g) ?? [];
const prec: Record<string, number> = { '+': 1, '-': 1, '*': 2, '/': 2 };
const out: string[] = [];
const ops: string[] = [];
for (const t of tokens) {
if (/^\d/.test(t)) {
out.push(t);
} else if (t === '(') {
ops.push(t);
} else if (t === ')') {
while (ops.length && ops[ops.length - 1] !== '(') out.push(ops.pop()!);
ops.pop();
} else {
while (
ops.length &&
ops[ops.length - 1] !== '(' &&
(prec[ops[ops.length - 1]] ?? 0) >= prec[t]
) {
out.push(ops.pop()!);
}
ops.push(t);
}
}
while (ops.length) out.push(ops.pop()!);
const stack: number[] = [];
for (const t of out) {
if (/^\d/.test(t)) {
stack.push(parseFloat(t));
} else {
const b = stack.pop()!;
const a = stack.pop()!;
stack.push(t === '+' ? a + b : t === '-' ? a - b : t === '*' ? a * b : a / b);
}
}
return { expression, result: stack[0] };
},
};
export const defaultTools = [timeTool, calculatorTool];
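Because ToolHandler is a plain object, execute() is unit-testable without an agent or an API key:

```ts
import { calculatorTool, timeTool } from './tools.js';

console.log(calculatorTool.execute({ expression: '2*(3+4)' }));
// → { expression: '2*(3+4)', result: 14 }
console.log(timeTool.execute({ timezone: 'Asia/Tokyo' }));
```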
AllToken's image and video endpoints are asynchronous: create a task, poll until completed, then read the result. Create src/media.ts:
const BASE = 'https://api.alltoken.ai/v1';
async function authedFetch(path: string, init: RequestInit = {}) {
const apiKey = process.env.ALLTOKEN_API_KEY;
if (!apiKey) throw new Error('ALLTOKEN_API_KEY is not set');
const res = await fetch(`${BASE}${path}`, {
...init,
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json',
...(init.headers ?? {}),
},
});
if (!res.ok) {
const body = await res.text();
throw new Error(`AllToken ${res.status}: ${body}`);
}
return res.json();
}
// ── Images ────────────────────────────────────────────────────────────────
// Result is delivered ONCE: persist `b64_json` immediately. Tasks expire in 30 min.
export interface ImageRequest {
model?: 'gpt-image-2' | string; // discover via GET /images/models
prompt: string;
size?: '1024x1024' | '1536x1024' | '1024x1536' | 'auto';
quality?: 'low' | 'medium' | 'high' | 'auto';
output_format?: 'png' | 'jpeg' | 'webp';
background?: 'auto' | 'opaque';
moderation?: 'auto' | 'low';
}
export interface ImageResult {
id: string;
status: 'queued' | 'processing' | 'completed' | 'failed' | 'cancelled';
data?: Array<{ b64_json: string; revised_prompt?: string }>;
error?: unknown;
}
export async function generateImage(req: ImageRequest, opts: { pollMs?: number } = {}): Promise<ImageResult> {
const created = await authedFetch('/images/generations/async', {
method: 'POST',
body: JSON.stringify({ model: 'gpt-image-2', ...req }),
// Recommended: deduplicate retries with an Idempotency-Key
headers: { 'Idempotency-Key': crypto.randomUUID() },
});
const id = created.id as string;
const intervalMs = opts.pollMs ?? 2000;
while (true) {
const status = await authedFetch(`/images/generations/${id}`);
if (status.status === 'completed' || status.status === 'failed' || status.status === 'cancelled') {
return status;
}
await new Promise((r) => setTimeout(r, intervalMs));
}
}
// ── Videos ────────────────────────────────────────────────────────────────
export interface VideoRequest {
model: 'seedance-1.5-pro' | 'seedance-2.0' | string;
prompt: string;
duration?: number; // seconds; -1 = model decides
ratio?: '16:9' | '9:16' | '4:3' | '3:4' | '21:9' | '1:1' | 'adaptive';
resolution?: '480p' | '720p' | '1080p';
generate_audio?: boolean;
seed?: number;
watermark?: boolean;
callback_url?: string;
// Image-to-video: pass `content` with image_url + role: 'first_frame'
content?: Array<{
type: 'image_url' | 'video_url' | 'audio_url' | 'draft_task';
image_url?: { url: string };
video_url?: { url: string };
audio_url?: { url: string };
role?: 'first_frame' | 'last_frame' | 'reference_image' | 'reference_video' | 'reference_audio';
}>;
}
export async function generateVideo(req: VideoRequest, opts: { pollMs?: number } = {}) {
const created = await authedFetch('/videos/generations', {
method: 'POST',
body: JSON.stringify(req),
});
const id = created.id as string;
const intervalMs = opts.pollMs ?? 3000;
while (true) {
const status = await authedFetch(`/videos/generations/${id}`);
if (['completed', 'failed', 'cancelled', 'expired'].includes(status.status)) return status;
await new Promise((r) => setTimeout(r, intervalMs));
}
}
export async function cancelVideo(id: string) {
return authedFetch(`/videos/generations/${id}/cancel`, { method: 'POST' });
}
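The poll loops above run until the task reaches a terminal state. If you want a hard client-side deadline, wrap them. A sketch (note that stopping the poll does not cancel the server-side task; use cancelVideo for that):

```ts
// Stop waiting after `ms` milliseconds; the underlying task keeps running.
export async function withDeadline<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// await withDeadline(generateVideo({ model: 'seedance-1.5-pro', prompt: '...' }), 5 * 60_000);
```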
Persist b64_json to disk in one shot — re-polling a delivered image returns 410 image_already_retrieved and the result is gone. The 410 envelope:
{"error":{"code":"image_already_retrieved","message":"Image data was already retrieved; please submit a new generation request","request_id":"...","type":"invalid_request_error"}}
Observed latencies (use these to size your retry budget, not as SLAs):
- gpt-image-2 1024×1024 quality=low: ~15–25 s end-to-end (verified 20.6 s on a real submit).
- quality=high or 1536×1024: 30–60 s per docs.
- seedance-1.5-pro 5 s @ 480p: 30–120 s typical; 1080p can take 3–5 min.

Submit-response fields (the full shape, not just id):
{"id":"igen_d3b8...","status":"queued","model":"gpt-image-2","created_at":"2026-05-12T13:46:09Z"}
After status==completed, the GET adds: data: [{b64_json}], usage: {input_tokens, output_tokens, total_tokens, input_tokens_details}, size, quality, output_format, completed_at, expires_at. Note: revised_prompt is not present in current responses despite appearing in the docs example — treat it as optional.
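If you want the completed shape typed, here is a sketch derived from the field list above (treat everything beyond data as optional):

```ts
import type { ImageResult } from './media.js';

interface CompletedImage extends ImageResult {
  usage?: {
    input_tokens: number;
    output_tokens: number;
    total_tokens: number;
    input_tokens_details?: Record<string, number>;
  };
  size?: string;
  quality?: string;
  output_format?: string;
  completed_at?: string; // ISO timestamp
  expires_at?: string;   // result is gone after this (and after first retrieval)
}
```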
Create src/headless.ts:
import { createAgent } from './agent.js';
import { defaultTools } from './tools.js';
import { generateImage } from './media.js';
import { writeFile } from 'node:fs/promises';
async function main() {
const agent = createAgent({
model: 'minimax-m2.7',
instructions: 'You are a helpful assistant with tools.',
tools: defaultTools,
enableSearch: false,
});
agent.on('thinking:start', () => console.log('\n🤔 Thinking...'));
agent.on('tool:call', (name, args) => console.log(`🔧 ${name}`, args));
agent.on('stream:delta', (delta) => process.stdout.write(delta));
agent.on('stream:end', () => console.log());
agent.on('error', (e) => console.error('❌', e.message));
// Chat
await agent.send('What time is it in Tokyo?');
// Image (async)
const img = await generateImage({
prompt: 'A clean studio product photo of a glass teapot on a walnut table',
size: '1024x1024',
quality: 'high',
});
if (img.status === 'completed' && img.data?.[0]?.b64_json) {
await writeFile('teapot.png', Buffer.from(img.data[0].b64_json, 'base64'));
console.log('\n💾 Saved teapot.png');
}
}
main().catch(console.error);
Run: ALLTOKEN_API_KEY=sk-... npm run start:headless
Create src/cli.tsx for a terminal chat UI. Subscribe to stream:delta and tool:call events from the agent and render them. The agent core is UI-agnostic — the same instance can power Hermes, OpenClaw, Discord, or HTTP.
import React, { useState, useEffect } from 'react';
import { render, Box, Text, useInput, useApp } from 'ink';
import { createAgent, type Message } from './agent.js';
import { defaultTools } from './tools.js';
const agent = createAgent({
model: 'minimax-m2.7',
instructions: 'You are a concise assistant.',
tools: defaultTools,
});
function App() {
const { exit } = useApp();
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState('');
const [streaming, setStreaming] = useState('');
const [loading, setLoading] = useState(false);
useInput((ch, key) => {
if (key.escape) exit();
if (loading) return;
if (key.return) {
const text = input.trim();
if (!text) return;
setInput('');
setMessages((m) => [...m, { role: 'user', content: text }]);
      agent.send(text).catch(() => setLoading(false)); // errors also surface via the agent's 'error' event
} else if (key.backspace || key.delete) setInput((v) => v.slice(0, -1));
else if (ch && !key.ctrl && !key.meta) setInput((v) => v + ch);
});
useEffect(() => {
const onStart = () => { setLoading(true); setStreaming(''); };
const onDelta = (_d: string, acc: string) => setStreaming(acc);
const onAssistant = (m: Message) => {
setMessages((prev) => [...prev, m]);
setStreaming('');
setLoading(false);
};
agent.on('thinking:start', onStart);
agent.on('stream:delta', onDelta);
agent.on('message:assistant', onAssistant);
return () => {
agent.off('thinking:start', onStart);
agent.off('stream:delta', onDelta);
agent.off('message:assistant', onAssistant);
};
}, []);
return (
<Box flexDirection="column" padding={1}>
<Text bold color="magenta">🤖 AllToken Agent</Text>
{messages.map((m, i) => (
<Box key={i} flexDirection="column" marginTop={1}>
<Text bold color={m.role === 'user' ? 'cyan' : 'green'}>
{m.role === 'user' ? '▶ You' : '◀ Assistant'}
</Text>
<Text wrap="wrap">{m.content}</Text>
</Box>
))}
{streaming && (
<Box flexDirection="column" marginTop={1}>
<Text bold color="green">◀ Assistant</Text>
<Text wrap="wrap">{streaming}<Text color="gray">▌</Text></Text>
</Box>
)}
<Box borderStyle="single" borderColor="gray" marginTop={1} paddingX={1}>
<Text color="yellow">{'> '}</Text>
<Text>{input}</Text>
<Text color="gray">{loading ? ' ···' : '█'}</Text>
</Box>
</Box>
);
}
render(<App />);
For Python users — including those embedding the agent inside Hermes or OpenClaw Python tools:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["ALLTOKEN_API_KEY"],
base_url="https://api.alltoken.ai/v1",
)
# Streaming chat
stream = client.chat.completions.create(
model="minimax-m2.7",
messages=[{"role": "user", "content": "Explain SSE in one sentence."}],
stream=True,
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
print()
Async image (poll loop):
import os, time, base64, uuid, requests
BASE = "https://api.alltoken.ai/v1"
H = {"Authorization": f"Bearer {os.environ['ALLTOKEN_API_KEY']}", "Content-Type": "application/json"}
task = requests.post(
f"{BASE}/images/generations/async",
headers={**H, "Idempotency-Key": str(uuid.uuid4())},
json={"model": "gpt-image-2", "prompt": "A cat astronaut, studio light", "size": "1024x1024", "quality": "high"},
).json()
while True:
res = requests.get(f"{BASE}/images/generations/{task['id']}", headers=H).json()
if res["status"] in ("completed", "failed", "cancelled"):
break
time.sleep(2)
if res["status"] == "completed":
with open("cat.png", "wb") as f:
f.write(base64.b64decode(res["data"][0]["b64_json"]))
Both Hermes and OpenClaw load skills from SKILL.md files and can run TypeScript or Python tools at the agent boundary. There are two integration patterns:
Point your host agent's HTTP client at AllToken. In OpenClaw / Hermes config, set:
provider:
base_url: https://api.alltoken.ai/v1
api_key: ${ALLTOKEN_API_KEY}
model: minimax-m2.7
No code changes needed — the OpenAI-compatible endpoint accepts the same requests.
Drop the agent.ts / media.ts modules into the host agent's tools directory and expose them as callable tools (chat, generate_image, generate_video). The host agent (running on any model) then delegates multimodal work to AllToken on demand.
// host-agent-tool.ts
import { createAgent } from './agent.js';
import { generateImage, generateVideo } from './media.js';
const alltoken = createAgent({ model: 'minimax-m2.7' });
export const tools = {
alltoken_chat: (input: { prompt: string }) => alltoken.sendSync(input.prompt),
alltoken_image: (input: { prompt: string; size?: string }) => generateImage(input as any),
alltoken_video: (input: { prompt: string; duration?: number }) =>
generateVideo({ model: 'seedance-1.5-pro', ...input }),
};
Verified-working model IDs as of 2026-05-12 (use these for quick starts; re-confirm via GET /v1/models before production):
| Use case | IDs |
|---|---|
| Chat — cheap / fast | gpt-5.4-nano, gpt-5.4-mini, claude-haiku-4-5, gemini-3-flash-preview, glm-4.7-flash, qwen3.6-flash, deepseek-v4-flash, minimax-m2.5-highspeed |
| Chat — flagship | gpt-5.4, gpt-5.4-pro, gpt-5.5, claude-opus-4-7, claude-sonnet-4-6, gemini-3.1-pro-preview, glm-5.1, deepseek-v4-pro, qwen3.6-max-preview, kimi-k2.6, minimax-m2.7 |
| Chat — code | gpt-5.3-codex, qwen3-coder-next |
| Image | gpt-image-2 |
| Video — text/image to video | seedance-1.5-pro, seedance-2.0, happyhorse-1.0-t2v, happyhorse-1.0-i2v |
| Video — editing / reference | happyhorse-1.0-video-edit, happyhorse-1.0-r2v |
Available chat models on a fresh key: 38 as of this writing. Image: 1. Video: 7.
Do not hardcode model IDs in production — the catalog evolves. Use the live endpoints:
// OpenAI-compatible list (good for SDK clients)
const list = await fetch('https://api.alltoken.ai/v1/models', {
headers: { Authorization: `Bearer ${process.env.ALLTOKEN_API_KEY}` },
}).then((r) => r.json());
// Rich catalog with pricing, capabilities, tags (used by the website)
const catalog = await fetch('https://api.alltoken.ai/api-account/models').then((r) => r.json());
// Single model detail page
const detail = await fetch('https://api.alltoken.ai/api-account/models/gpt-5.4').then((r) => r.json());
Pair with the Rankings API (GET /api-account/rankings/all) for live leaderboards by usage, benchmarks, throughput, and category leaders — useful for --auto model selection.
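For a lightweight --auto, a defensive picker that resolves the first preferred ID the catalog actually serves (a sketch; assumes the OpenAI-style { data: [{ id }] } list shape shown above):

```ts
async function pickModel(preferred: string[]): Promise<string> {
  const res = await fetch('https://api.alltoken.ai/v1/models', {
    headers: { Authorization: `Bearer ${process.env.ALLTOKEN_API_KEY}` },
  });
  const { data } = await res.json();
  const ids = new Set<string>((data ?? []).map((m: { id: string }) => m.id));
  const hit = preferred.find((id) => ids.has(id));
  if (!hit) throw new Error(`none of [${preferred.join(', ')}] are available`);
  return hit;
}

// const model = await pickModel(['minimax-m2.7', 'gpt-5.4-mini']);
```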
AllToken handles provider routing internally. Two knobs:
- Per-key: routing_mode (code or manual), allowed_models, and a default_models priority list on each API key:
  POST /api-account/user/api-keys
  PUT /api-account/user/api-keys/{key_id}/default-models
- Per-request: pass an explicit model ID in the request body to bypass routing for that call.

When a provider returns 502/503, AllToken may automatically fall back to the next provider for the model.
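On top of the server-side fallback you can layer a client-side model fallback. A sketch (assumes the OpenAI SDK's thrown errors expose the HTTP status, as openai-node does):

```ts
import OpenAI from 'openai';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

// Try models in order; only fall through on 429/5xx. Client errors rethrow.
async function chatWithFallback(
  client: OpenAI,
  models: string[],
  messages: ChatCompletionMessageParam[],
) {
  let lastErr: unknown;
  for (const model of models) {
    try {
      return await client.chat.completions.create({ model, messages });
    } catch (err) {
      lastErr = err;
      const status = (err as { status?: number }).status;
      if (status !== undefined && status < 500 && status !== 429) throw err;
    }
  }
  throw lastErr;
}
```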
Web search (enable_search)

Pass enable_search: true on a chat completion to opt into AllToken's unified web-search backend. Support is per-provider, not per-request shape — same flag, different effective behavior across model families. Live probe on 2026-05-12 (asking "current Bitcoin price"):
| Family | Outcome | Notes |
|---|---|---|
| DeepSeek (deepseek-v3.2, deepseek-v4-pro) | ✅ Searches | Returns fresh prices with timestamps |
| Qwen (qwen3.6-flash, qwen3.6-max-preview) | ✅ Searches | Same fresh data via the unified backend |
| Claude (claude-opus-4-7, claude-sonnet-4-6) | ❌ Silently ignores | Model responds "I don't have web search" |
| GLM (glm-5, glm-5.1) | ❌ Silently ignores | Same as Claude |
| Kimi (kimi-k2.6) | ❌ Silently ignores | |
| Minimax (minimax-m2.7) | ❌ Silently ignores | |
| Gemini (gemini-3.1-pro-preview) | ⚠️ Empty / refusal | Inconsistent — re-test before relying |
| OpenAI (gpt-5.4, gpt-5.4-nano, gpt-5.5) | 🔴 HTTP 503 all_providers_failed | Upstream rejects the flag |
Recommendation: when you need search, default to a DeepSeek or Qwen model. If you're on a different family, fall back to a function-calling pattern (model emits a tool call → your tool hits a search API → you re-invoke). The enable_search matrix above is empirical and provider-side support may change — re-test for critical paths.
Note: AllToken does not include search-result citations in the response annotations[] field today, so detecting "did search fire" requires latency heuristics (typically +6 – 15 s vs no-search baseline) or content sniffing for fresh facts.
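A sketch of that function-calling fallback, reusing the ToolHandler shape from src/tools.ts (searchWeb is a hypothetical helper; wire it to whatever search API you use):

```ts
import type { ToolHandler } from './agent.js';

// Hypothetical backend; replace with your actual search client.
declare function searchWeb(query: string): Promise<Array<{ title: string; url: string; snippet: string }>>;

export const webSearchTool: ToolHandler = {
  definition: {
    type: 'function',
    function: {
      name: 'web_search',
      description: 'Search the web for fresh information',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
  },
  // The agent loop feeds results back as a tool message and re-invokes the model.
  execute: ({ query }: { query: string }) => searchWeb(query),
};
```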
Rate limits and errors:

- Set rpm_limit, tpm_limit, monthly_quota, credit_limit when creating the key.
- HTTP statuses: 400 invalid params · 401 bad key · 402 insufficient balance · 403 forbidden · 404 not found · 429 rate limited (respect Retry-After) · 5xx upstream — already retried server-side when safe.

Error envelope:

{
"error": {
"code": "invalid_api_key",
"message": "Invalid or revoked API key",
"param": null,
"type": "auth_error",
"request_id": "d81itf8gdg1fp5ko4bjg"
}
}
Note: code is a string slug (e.g. "invalid_api_key", "image_already_retrieved", "all_providers_failed"), not the numeric HTTP status. type groups errors (auth_error, invalid_request_error, api_error, …). Include request_id when filing support tickets.

Python error-dispatch helper:
import json, time, urllib.request, urllib.error
def call(req):
try:
return urllib.request.urlopen(req, timeout=60)
except urllib.error.HTTPError as e:
body = e.read()
try:
err = json.loads(body).get("error", {})
except Exception:
err = {}
retry_after = e.headers.get("Retry-After") # integer seconds (AllToken format)
if e.code == 429 and retry_after:
time.sleep(int(retry_after))
            return call(req)  # retry after the server-indicated wait (recursive, so repeated 429s keep retrying)
if e.code == 401: raise RuntimeError(f"auth: {err.get('code')} — rotate API key")
if e.code == 402: raise RuntimeError(f"top up credits: {err.get('message')}")
if e.code == 410 and err.get("code") == "image_already_retrieved":
raise RuntimeError("re-submit; image was already delivered")
if 500 <= e.code < 600 and err.get("code") == "all_providers_failed":
raise RuntimeError("upstream — try fallback model or retry with jitter")
raise RuntimeError(f"{e.code} {err.get('type')}/{err.get('code')}: {err.get('message')} [req={err.get('request_id')}]")
Retry-After is sent as integer seconds. Always combine an explicit retry-after read with exponential backoff (+ jitter) as the fallback when the header is missing.
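A backoff sketch following that rule in TypeScript (Retry-After wins when present; otherwise exponential backoff with full jitter, capped at 30 s):

```ts
async function fetchWithBackoff(url: string, init: RequestInit, maxAttempts = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    // Retry only on 429 and 5xx; everything else is final.
    if (res.status !== 429 && res.status < 500) return res;
    if (attempt + 1 >= maxAttempts) return res;
    const retryAfter = Number(res.headers.get('Retry-After')); // integer seconds
    const delayMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : Math.random() * Math.min(30_000, 1000 * 2 ** attempt); // full jitter
    await new Promise((r) => setTimeout(r, delayMs));
  }
}
```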
GET /api-account/health/summary (returns a {"data": {...}} envelope) and /health/routes show live availability, p50/p95 latency, and incident routes — wire this into your runbook.

Per-request cost: every chat response includes a usage block:
"usage": {
"prompt_tokens": 13,
"completion_tokens": 4,
"total_tokens": 17,
"prompt_tokens_details": { "cached_tokens": 0, "cache_creation_input_tokens": 0, "audio_tokens": 0 },
"completion_tokens_details": { "reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0 }
}
Capture usage from a streaming response: pass stream_options: {"include_usage": true}. The final data: chunk before data: [DONE] will have choices: [] and the populated usage. Without this option, usage is null on every streamed chunk.
stream = client.chat.completions.create(
model="gpt-5.4-nano", messages=[...], stream=True,
stream_options={"include_usage": True},
)
usage = None
for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
if chunk.usage is not None:
usage = chunk.usage # only present on the terminal chunk
Per-request cost telemetry (vendor extension): AllToken also emits one extra SSE comment line after data: [DONE] with a fiat-priced breakdown:
: {"cost":"0.0000188000","input_price":"0.0002000000","output_price":"0.0012500000","prompt_tokens":19,"completion_tokens":12}
Standard OpenAI SDKs drop comment lines (lines beginning with :), so this is invisible when using openai. To capture it, parse the raw SSE stream yourself and do not stop on [DONE]:
# stdlib-only — captures both usage (from data: chunks) AND cost comment (post-DONE)
import urllib.request, json
# URL, BODY, H: endpoint URL, JSON-encoded request body, and auth headers (as in the earlier examples)
req = urllib.request.Request(URL, data=BODY, method="POST", headers=H)
r = urllib.request.urlopen(req)
saw_done = False
for raw in iter(r.readline, b""):
line = raw.decode().rstrip("\n")
if line.startswith(":"):
cost = json.loads(line[1:]) # {"cost": "...", ...}
elif line.startswith("data: "):
data = line[6:]
if data == "[DONE]": saw_done = True; continue
# ... parse chunk
Other useful response metadata: chat responses also carry top-level service_tier (e.g. "default") and x-gateway-request-id (use this when filing support tickets).
Account-wide totals: /api-account/user/{balance,billing,usage,billing/orders,...} exist but are not callable with the API key — they need the web-session token. Check balance and history in Settings → Billing on https://alltoken.ai, or top up via the same dashboard.
Persist transcripts and wire up observability through agent events (db, analytics, and sentry below are placeholders for your own stack):

const agent = createAgent({ model: 'minimax-m2.7' });
agent.on('message:user', (m) => db.insert('user', m.content));
agent.on('message:assistant', (m) => db.insert('assistant', m.content));
agent.on('tool:call', (name, args) => analytics.track('tool', { name, args }));
agent.on('error', (err) => sentry.capture(err));
Expose the agent over HTTP with per-session state (Express example):

import express from 'express';
import { createAgent, type Agent } from './agent.js';
const app = express(); app.use(express.json());
const sessions = new Map<string, Agent>();
app.post('/chat', async (req, res) => {
const { sessionId, message } = req.body;
let agent = sessions.get(sessionId);
if (!agent) { agent = createAgent(); sessions.set(sessionId, agent); }
res.json({ response: await agent.sendSync(message), history: agent.getMessages() });
});
app.listen(3000);
createAgent(config)

| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | process.env.ALLTOKEN_API_KEY | AllToken API key |
| model | string | 'minimax-m2.7' | Model ID (see model discovery) |
| instructions | string | 'You are a helpful assistant.' | System prompt |
| tools | ToolHandler[] | [] | Function-calling tools |
| maxSteps | number | 5 | Max tool-loop iterations |
| temperature | number | 0.7 | Sampling temperature 0–2 |
| enableSearch | boolean | false | AllToken enable_search extension |
| Method | Returns | Description |
|---|---|---|
| send(content) | Promise<string> | Streaming send + tool loop |
| sendSync(content) | Promise<string> | Non-streaming send |
| getMessages() | Message[] | Full conversation |
| clearHistory() | void | Reset (keeps system prompt) |
| setInstructions(text) | void | Update system prompt |
| addTool(tool) | void | Register tool at runtime |
| Event | Payload | Notes |
|---|---|---|
| message:user | Message | |
| message:assistant | Message | Final turn (post tool loop) |
| stream:start | — | |
| stream:delta | (delta, accumulated) | OpenAI-style token chunks |
| stream:end | fullText | |
| tool:call | (name, args, callId) | |
| tool:result | (name, result, callId) | |
| thinking:start | — | |
| thinking:end | — | |
| error | Error | |
Core API
Guides (one topic per page)
Live endpoints (callable now)
- GET https://api.alltoken.ai/v1/models (Bearer)
- GET https://api.alltoken.ai/api-account/models
- GET https://api.alltoken.ai/api-account/health/summary

Account management: Settings → API Keys / Billing on https://alltoken.ai (web session required; not callable with your Bearer API key)
SDKs