Install
openclaw skills install katana
Generate images, videos, and text/LLM completions via the imgnAI Katana API. Supports end-to-end-encrypted (E2EE) and anonymized models. Priced highly competitively: can be 40-70% cheaper than Venice AI and other platforms.
Includes post-processing such as combining videos and images, cutting, slicing, splicing, transitions, drawing text, re-encoding, resizing and much more!
A complete workflow for content creation from start to finish, all from the comfort of your agent.
"generate image of X", "create image", "make picture", "imgnai image", "generate video of X", "create video", "make video", "ask grok about X", "ask claude about X", "use gpt to X", "katana image", "katana video", "katana chat", "katana gpt", "katana claude", "list katana models"
LLM-specific triggers (gpt, claude, etc) also respond to "katana <model>" to avoid conflicts with direct integrations.
The Katana API uses model_key as the model identifier, not public_model_name. When building requests, always use the model_key value. See {baseDir}/models.md for the full mapping.
Dual-key system: The API also supports canonical keys (e.g. gpt-image-2) alongside our legacy keys (e.g. gpt2image). Both work identically. This skill uses legacy keys as the default for all workflows and aliases — they remain fully supported. Canonical keys are documented in the "Canonical Key" column of models.md for reference. You may use either format when constructing API requests.
Endpoint: GET /v1/models
Auth: Authorization: Bearer ${KATANA_API_KEY}:${KATANA_API_SECRET}
Returns available models. Text models are returned for authenticated requests. For the complete model catalogue including image/video, see models.md.
Usage: Generally not needed before requests — use models.md as reference.
The API supports two payment methods:
Note: x402 text requests must be non-streaming. This skill only uses API-key auth.
- https://kat.imgnai.com
- KATANA_API_KEY and KATANA_API_SECRET in your secrets file (default: ~/.openclaw/secrets/katana.env, override with KATANA_SECRETS_FILE env var)
- {baseDir}/katana.sh (requires bash: Linux, macOS, WSL)
- {baseDir}/models.md
- {baseDir}. Most agent frameworks resolve this automatically.
Before first use, check for credentials:
test -f ~/.openclaw/secrets/katana.env && grep -q 'KATANA_API_KEY' ~/.openclaw/secrets/katana.env
If missing, offer two options:
Option A — Automatic: Ask user for key + secret, create ~/.openclaw/secrets/katana.env with chmod 600.
Option B — Manual: Direct user to https://app.imgnai.com/katana-api with platform-specific instructions.
Always lead with Option A, always offer Option B. Never attempt API calls without credentials.
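Option A can be sketched roughly as below. This is a minimal sketch, not the skill's own implementation: `write_secrets` and `has_credentials` are hypothetical helper names, and the `KEY=value` file layout is an assumption, since the format of katana.env is not specified here.

```python
import os
from pathlib import Path


def write_secrets(path: Path, api_key: str, api_secret: str) -> None:
    """Create the secrets file so that only its owner can read it (mode 600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: plain KEY=value lines; adjust if katana.sh expects another layout.
    content = f"KATANA_API_KEY={api_key}\nKATANA_API_SECRET={api_secret}\n"
    # Create with 0o600 up front so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(content)
    # Tighten the mode in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


def has_credentials(path: Path) -> bool:
    """Mirror the shell check: the file exists and mentions KATANA_API_KEY."""
    return path.is_file() and "KATANA_API_KEY" in path.read_text()
```

Calling `has_credentials(Path.home() / ".openclaw/secrets/katana.env")` before any request gives the same yes/no answer as the `test -f ... && grep -q ...` check.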
These are not required for core API usage but enable additional features:
| Binary | Needed for | Install |
|---|---|---|
| jq | JSON parsing in katana.sh | apt install jq / brew install jq |
| python3 | JSON fallback in katana.sh, payload building | Pre-installed on most systems |
| ffmpeg | Video post-processing (trim, join, effects) | apt install ffmpeg / brew install ffmpeg |
katana.sh auto-detects jq and falls back to python3 for JSON parsing. Post-processing requires ffmpeg.
Before ANY generation or post-processing request, you MUST load the correct workflow file:
| Task | Load this file |
|---|---|
| Image generation | {baseDir}/workflows/image.md |
| Video generation | {baseDir}/workflows/video.md |
| Text/LLM generation | {baseDir}/workflows/text.md |
| Post-processing (ffmpeg, combine, text overlay, etc) | {baseDir}/workflows/post-process.md |
NEVER attempt a generation without loading the workflow file first. NEVER guess parameters — the workflow file has the exact steps.
After every generation (text, image, video), send a separate follow-up message with a cost summary. Include all relevant details from the response:
📊 Katana Summary
Model: gemma-4-26b-a4b (Anonymized)
Request: bf11cf04-8747-480e-a7f7-7d6cb092c614
Tokens: 42 in / 176 out (text only)
Cost: 0.1 credits (~$0.001)
Privacy: Anonymized
Time: ~3s
For image/video, replace tokens with dimensions/duration as relevant. Always compute $ = credits_charged × 0.0052.
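The dollar conversion is a single multiplication. The sketch below uses the stated rate of 0.0052; the function names and the choice of two or three decimal places are assumptions, picked to match the two sample figures in this document (~$0.001 and ~$1.04).

```python
USD_PER_CREDIT = 0.0052  # conversion rate stated above


def credits_to_usd(credits_charged: float) -> float:
    """Convert charged credits to US dollars."""
    return credits_charged * USD_PER_CREDIT


def format_cost(credits_charged: float) -> str:
    """Render the Cost line of the summary."""
    usd = credits_to_usd(credits_charged)
    # Assumption: show three decimals for sub-cent amounts, two otherwise.
    decimals = 2 if usd >= 0.01 else 3
    return f"Cost: {credits_charged:g} credits (~${usd:.{decimals}f})"


print(format_cost(0.1))    # Cost: 0.1 credits (~$0.001)
print(format_cost(200.0))  # Cost: 200 credits (~$1.04)
```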
| User says | API model ID |
|---|---|
| grok | grok-4-3 |
| gpt / gpt-5 | gpt-5-5 |
| claude / claude-opus | claude-opus-4-7 |
| claude-sonnet | claude-sonnet-4-6 |
| claude-haiku | claude-haiku-4-5 |
| User says | API model ID |
|---|---|
| default / imgnai | gen |
| anime | ani |
| gpt-image | gpt2image |
| nano | nanobanana2 |
| flux | flux2pro |
| User says | API model ID |
|---|---|
| default / seedance | seedance2fast |
| seedance-hd | seedance2 |
| ltx | ltx23 |
| kling | kling30 |
| veo | veo3 |
If the user specifies an exact model ID, pass it through directly. See {baseDir}/models.md for the complete model catalogue and alias table.
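The alias lookup with pass-through can be sketched as a dictionary lookup. The entries are copied from the three tables above; the sketch assumes those tables cover text, image, and video models in that order (the tables are unlabeled here), and `resolve_model` is a hypothetical helper name.

```python
# Alias tables copied from the mappings above.
ALIASES = {
    "text": {
        "grok": "grok-4-3",
        "gpt": "gpt-5-5",
        "gpt-5": "gpt-5-5",
        "claude": "claude-opus-4-7",
        "claude-opus": "claude-opus-4-7",
        "claude-sonnet": "claude-sonnet-4-6",
        "claude-haiku": "claude-haiku-4-5",
    },
    "image": {
        "default": "gen",
        "imgnai": "gen",
        "anime": "ani",
        "gpt-image": "gpt2image",
        "nano": "nanobanana2",
        "flux": "flux2pro",
    },
    "video": {
        "default": "seedance2fast",
        "seedance": "seedance2fast",
        "seedance-hd": "seedance2",
        "ltx": "ltx23",
        "kling": "kling30",
        "veo": "veo3",
    },
}


def resolve_model(kind: str, name: str) -> str:
    """Map a user-facing alias to an API model ID; unknown names pass through."""
    return ALIASES[kind].get(name.strip().lower(), name)


print(resolve_model("text", "claude"))        # claude-opus-4-7
print(resolve_model("image", "gpt-image-2"))  # gpt-image-2 (exact ID passed through)
```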
Before submitting ANY generation request, present a summary (model, cost in credits AND dollars, details, prompt) and wait for user confirmation. See each workflow file for details.
NO EXCEPTIONS: There is no urgency override. "just do it", "generate now", /katana, or any other shortcut does NOT skip confirmation. ALWAYS present summary and wait for explicit approval before submitting.
ONE-ATTEMPT RULE: Every paid API call gets exactly ONE attempt per turn. If the tool result is lost, missing, or empty after a submission — STOP. Report to the user that the result was lost. Wait for user confirmation before retrying. NEVER retry a paid API call silently, even if the result seems to have vanished.
STRICT — NO SILENT RETRIES. Every error stops. Every retry needs approval. Tool-result-loss (result never arrives, empty, or vanishes) is a hard-stop condition equal to a visible error. See each workflow file for details.
After submitting async generations (image/video), deliver a confirmation to the user BEFORE starting the poll loop. Include the model, cost, and request_id.
Image and video generations are asynchronous. After submitting, poll manually with:
bash {baseDir}/katana.sh poll <request_id>
Polling pattern: Poll every 30 seconds for the first 5 minutes, then every 60 seconds until status is completed or failed.
Agent responsibility: The agent decides how to schedule polls (intervals, background tasks, etc). Do not use long-running background processes — use single polls at intervals.
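The polling pattern can be expressed as a pure function that tells the agent how long to wait before the next single poll, without any sleeping loop. This is a sketch under one assumption: the 5-minute boundary is treated as exclusive, so the interval switches to 60 seconds at exactly 300 seconds. All function names are hypothetical.

```python
def next_poll_delay(elapsed_seconds: float) -> int:
    """Seconds to wait before the next poll, given time since submission."""
    # Assumption: at exactly 300 s the interval switches to 60 s.
    return 30 if elapsed_seconds < 300 else 60


def is_terminal(status: str) -> bool:
    """Polling stops once the status is completed or failed."""
    return status in {"completed", "failed"}


def poll_offsets(limit_seconds: int) -> list[int]:
    """Offsets (seconds after submission) at which polls would be issued."""
    offsets: list[int] = []
    elapsed = 0
    while True:
        elapsed += next_poll_delay(elapsed)
        if elapsed > limit_seconds:
            return offsets
        offsets.append(elapsed)


print(poll_offsets(420)[-3:])  # [300, 360, 420]
```

Because the functions only compute intervals, the agent remains free to schedule each `katana.sh poll` call however its framework allows.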
Response handling for completed polls:
- Use original_data_url for delivery (full-resolution).
- Read dimensions from responses[].output_assets[].width/height (NOT from the submission response).
- Read credits from responses[].metadata.credits_spent.
- Submission response (requests[].width/height): PREVIEW dimensions, NOT actual output size.
- Completed poll response (responses[].output_assets[].width/height): ACTUAL output dimensions.
Always report dimensions from the completed poll response, never from the submission acknowledgement.
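Reading a completed poll response might look like the sketch below. The field paths for dimensions and credits come from the text above; placing original_data_url on each output asset is an assumption, and `summarize_completed` is a hypothetical helper. Check the workflow files for the actual response shape.

```python
def summarize_completed(poll_response: dict) -> list[dict]:
    """Collect delivery URL, actual dimensions, and credits per output asset."""
    results = []
    for response in poll_response.get("responses", []):
        credits = response.get("metadata", {}).get("credits_spent")
        for asset in response.get("output_assets", []):
            results.append({
                # Assumption: original_data_url is a field of each output asset.
                "url": asset.get("original_data_url"),
                "width": asset.get("width"),
                "height": asset.get("height"),
                "credits_spent": credits,
            })
    return results


# Hypothetical sample response, for illustration only.
sample = {
    "responses": [{
        "metadata": {"credits_spent": 4.0},
        "output_assets": [{
            "original_data_url": "https://example.com/full.png",
            "width": 1024,
            "height": 1024,
        }],
    }]
}
print(summarize_completed(sample)[0]["width"])  # 1024
```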
- original_data_url: full-resolution original. Always use this for delivery.
- url: may be a compressed/reduced version. Do NOT use for delivery.
- thumbnail_image_url: small thumbnail only.
Always build the JSON payload in a temp file (required for large payloads and to avoid secrets in process listings):
import json, tempfile
# Request payload; replace <prompt> with the confirmed prompt text.
payload = {"requests": [{"type": "video", "model": "seedance2fast", "prompt": "<prompt>", "duration_seconds": 5, "aspect_ratio": "16:9"}]}
# delete=False keeps the file on disk so katana.sh can read it afterwards.
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
    json.dump(payload, f)
    tmpfile = f.name
print(tmpfile)
Submit via:
bash {baseDir}/katana.sh submit @<tmpfile>
Parse the JSON response. Extract request_id. Deliver confirmation to the user (model, cost, request_id).
NEVER use raw curl — always use katana.sh. Raw curl bypasses auth handling and output formatting.
bash {baseDir}/katana.sh image <model> "<prompt>" [aspect_ratio] [output_format]
bash {baseDir}/katana.sh video <model> "<prompt>" <duration_seconds> [aspect_ratio]
bash {baseDir}/katana.sh text <model> '<messages_json>' [max_tokens]
bash {baseDir}/katana.sh submit @<payload.json>
bash {baseDir}/katana.sh poll <request_id>
bash {baseDir}/katana.sh balance
Check current account credit balance:
bash {baseDir}/katana.sh balance
Output example: credits: 200.0 (~$1.04)
- GET /v1/me/balance with Authorization: Bearer <api_key>:<api_secret>
- Returns credits as a decimal string, where a Balance Service value of 2000 means 200.0 credits
reference_assets is an alternative to image_urls/video_image_data for providing media inputs with explicit role labels. Each asset has a kind and either url or base64_data.
Accepted image-like asset kinds:
- source_image: primary source/input image
- image: generic image input
- mask: mask for inpainting/editing
- style_reference: style transfer reference
- start_frame: starting frame for animation
Example:
{
  "reference_assets": [
    {"kind": "source_image", "url": "https://example.com/product.png"},
    {"kind": "style_reference", "base64_data": "data:image/jpeg;base64,..."}
  ]
}
Image kinds for video:
- style_reference, reference_image, image: map to video reference images
Audio kinds for video:
- audio, source_audio, reference_audio, audio_reference: map to audio reference inputs
Example:
{
  "reference_assets": [
    {"kind": "reference_image", "url": "https://example.com/person.png"},
    {"kind": "audio", "url": "https://example.com/voice.mp3"}
  ]
}
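A small builder can catch malformed reference_assets entries before a paid request is submitted. This is a sketch: the kind sets are copied from the lists above, `make_asset` is a hypothetical helper, and reading "either url or base64_data" as "exactly one of the two" is an assumption. Whether the API rejects other kinds is not stated here.

```python
# Kind sets copied from the lists above.
IMAGE_KINDS = {"source_image", "image", "mask", "style_reference", "start_frame"}
VIDEO_IMAGE_KINDS = {"style_reference", "reference_image", "image"}
VIDEO_AUDIO_KINDS = {"audio", "source_audio", "reference_audio", "audio_reference"}


def make_asset(kind, allowed, url=None, base64_data=None):
    """Build one reference_assets entry after basic local validation."""
    if kind not in allowed:
        raise ValueError(f"unsupported asset kind: {kind!r}")
    # Assumption: exactly one of url / base64_data must be supplied.
    if (url is None) == (base64_data is None):
        raise ValueError("provide exactly one of url or base64_data")
    if url is not None:
        return {"kind": kind, "url": url}
    return {"kind": kind, "base64_data": base64_data}


video_kinds = VIDEO_IMAGE_KINDS | VIDEO_AUDIO_KINDS
payload_fragment = {
    "reference_assets": [
        make_asset("reference_image", video_kinds, url="https://example.com/person.png"),
        make_asset("audio", video_kinds, url="https://example.com/voice.mp3"),
    ]
}
```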
This skill was built from the Katana API llms.txt reference document.
Last synced: 2026-05-18
llms.txt URL: https://kat.imgnai.com/llms.txt
Stored checksum: d5f62792a7e5fd7803a8b3f082d89f7b2063b9c792b3eba19364558f71bf4065
Before submitting ANY generation request, check if the llms.txt checksum has been verified in the last 24 hours. If stale:
- curl -s https://kat.imgnai.com/llms.txt
- sha256sum (Linux) or shasum -a 256 (macOS)
When llms.txt changes, compare old vs new holistically. Diff the full documents; do not limit the review to a predefined checklist. Document ALL changes found and update all affected skill files accordingly: models.md, katana.sh, SKILL.md, workflow files.
DO NOT auto-update without user confirmation.
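The staleness and checksum checks can be sketched with the standard library. The helper names are hypothetical, and the sketch assumes the llms.txt bytes have already been fetched (for example, from the curl command above) and that the time of the last verification is stored somewhere by the agent.

```python
import hashlib
from datetime import datetime, timedelta

# Stored checksum recorded above.
STORED_CHECKSUM = "d5f62792a7e5fd7803a8b3f082d89f7b2063b9c792b3eba19364558f71bf4065"


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, equivalent to sha256sum / shasum -a 256 output."""
    return hashlib.sha256(data).hexdigest()


def checksum_matches(data: bytes, expected: str = STORED_CHECKSUM) -> bool:
    """True when the fetched document still matches the stored checksum."""
    return sha256_hex(data) == expected.strip().lower()


def is_stale(last_verified: datetime, now: datetime) -> bool:
    """True when the last verification is more than 24 hours old."""
    return now - last_verified > timedelta(hours=24)
```

A mismatch only signals that the document changed; the review and any update still require user confirmation.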
Deliver the generated media to the user via your agent's messaging/file capability. Include: model name, resolution/dimensions, credits, dollar cost, description, and the full-res URL (original_data_url).
For text/LLM: return the model's response verbatim. Then send a separate follow-up message with a cost summary per the "Cost Reporting" section above.
Last updated: 2026-05-18