Install
openclaw skills install martin-media
Use the SkillBoss API Hub (image generation, video generation, TTS speech, and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
This Skill consolidates six multimodal media capabilities into reusable workflows and implementation templates, all routed through SkillBoss API Hub (https://api.skillbossai.com/v1/pilot).
Convention: All API calls go through the SkillBoss API Hub endpoint /v1/pilot, which automatically routes to the optimal underlying model. Authentication uses a single SKILLBOSS_API_KEY.
fetch (built-in to Node 18+):
# No extra install needed for Node.js 18+
# For older environments you can use: npm install node-fetch
Set the SKILLBOSS_API_KEY environment variable; every request sends it in the header Authorization: Bearer $SKILLBOSS_API_KEY.
All examples below use this shared pilot() helper:
const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}
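The helper above returns the parsed body without checking the HTTP status or whether the key is set. A minimal sketch of a stricter variant follows. The names createPilot and fetchImpl are introduced here for illustration and are not part of the SkillBoss API; the format of error responses is an assumption, so the sketch only reads them as text.

```javascript
// Hypothetical factory (not part of the SkillBoss API): builds a pilot()
// function that fails loudly on a missing key or a non-2xx response.
function createPilot({ apiKey, fetchImpl = fetch, apiBase = "https://api.skillbossai.com/v1" }) {
  if (!apiKey) {
    throw new Error("SKILLBOSS_API_KEY is not set");
  }
  return async function pilot(body) {
    const r = await fetchImpl(`${apiBase}/pilot`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (!r.ok) {
      // Assumption: the error body can be read as text; its format is not documented here.
      const detail = await r.text();
      throw new Error(`pilot request failed with HTTP ${r.status}: ${detail}`);
    }
    return r.json();
  };
}
```

Usage would look like `const pilot = createPilot({ apiKey: process.env.SKILLBOSS_API_KEY });`. Passing fetchImpl makes the helper easy to exercise with a stub instead of the network.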
Image: result.image_url (URL) or result.images[0].url in the response.
Audio: result.audio_url.
Video: result.video_url; long-running tasks may require polling.
SkillBoss API Hub /v1/pilot automatically routes to the optimal underlying model. Use prefer to control the trade-off:
"quality": best output quality
"price": lowest cost
"balanced": balanced quality/cost (default)
No need to specify model names manually. The hub selects the best available model for the requested capability.
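Since prefer accepts only a few values, a request body can be checked before it is sent. The following is a minimal sketch; buildRequest and PREFER_VALUES are hypothetical names used for illustration, and whether the hub itself rejects unknown values is not stated in this document.

```javascript
// The three values this document lists for `prefer`.
const PREFER_VALUES = ["quality", "price", "balanced"];

// Hypothetical helper: assembles a /v1/pilot request body and rejects
// unknown `prefer` values before any network call is made.
function buildRequest(type, inputs, prefer = "balanced") {
  if (!PREFER_VALUES.includes(prefer)) {
    throw new Error(`prefer must be one of ${PREFER_VALUES.join(", ")}; got "${prefer}"`);
  }
  return { type, inputs, prefer };
}

const body = buildRequest("image", { prompt: "A lighthouse at dusk" }, "price");
console.log(JSON.stringify(body));
```

The returned object can be passed directly to pilot(). Omitting the third argument falls back to "balanced", matching the documented default.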
Node.js minimal template
import * as fs from "node:fs";

const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const result = await pilot({
  type: "image",
  inputs: {
    prompt: "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
  },
  prefer: "quality",
});

const imageUrl = result["result"]["image_url"];
console.log("Image URL:", imageUrl);

// Download and save the image
const imgResponse = await fetch(imageUrl);
const buffer = Buffer.from(await imgResponse.arrayBuffer());
fs.writeFileSync("out.png", buffer);
REST (curl) minimal template
curl -s -X POST "https://api.skillbossai.com/v1/pilot" \
  -H "Authorization: Bearer $SKILLBOSS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "image",
    "inputs": {
      "prompt": "Create a picture of a nano banana dish in a fancy restaurant",
      "aspect_ratio": "16:9"
    },
    "prefer": "quality"
  }'
# Image URL is at: .result.image_url
Use case: given an image, add/remove/modify elements, change style, color grading, etc.
Node.js minimal template
import * as fs from "node:fs";

const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

// Read the source image and send it inline as Base64
const imageBase64 = fs.readFileSync("input.png").toString("base64");

const result = await pilot({
  type: "image",
  inputs: {
    prompt: "Add a nano banana on the table, keep lighting consistent, cinematic tone.",
    image_data: imageBase64,
    image_mime_type: "image/png",
  },
  prefer: "quality",
});

const imageUrl = result["result"]["image_url"];
const imgResponse = await fetch(imageUrl);
const buffer = Buffer.from(await imgResponse.arrayBuffer());
fs.writeFileSync("edited.png", buffer);
Best practice: use multiple sequential calls with the previous output fed back as image_data for continuous iteration (e.g., generate first, then "only edit a specific region/element", then "make variants in the same style").
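The iteration described above can be sketched as a loop that re-encodes each result and passes it to the next call. This is an illustrative sketch, not a documented pattern: editInSequence and downloadBuffer are hypothetical names, pilot is the shared helper passed in as an argument, and the sketch assumes every edited image can be sent back with the same MIME type as the original.

```javascript
// Hypothetical helper: applies a list of edit prompts in order, feeding each
// result back in as image_data for the next call.
// `pilot` is the shared helper; `download` takes a URL and resolves to a Buffer.
async function editInSequence(pilot, download, startBase64, prompts, mimeType = "image/png") {
  let current = startBase64;
  const urls = [];
  for (const prompt of prompts) {
    const res = await pilot({
      type: "image",
      inputs: { prompt, image_data: current, image_mime_type: mimeType },
      prefer: "quality",
    });
    const url = res?.result?.image_url;
    if (!url) throw new Error(`no image_url returned for prompt: ${prompt}`);
    urls.push(url);
    // Re-encode the edited image so it becomes the input of the next step.
    // Assumption: the edited image keeps the original MIME type.
    current = (await download(url)).toString("base64");
  }
  return { finalBase64: current, urls };
}

// A download function built on the global fetch (Node 18+).
async function downloadBuffer(url) {
  const r = await fetch(url);
  return Buffer.from(await r.arrayBuffer());
}
```

A call such as `editInSequence(pilot, downloadBuffer, imageBase64, ["Add a nano banana on the table", "Make a variant in the same style"])` would run two edits in order and return the URL of each intermediate image.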
Pass these in the inputs object:
aspect_ratio: e.g. "16:9", "1:1"
size: e.g. "1024x1024", "1024x576" (16:9)

// Image understanding: inline Base64 image
import * as fs from "node:fs";

const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const imageBase64 = fs.readFileSync("image.jpg").toString("base64");

const result = await pilot({
  type: "chat",
  inputs: {
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
          },
          {
            type: "text",
            text: "Caption this image, and list any visible brands.",
          },
        ],
      },
    ],
  },
  prefer: "balanced",
});

const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);

// Image understanding: image referenced by URL
const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const result = await pilot({
  type: "chat",
  inputs: {
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image_url",
            image_url: { url: "https://example.com/image.jpg" },
          },
          { type: "text", text: "Caption this image." },
        ],
      },
    ],
  },
  prefer: "balanced",
});

const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);
Append multiple images as multiple entries in the content array; you can mix URLs and inline Base64 bytes.
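As a sketch of the multi-image case just described, the content array can be built from a mixed list of URLs and inline Base64 data. buildImageContent and the shape of its sources argument are hypothetical, chosen only for this illustration; the entry format follows the single-image examples in this document.

```javascript
// Hypothetical helper: turns a mixed list of image sources into the `content`
// array used by type: "chat" requests. Each source is either
//   { url: "https://..." }                         or
//   { base64: "...", mimeType: "image/png" }
function buildImageContent(sources, question) {
  const parts = sources.map((src) => {
    if (src.url) {
      return { type: "image_url", image_url: { url: src.url } };
    }
    if (src.base64 && src.mimeType) {
      return {
        type: "image_url",
        image_url: { url: `data:${src.mimeType};base64,${src.base64}` },
      };
    }
    throw new Error("each source needs either url, or base64 plus mimeType");
  });
  // The text instruction goes last, as in the single-image examples.
  parts.push({ type: "text", text: question });
  return parts;
}

const content = buildImageContent(
  [
    { url: "https://example.com/image.jpg" },
    // Placeholder bytes; a real call would read an image file instead.
    { base64: Buffer.from("not a real image").toString("base64"), mimeType: "image/jpeg" },
  ],
  "Compare these two images."
);
console.log(content.map((part) => part.type));
```

The result can be used as the content of a user message, for example `pilot({ type: "chat", inputs: { messages: [{ role: "user", content }] }, prefer: "balanced" })`.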
// Video generation
import * as fs from "node:fs";

const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const result = await pilot({
  type: "video",
  inputs: {
    prompt: "A cinematic shot of a cat astronaut walking on the moon. Include subtle wind ambience.",
    duration: 8,
    aspect_ratio: "16:9",
    resolution: "1080p",
  },
  prefer: "quality",
});

const videoUrl = result["result"]["video_url"];
console.log("Video URL:", videoUrl);

// Download and save
const videoResponse = await fetch(videoUrl);
const buffer = Buffer.from(await videoResponse.arrayBuffer());
fs.writeFileSync("out.mp4", buffer);
Pass these in the inputs object:
aspect_ratio: "16:9" or "9:16"
resolution: "720p" | "1080p" | "4k"
duration: duration in seconds (default 8)
Retry with timeout pseudocode
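// Assumes the shared pilot() helper defined earlier.
const deadline = Date.now() + 300_000; // 5 min
let videoUrl = null;
while (!videoUrl && Date.now() < deadline) {
  try {
    // Note: each attempt submits a new generation request.
    const result = await pilot({
      type: "video",
      inputs: { prompt: "...", duration: 8 },
      prefer: "quality",
    });
    videoUrl = result?.result?.video_url ?? null;
  } catch (e) {
    console.warn("video request failed, will retry:", e);
  }
  // Wait 5 s before the next attempt, whether the call failed or returned no URL.
  if (!videoUrl) await new Promise((resolve) => setTimeout(resolve, 5000));
}
if (!videoUrl) throw new Error("video generation timed out");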
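The video parameters listed earlier (aspect_ratio, resolution, duration) can be checked locally before a request is submitted. The sketch below is illustrative only: buildVideoInputs is a hypothetical name, the allowed values are copied from this document, and treating prompt as required and duration as a positive number are assumptions rather than documented rules.

```javascript
// Allowed values as listed in this document.
const VIDEO_ASPECT_RATIOS = ["16:9", "9:16"];
const VIDEO_RESOLUTIONS = ["720p", "1080p", "4k"];

// Hypothetical helper: applies the documented default duration (8 s) and
// rejects values outside the documented lists.
function buildVideoInputs({ prompt, aspect_ratio, resolution, duration = 8 }) {
  // Assumption: a non-empty prompt is required.
  if (!prompt) throw new Error("prompt is required");
  if (aspect_ratio !== undefined && !VIDEO_ASPECT_RATIOS.includes(aspect_ratio)) {
    throw new Error(`unsupported aspect_ratio: ${aspect_ratio}`);
  }
  if (resolution !== undefined && !VIDEO_RESOLUTIONS.includes(resolution)) {
    throw new Error(`unsupported resolution: ${resolution}`);
  }
  // Assumption: duration must be a positive number of seconds.
  if (!Number.isFinite(duration) || duration <= 0) {
    throw new Error(`duration must be a positive number of seconds; got ${duration}`);
  }
  const inputs = { prompt, duration };
  // Optional fields are only included when the caller set them.
  if (aspect_ratio !== undefined) inputs.aspect_ratio = aspect_ratio;
  if (resolution !== undefined) inputs.resolution = resolution;
  return inputs;
}

const videoInputs = buildVideoInputs({ prompt: "A cat astronaut on the moon", aspect_ratio: "9:16" });
console.log(videoInputs);
```

The returned object is meant to be used as the inputs field of a type: "video" request.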

// Video understanding: video referenced by URL
const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const result = await pilot({
  type: "chat",
  inputs: {
    messages: [
      {
        role: "user",
        content: [
          {
            type: "video_url",
            video_url: { url: "https://example.com/sample.mp4" },
          },
          {
            type: "text",
            text: "Summarize this video. Provide timestamps for key events.",
          },
        ],
      },
    ],
  },
  prefer: "balanced",
});

const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);

// Text-to-speech (TTS)
import * as fs from "node:fs";

const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const result = await pilot({
  type: "tts",
  inputs: {
    text: "Say cheerfully: Have a wonderful day!",
    voice: "Kore",
  },
  prefer: "balanced",
});

const audioUrl = result["result"]["audio_url"];
console.log("Audio URL:", audioUrl);

// Download and save
const audioResponse = await fetch(audioUrl);
const buffer = Buffer.from(await audioResponse.arrayBuffer());
fs.writeFileSync("out.mp3", buffer);
Pass multiple text segments with speaker labels in the text field, using a structured format like "[Speaker1]: Hello\n[Speaker2]: Hi there".
The voice field supports named voices (e.g., "alloy", "Kore", "Zephyr", "Puck").
Prefix the text with style directions, e.g.: "Speak in a calm, professional tone: [your content here]".
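Both text conventions above are plain string formatting, so they can be sketched with two small helpers. formatDialogue and withStyle are hypothetical names introduced for this illustration; the output formats follow the examples given in this document.

```javascript
// Hypothetical helpers for building the `text` field of a type: "tts" request.

// Joins speaker turns into the "[Speaker]: line" format, one turn per line.
function formatDialogue(turns) {
  return turns.map(({ speaker, line }) => `[${speaker}]: ${line}`).join("\n");
}

// Prepends a style direction, e.g. "Speak in a calm, professional tone".
function withStyle(style, text) {
  return `${style}: ${text}`;
}

const dialogue = formatDialogue([
  { speaker: "Speaker1", line: "Hello" },
  { speaker: "Speaker2", line: "Hi there" },
]);
const styled = withStyle("Say cheerfully", "Have a wonderful day!");

console.log(dialogue);
console.log(styled);
```

Either string can be sent as inputs.text together with a voice name. Whether a style prefix and speaker labels can be combined in one request is not covered here.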
// Speech-to-text (STT)
import * as fs from "node:fs";
import { Buffer } from "node:buffer";

const SKILLBOSS_API_KEY = process.env.SKILLBOSS_API_KEY;
const API_BASE = "https://api.skillbossai.com/v1";

async function pilot(body) {
  const r = await fetch(`${API_BASE}/pilot`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return r.json();
}

const audioB64 = fs.readFileSync("sample.mp3").toString("base64");

const result = await pilot({
  type: "stt",
  inputs: {
    audio_data: audioB64,
    filename: "sample.mp3",
  },
});

const transcript = result["result"]["text"];
console.log(transcript);

// Audio understanding via chat (separate snippet; assumes fs and pilot() as defined in the previous example)
const audioB64 = fs.readFileSync("sample.mp3").toString("base64");

const result = await pilot({
  type: "chat",
  inputs: {
    messages: [
      {
        role: "user",
        content: [
          {
            type: "audio_url",
            audio_url: { url: `data:audio/mp3;base64,${audioB64}` },
          },
          { type: "text", text: "Describe this audio clip." },
        ],
      },
    ],
  },
  prefer: "balanced",
});

const text = result["result"]["choices"][0]["message"]["content"];
console.log(text);
Use type: "image" for generation (specify negative space and consistent lighting in the prompt).
Use type: "chat" with image understanding for self-check: verify text clarity, brand spelling, and unsafe elements.

Use type: "video" for generation (include dialogue or SFX in the prompt).
Use type: "chat" with video to produce a storyboard + timestamps + narration copy; then feed the copy to type: "tts".

Use type: "stt" for transcription.
Use type: "chat" to summarize or extract specific time ranges.
Use type: "tts" to generate a "broadcast" version of the summary.

Set the SKILLBOSS_API_KEY environment variable.
type: "image" for image generation, "chat" for understanding tasks, "video" for video generation, "tts" for speech, "stt" for transcription.
prefer: "quality" for best results, "balanced" for cost efficiency.
Output fields: image → result.image_url; audio → result.audio_url; video → result.video_url; chat → result.choices[0].message.content; stt → result.text.
For video, set aspect_ratio / resolution, and download promptly.
For speech, set the voice name in inputs; use a director-style prefix for tone control.
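The audio workflow and the output-field checklist above can be combined into one short sketch: transcribe, summarize, then speak the summary. outputOf and broadcastSummary are hypothetical names introduced for illustration; pilot is the shared helper passed in as an argument, and the summary prompt wording is only an example.

```javascript
// Hypothetical helper: reads the output field for each request type,
// using the field names listed in this document.
function outputOf(type, response) {
  const result = response?.result ?? {};
  let value;
  switch (type) {
    case "image": value = result.image_url ?? result.images?.[0]?.url; break;
    case "video": value = result.video_url; break;
    case "tts":   value = result.audio_url; break;
    case "stt":   value = result.text; break;
    case "chat":  value = result.choices?.[0]?.message?.content; break;
    default: throw new Error(`unknown type: ${type}`);
  }
  if (value === undefined || value === null) {
    throw new Error(`no output found for type "${type}"`);
  }
  return value;
}

// Hypothetical workflow: stt -> chat summary -> tts "broadcast" version.
async function broadcastSummary(pilot, audioBase64, filename) {
  const transcript = outputOf("stt", await pilot({
    type: "stt",
    inputs: { audio_data: audioBase64, filename },
  }));
  const summary = outputOf("chat", await pilot({
    type: "chat",
    inputs: {
      messages: [
        { role: "user", content: [{ type: "text", text: `Summarize this transcript:\n${transcript}` }] },
      ],
    },
    prefer: "balanced",
  }));
  const audioUrl = outputOf("tts", await pilot({
    type: "tts",
    inputs: { text: summary, voice: "Kore" },
    prefer: "balanced",
  }));
  return { transcript, summary, audioUrl };
}
```

Taking pilot as a parameter keeps the workflow independent of how requests are sent, so it can be run against a stub before spending API calls.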