{"skill":{"slug":"novita-multimodal","displayName":"Novita AI Multimodal","summary":"Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating vide...","description":"---\nname: novita-multimodal\ndescription: |\n  Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT.\n  Use for: generating images, generating videos, text-to-speech, speech recognition.\n---\n\n# Novita AI Multimodal Execution\n\n## Configuration (choose one, by priority)\n\n### Method 1: Config File (Recommended)\n\nCreate the file `~/.novita/config.json`:\n\n```json\n{\n  \"api_key\": \"YOUR_API_KEY\"\n}\n```\n\n**One-command setup:**\n```bash\nmkdir -p ~/.novita && echo '{\"api_key\": \"YOUR_API_KEY\"}' > ~/.novita/config.json\n```\n\n### Method 2: Environment Variable\n\n```bash\nexport NOVITA_API_KEY=\"YOUR_API_KEY\"\n```\n\n### Method 3: Direct Parameter\n\nInclude the key in the request: `Please use API Key sk_xxx to generate an image...`\n\n---\n\n## API Key Reading Logic\n\nAt run time the key is looked up in the order below, and the first source that provides one is used:\n\n```\n1. Check if user message contains API Key (starts with sk_)\n2. Check config file ~/.novita/config.json\n3. Check environment variable NOVITA_API_KEY\n4. None found → Return configuration guide\n```\n\n**Configuration guide (only shown when not configured):**\n\n```\nYou have not configured your Novita AI API Key.\n\nQuick setup (copy and run):\nmkdir -p ~/.novita && echo '{\"api_key\": \"YOUR_KEY\"}' > ~/.novita/config.json\n\nGet Key: https://novita.ai/settings/key-management\n```\n\n---\n\n## Execution Flow (Important!)\n\n```\nUser request → Identify task → Get Key → ⚠️ Send prompt first → Execute task → Return result\n```\n\n### ⚠️ Must Send Progress Prompt First\n\n**Before calling the API, you must reply to the user with a message:**\n\n```\n🎨 Got it! 
Generating your image...\n\nTask type: Text-to-Image\nModel: Seedream 5.0 Lite\nEstimated time: 5-15 seconds\nEstimated cost: ~$0.035\n\nPlease wait; I'll send it as soon as it's ready ⏳\n```\n\n**This message must be sent BEFORE executing the API call!** This way users know the task is being processed and won't think the system is stuck.\n\n### Progress Templates for Different Tasks\n\n**Text-to-Image:**\n```\n🎨 Got it! Generating your image...\nModel: Seedream 5.0 Lite\nEstimated time: 5-15 seconds\n```\n\n**Text-to-Video:**\n```\n🎬 Got it! Generating your video...\nModel: Vidu Q3 Pro\nEstimated time: 1-3 minutes (video generation is slower; please be patient)\n```\n\n**TTS:**\n```\n🔊 Got it! Generating your audio...\nModel: MiniMax Speech 2.8 Turbo\nEstimated time: 5-15 seconds\n```\n\n### Completion Response\n\n```\n✅ Generation complete!\n\n[Image/Video/Audio URL]\n\nActual cost: $0.035\n```\n\n### Video Task Polling Updates\n\nVideo generation requires polling; update the status every 15 seconds:\n\n```\n🎬 Video generating...\nCurrent status: Processing\nElapsed: 30 seconds\nEstimated remaining: 1-2 minutes\n```\n\n---\n\n## API Configuration\n\n| Setting | Value |\n|---------|-------|\n| Base URL | `https://api.novita.ai` |\n| Auth | `Authorization: Bearer <API_KEY>` |\n| Get Key | https://novita.ai/settings/key-management |\n\n## Task Types and Endpoints\n\n| Task | Endpoint | Model |\n|------|----------|-------|\n| Text-to-Image | `/v3/seedream-5.0-lite` | Seedream 5.0 Lite |\n| Image Editing | `/v3/seedream-5.0-lite` | Seedream 5.0 Lite |\n| Text-to-Video | `/v3/async/vidu-q3-pro-t2v` | Vidu Q3 Pro |\n| Image-to-Video | `/v3/async/vidu-q3-pro-i2v` | Vidu Q3 Pro |\n| TTS | `/v3/async/minimax-speech-2.8-turbo` | MiniMax Speech 2.8 Turbo |\n| STT | `/v3/glm-asr` | GLM ASR |\n| Task Query | `/v3/async/task-result?task_id=xxx` | - |\n\n---\n\n## Execution Templates\n\n### Text-to-Image\n\n```bash\ncurl -X POST \"https://api.novita.ai/v3/seedream-5.0-lite\" \\\n  -H 
\"Authorization: Bearer $API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"prompt\": \"description\"}'\n```\n\n### Image Editing\n\n```bash\ncurl -X POST \"https://api.novita.ai/v3/seedream-5.0-lite\" \\\n  -H \"Authorization: Bearer $API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"prompt\": \"edit instruction\", \"reference_images\": [\"image_url\"]}'\n```\n\n### Text-to-Video\n\n```bash\ncurl -X POST \"https://api.novita.ai/v3/async/vidu-q3-pro-t2v\" \\\n  -H \"Authorization: Bearer $API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"prompt\": \"description\", \"duration\": 4}'\n```\n\n### Image-to-Video\n\n```bash\ncurl -X POST \"https://api.novita.ai/v3/async/vidu-q3-pro-i2v\" \\\n  -H \"Authorization: Bearer $API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"prompt\": \"motion description\", \"images\": [\"image_url\"]}'\n```\n\n### TTS\n\n```bash\ncurl -X POST \"https://api.novita.ai/v3/async/minimax-speech-2.8-turbo\" \\\n  -H \"Authorization: Bearer $API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"text to convert\",\n    \"voice_setting\": {\"voice_id\": \"male-qn-qingse\", \"speed\": 1.0},\n    \"audio_setting\": {\"format\": \"mp3\"}\n  }'\n```\n\n**Available voices:**\n- Male: `male-qn-qingse`, `male-qn-jingying`\n- Female: `female-shaonv`, `female-yujie`\n\n### STT\n\n```bash\ncurl -X POST \"https://api.novita.ai/v3/glm-asr\" \\\n  -H \"Authorization: Bearer $API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"file\": \"audio_url_or_base64\"}'\n```\n\n### Task Result Query\n\n```bash\ncurl \"https://api.novita.ai/v3/async/task-result?task_id=$TASK_ID\" \\\n  -H \"Authorization: Bearer $API_KEY\"\n```\n\n**Status:** `TASK_STATUS_QUEUED` → `TASK_STATUS_PROCESSING` → `TASK_STATUS_SUCCEED`\n\n---\n\n## Error Handling\n\n| Code | Meaning | Action |\n|------|---------|--------|\n| 401 | Invalid Key | Check configuration |\n| 402 | 
Insufficient balance | Top up at https://novita.ai/billing |\n| 429 | Rate limited | Wait and retry |\n\n## Pricing\n\nhttps://novita.ai/pricing\n","topics":["Text-to-Speech"],"tags":{"latest":"0.2.0"},"stats":{"comments":0,"downloads":532,"installsAllTime":20,"installsCurrent":1,"stars":1,"versions":1},"createdAt":1773425520222,"updatedAt":1778491890041},"latestVersion":{"version":"0.2.0","createdAt":1773425520222,"changelog":"- Major update: Expanded documentation and clarified configuration and execution flow for multimodal Novita AI tasks.\n- Added detailed step-by-step guides for setup via config file, environment variable, or direct parameter.\n- Introduced clear API key reading logic and user-facing configuration guidance.\n- Specified required progress prompts and response templates for each task type (image, video, TTS, STT).\n- Included sample API requests for all supported endpoints and detailed polling instructions for video generation.\n- Enhanced error handling documentation and added links to official pricing and key management pages.","license":"MIT-0"},"metadata":null,"owner":{"handle":"ximasadila","userId":"s17a1yve62qf41hcw7n8zs0w198425k8","displayName":"bbear","image":"https://avatars.githubusercontent.com/u/133744881?v=4"},"moderation":null}