IMA AI Text To Speech — seed-tts, DouBao

Security checks across malware telemetry and agentic risk

Overview

This is a coherent cloud text-to-speech skill, with disclosed API use and limited local preference/log storage.

Install only if you trust IMA Studio with the text you synthesize and with an API key that can create paid or credit-consuming TTS jobs. Avoid sending secrets or sensitive documents, keep the default provider endpoint unless you intentionally override it, and delete ~/.openclaw/memory/ima_prefs.json or ~/.openclaw/logs/ima_skills/ if you do not want local preference or operation history retained.
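For readers who want to script the cleanup described above, the following is a minimal sketch and not part of the skill. It assumes only the two paths named in this report; the function name `remove_ima_local_state` and its `home` parameter are hypothetical, added so the sketch can be tried against a scratch directory first.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def remove_ima_local_state(home: Optional[Path] = None) -> List[Path]:
    """Delete the preference file and log directory named in the report.

    `home` defaults to the current user's home directory; pass another
    directory to dry-run against a copy. Returns the paths actually removed.
    """
    base = (home if home is not None else Path.home()) / ".openclaw"
    prefs_file = base / "memory" / "ima_prefs.json"  # stored preferences
    logs_dir = base / "logs" / "ima_skills"          # operation history

    removed: List[Path] = []
    if prefs_file.is_file():
        prefs_file.unlink()
        removed.append(prefs_file)
    if logs_dir.is_dir():
        shutil.rmtree(logs_dir)
        removed.append(logs_dir)
    return removed


# Example (not executed here): remove_ima_local_state() acts on the real
# home directory and reports what it deleted.
```

Calling the function a second time returns an empty list, so it is safe to run repeatedly. The skill may recreate both paths the next time it runs.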

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The documented preference memory persists per-user speaker and model choices in a local file without explaining consent, retention, access controls, or necessity boundaries. While preference storage can support UX, this exceeds one-shot TTS generation and creates privacy and data-minimization risk if users are unaware their choices are being stored across sessions.

Context-Inappropriate Capability

Medium

Confidence: 85%
Finding: The skill introduces a user memory capability not clearly justified by the core declared purpose of converting text to speech. Hidden or under-documented persistence increases the risk of collecting behavioral preference data beyond user expectations, which is especially problematic in agent environments where users may assume stateless operation.

Description-Behavior Mismatch

Low

Confidence: 82%
Finding: The script persists per-user model preference data to ~/.openclaw/memory/ima_prefs.json even though the skill's stated purpose is TTS generation, not local profiling or retention. While the stored fields are limited, this still expands data retention and creates privacy exposure on shared systems because user IDs, model choices, and timestamps are written to disk without clear necessity or consent.

Context-Inappropriate Capability

Low

Confidence: 80%
Finding: The local preference memory is broader than what is required to convert text to speech and introduces unnecessary statefulness. In multi-user or managed environments, this can expose usage metadata and create compliance or privacy concerns even if the data is not highly sensitive.

Vague Triggers

Medium

Confidence: 95%
Finding: The keyword list includes very generic phrases such as 'read aloud', 'read out loud', 'text reader', and common voice-control terms that can match ordinary user requests unrelated to this specific TTS skill. This can cause the skill to activate unintentionally or overshadow more appropriate skills, increasing the chance of misrouting user input and surprising behavior.
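To illustrate the misrouting concern, here is a small sketch of how generic trigger phrases can match requests that are not asking for speech synthesis. It assumes a naive case-insensitive substring match, which may not be how the host agent actually routes skills; `GENERIC_TRIGGERS` holds only the three phrases quoted in this finding, and `matched_triggers` is a hypothetical helper.

```python
from typing import List

# The three generic phrases quoted in the finding above; the skill's full
# keyword list is not reproduced here.
GENERIC_TRIGGERS = ["read aloud", "read out loud", "text reader"]


def matched_triggers(request: str) -> List[str]:
    """Return the trigger phrases found in the request.

    Assumption: naive case-insensitive substring matching, used only to
    illustrate the risk. The real router's logic is not known.
    """
    lowered = request.lower()
    return [phrase for phrase in GENERIC_TRIGGERS if phrase in lowered]


# A question about reading apps is not a request to synthesize speech,
# yet it contains a trigger phrase.
print(matched_triggers("Which text reader app opens EPUB files?"))
# A request that names no trigger phrase matches nothing.
print(matched_triggers("Summarize this report in two sentences"))
```

Under this assumption the first request would activate the skill even though the user wants an app recommendation, which is the kind of surprising behavior the finding describes.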

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill sends user-provided text to a third-party API service but does not clearly warn users before collection or transmission. This creates a privacy and consent risk, especially if users submit sensitive scripts, captions, internal documents, or personal information assuming processing is local.

VirusTotal

66/66 vendors flagged this skill as clean.
