Kokoro Agent Voices

Security checks across malware telemetry and agentic risk

Overview

The skill appears to provide text-to-speech playback, but its playback helper can turn a crafted output filename into a local shell command.

Review this skill before installing or using it. Avoid the --play option with any untrusted or unusual output path until playback is changed to use a safe subprocess argument list, for example by passing afplay and the output filename as separate arguments.
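A minimal sketch of the argument-list approach recommended above, not the skill's actual code. The helper names build_play_command and play are hypothetical, and resolving the path to an absolute one is an added assumption, intended to keep a filename that begins with "-" from being read as an option.

```python
import os
import subprocess


def build_play_command(output_path: str) -> list[str]:
    """Build an argv list for afplay (hypothetical helper).

    The filename is a single list element, so no shell ever parses it.
    Making the path absolute means it always starts with "/", so it
    cannot be mistaken for a command-line option.
    """
    return ["afplay", os.path.abspath(output_path)]


def play(output_path: str) -> None:
    """Play the file without a shell (shell=False is the default)."""
    subprocess.run(build_play_command(output_path), check=True)
```

With this shape, a filename such as "out.wav; touch pwned" reaches afplay as one odd-looking path and simply fails to open, instead of being split into two commands.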

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (2)

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
    out = speak(args.text, voice=voice, speed=speed, output=args.output)

    if args.play and out:
        os.system(f"afplay {out}")

if __name__ == "__main__":
    main()
Confidence
98% confidence
Finding
os.system(f"afplay {out}")

Context-Inappropriate Capability

Medium
Confidence
84% confidence
Finding
This finding points to the same risky behavior as the os.system call: local shell execution for playback. Playback is a plausible feature for a TTS tool, but implementing it by interpolating the unsanitized output filename into a shell command creates an avoidable command-injection path.
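The injection path can be illustrated harmlessly. In the sketch below, echo stands in for afplay (an assumption, so the demonstration runs on any POSIX system without audio), and the function names are hypothetical. It contrasts a shell string, which is what os.system executes, with an argument list.

```python
import subprocess


def via_shell(filename: str) -> str:
    """Interpolate the filename into a shell string, as os.system would."""
    # The shell parses the whole string, so ";" ends the first command.
    result = subprocess.run(
        f"echo {filename}", shell=True, capture_output=True, text=True
    )
    return result.stdout


def via_argv(filename: str) -> str:
    """Pass the filename as one argument; no shell is involved."""
    result = subprocess.run(
        ["echo", filename], capture_output=True, text=True
    )
    return result.stdout


crafted = "demo.wav; echo INJECTED"
print(via_shell(crafted))  # two lines: the second command also ran
print(via_argv(crafted))   # one line: the text is printed literally
```

In the shell version the text after the semicolon runs as its own command. In the argument-list version the same text is only data.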

VirusTotal

64/64 vendors flagged this skill as clean.
