Security audit

StepFun step-audio-r1.1

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed StepFun audio-chat helper that sends user-selected text or audio to StepFun and saves the response locally.

Install only if you are comfortable sending selected prompts, system instructions, and audio files to StepFun under your API key. Keep saved transcripts, audio, and response JSON in a private output directory, delete them when no longer needed, and avoid overriding the API base URL unless you trust the endpoint.
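
The two precautions above (trusting the endpoint and keeping output private) could be enforced in code roughly as follows. This is a minimal sketch, not the skill's actual implementation: `TRUSTED_HOSTS`, `DEFAULT_BASE_URL`, `resolve_base_url`, and `private_output_dir` are hypothetical names, the `/v1` base path is inferred from the endpoint URL quoted in the findings, and the permission handling assumes a POSIX file system.

```python
import os
from urllib.parse import urlparse

# Assumption: the only host trusted by default is the one named in this audit.
TRUSTED_HOSTS = {"api.stepfun.com"}
# Assumption: base path inferred from the documented chat/completions endpoint.
DEFAULT_BASE_URL = "https://api.stepfun.com/v1"


def resolve_base_url(override=None):
    """Return the default base URL unless the override is HTTPS on a trusted host."""
    if not override:
        return DEFAULT_BASE_URL
    parsed = urlparse(override)
    if parsed.scheme != "https" or parsed.hostname not in TRUSTED_HOSTS:
        raise ValueError(f"untrusted API base URL: {override!r}")
    return override.rstrip("/")


def private_output_dir(path):
    """Create the output directory if needed and restrict it to the owner."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    # The mode passed to makedirs is filtered by the umask, so set it explicitly.
    os.chmod(path, 0o700)
    return path
```

Rejecting an unknown override outright, instead of warning and continuing, means the API key and audio cannot reach an unexpected host by accident.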

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Description-Behavior Mismatch

Medium

Confidence: 72%
Finding: The skill exposes a voice-listing feature that is outside the stated purpose of sending a speech turn and saving the response. While it does not by itself compromise the host, it broadens the capability surface and allows account metadata enumeration that a user may not expect from the manifest description.

Context-Inappropriate Capability

Medium

Confidence: 84%
Finding: This code enumerates custom/cloned voices tied to the account and prints identifiers and metadata such as file_id and created_at. That is sensitive account inventory information unrelated to the core chat/send-and-save workflow, so exposing it increases information disclosure risk if the skill is invoked in broader agent contexts.
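
One way to reduce this disclosure would be to mask identifiers before printing a voice listing. The sketch below is illustrative only: `mask_identifier` and `redact_voice` are hypothetical helpers, and the record is assumed to be a plain dictionary carrying the `file_id` and `created_at` fields named in the finding.

```python
def mask_identifier(value, visible=4):
    """Keep only the last few characters of an identifier."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact_voice(voice):
    """Return a copy of a voice record that is safer to print."""
    redacted = dict(voice)  # shallow copy so the caller's record is untouched
    if "file_id" in redacted:
        redacted["file_id"] = mask_identifier(str(redacted["file_id"]))
    redacted.pop("created_at", None)  # drop the timestamp entirely
    return redacted
```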

Missing User Warnings

Medium

Confidence: 89%
Finding: The notes explicitly describe sending chat content and optional audio to a third-party API but do not include any user-facing disclosure, consent, or privacy guidance. In a skill handling speech and transcripts, this omission can cause users or integrators to unknowingly transmit sensitive personal or confidential data off-platform.

Missing User Warnings

Medium

Confidence: 88%
Finding: The script sends prompts and optional local audio content to an external StepFun API, but there is no explicit user-facing warning or confirmation before transmitting potentially sensitive text or voice data. In a skill context, silent exfiltration of user-provided content to a third party is a meaningful privacy and data-handling concern even if it is functionally intended.
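
A confirmation step of the kind this finding asks for might look like the sketch below. It assumes an interactive terminal; `describe_payload` and `confirm_transmission` are hypothetical names, not functions from the skill, and the host in the prompt text is the one reported in this audit.

```python
def describe_payload(prompt, audio_path=None):
    """Summarize what would be sent without echoing the content itself."""
    parts = [f"{len(prompt)} characters of text"]
    if audio_path:
        parts.append(f"audio file {audio_path}")
    return " and ".join(parts)


def confirm_transmission(prompt, audio_path=None, ask=input):
    """Ask before anything leaves the machine; any answer but yes means no."""
    summary = describe_payload(prompt, audio_path)
    answer = ask(f"Send {summary} to api.stepfun.com? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on an empty answer keeps an accidental Enter keypress from sending data. The `ask` parameter exists so a non-interactive caller can supply its own consent mechanism in place of `input`.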

External Transmission

Medium

Category: Data Exfiltration
Content: ## Endpoint - Method: `POST` - URL: `https://api.stepfun.com/v1/chat/completions` ## Request Fields Used By This Skill
Confidence: 80%
Finding: https://api.stepfun.com/

VirusTotal

60/60 vendors flagged this skill as clean.
