
Security audit

StepFun step-audio-r1.1

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed StepFun audio-chat helper that sends user-selected text or audio to StepFun and saves the response locally.

Install only if you are comfortable sending selected prompts, system instructions, and audio files to StepFun under your API key. Keep saved transcripts, audio, and response JSON in a private output directory, delete them when no longer needed, and avoid overriding the API base URL unless you trust the endpoint.
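The two storage and endpoint precautions above can be applied mechanically. The following is a minimal Python sketch that assumes a POSIX filesystem and assumes `api.stepfun.com` (the host named in the findings) is the only endpoint you trust. `make_private_output_dir` and `check_base_url` are hypothetical helpers for illustration, not part of the audited skill.

```python
import os
import stat
import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Assumption: the only trusted host is the one named in this audit's findings.
TRUSTED_HOSTS = {"api.stepfun.com"}


def make_private_output_dir(base: Path) -> Path:
    """Hypothetical helper: create (or tighten) a directory only the current user can access."""
    base.mkdir(parents=True, exist_ok=True)
    # 0o700: the owner may read, write and list; group and others get no access (POSIX only).
    os.chmod(base, stat.S_IRWXU)
    return base


def check_base_url(url: str) -> str:
    """Hypothetical helper: reject a base-URL override unless it is HTTPS to a trusted host."""
    parsed = urlparse(url)
    # hostname is lowercased by urlparse, so "API.StepFun.com" still matches.
    if parsed.scheme != "https" or parsed.hostname not in TRUSTED_HOSTS:
        raise ValueError(f"untrusted API base URL: {url!r}")
    return url


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = make_private_output_dir(Path(tmp) / "stepfun-output")
        print(oct(stat.S_IMODE(out.stat().st_mode)))
    print(check_base_url("https://api.stepfun.com/v1"))
```

Matching on the parsed hostname rather than on a string prefix means a look-alike such as `https://api.stepfun.com.evil.example/` is rejected.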

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (5)

Description-Behavior Mismatch

Severity: Medium
Confidence: 72%
Finding
The skill exposes a voice-listing feature that is outside the stated purpose of sending a speech turn and saving the response. While it does not by itself compromise the host, it broadens the capability surface and allows account metadata enumeration that a user may not expect from the manifest description.

Context-Inappropriate Capability

Severity: Medium
Confidence: 84%
Finding
The voice-listing code enumerates custom/cloned voices tied to the account and prints identifiers and metadata such as file_id and created_at. This is sensitive account-inventory information unrelated to the core chat/send-and-save workflow, so exposing it increases information-disclosure risk if the skill is invoked in broader agent contexts.

Missing User Warnings

Severity: Medium
Confidence: 89%
Finding
The notes explicitly describe sending chat content and optional audio to a third-party API but do not include any user-facing disclosure, consent, or privacy guidance. In a skill handling speech and transcripts, this omission can cause users or integrators to unknowingly transmit sensitive personal or confidential data off-platform.

Missing User Warnings

Severity: Medium
Confidence: 88%
Finding
The script sends prompts and optional local audio content to an external StepFun API, but there is no explicit user-facing warning or confirmation before transmitting potentially sensitive text or voice data. In a skill context, silent exfiltration of user-provided content to a third party is a meaningful privacy and data-handling concern even if it is functionally intended.
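A confirmation step of the kind this finding describes as missing could look like the minimal Python sketch below. The audit does not show the skill's scripts, so `confirm_upload` and its parameters are hypothetical; the sketch only illustrates listing what will be transmitted and proceeding on an explicit "yes".

```python
from pathlib import Path
from typing import Callable, Optional


def confirm_upload(
    prompt_text: str,
    audio_path: Optional[Path] = None,
    ask: Callable[[str], str] = input,
) -> bool:
    """Hypothetical consent gate: describe the outgoing data, then require an explicit 'yes'."""
    lines = [
        "The following will be sent to StepFun (https://api.stepfun.com/):",
        f"  - prompt text ({len(prompt_text)} characters)",
    ]
    if audio_path is not None:
        lines.append(f"  - audio file: {audio_path}")
    print("\n".join(lines))
    # Anything other than "yes" (case- and whitespace-insensitive) counts as a refusal.
    answer = ask("Send this data? [yes/no] ")
    return answer.strip().lower() == "yes"


if __name__ == "__main__":
    if not confirm_upload("Summarize this meeting.", Path("meeting.wav")):
        print("Nothing was sent.")
```

Taking `ask` as a parameter keeps the prompt testable; a caller would skip the API request whenever the function returns `False`.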

External Transmission

Severity: Medium
Category: Data Exfiltration
Content:
## Endpoint

- Method: `POST`
- URL: `https://api.stepfun.com/v1/chat/completions`

## Request Fields Used By This Skill
Confidence: 80%
Finding
https://api.stepfun.com/

VirusTotal

All 60 of 60 vendors reported this skill as clean.
