audio-transcribe

Security checks across malware telemetry and agentic risk

Overview

The skill's transcription functionality is relevant to its stated purpose, but users should review it before installing: sensitive media and transcripts are routed by default to a SkillBoss/HeyBossAI API path, while the documentation presents the workflow as AssemblyAI-focused.

Install only if you trust the SkillBoss/HeyBossAI service path with your audio, video, transcripts, prompts, schemas, and API key. Test with non-sensitive content first, avoid confidential or regulated recordings unless approved, review where bundles are written, and do not allow an agent to run the delete command unless you explicitly intend to remove that remote transcript.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (9)

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The skill is presented as using AssemblyAI specifically, but its configured defaults route transcription and LLM operations to the SkillBoss API Hub. This creates a material trust-boundary mismatch: users may provide sensitive audio/transcripts believing they go to AssemblyAI, while the data is actually sent to a different third party.
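
One way a user could check for this kind of mismatch before sending real audio is to compare the configured endpoint host with the provider they expect. The sketch below is illustrative only: `checkEndpointHost` and `EXPECTED_HOSTS` are hypothetical names, the hub URL is a placeholder because the report does not name the actual host, and how the skill stores its base URL is an assumption.

```javascript
// Hypothetical allowlist of hosts the user expects to receive their data.
const EXPECTED_HOSTS = ["api.assemblyai.com"];

// Returns the hostname of a configured base URL and whether it is on the allowlist.
function checkEndpointHost(baseUrl, expectedHosts = EXPECTED_HOSTS) {
  const host = new URL(baseUrl).hostname.toLowerCase();
  return { host, trusted: expectedHosts.includes(host) };
}

// A direct AssemblyAI endpoint passes the check.
const direct = checkEndpointHost("https://api.assemblyai.com/v2/transcript");

// Placeholder hub URL (not the real SkillBoss host) fails the check.
const viaHub = checkEndpointHost("https://hub.example.invalid/pilot");
```

A check like this only reports where requests are addressed; it says nothing about how the receiving service handles the data.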

Description-Behavior Mismatch

High

Confidence: 98%
Finding: In the transcribe path, the code builds a rich transcript configuration from user flags, but `createTranscript` ultimately sends only `{ type: 'stt', inputs: sttInputs }` to `/pilot` and ignores the assembled config. This means diarisation, language settings, PII redaction, webhook controls, and understanding options may be silently dropped, causing users to rely on protections or processing that never actually occurs.
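
The gap described in this finding can be made concrete with a small sketch that compares the assembled config against the body actually sent. Everything here is an assumption for illustration: `buildSttRequest` and `droppedOptions` are hypothetical helpers, the option names are examples, and whether the `/pilot` endpoint would accept these fields inside `inputs` is not established by the report.

```javascript
// Hypothetical fix: forward the assembled config alongside the STT inputs.
function buildSttRequest(sttInputs, config) {
  return { type: "stt", inputs: { ...sttInputs, ...config } };
}

// Lists the config keys that would not reach the backend in a given request body.
function droppedOptions(config, body) {
  const sent = body.inputs || {};
  return Object.keys(config).filter((key) => !(key in sent));
}

// Example inputs and options (illustrative values only).
const sttInputs = { audio_url: "https://example.com/meeting.mp3" };
const config = { speaker_labels: true, redact_pii: true, language_code: "en" };

// Body shape described in the finding: the config never appears in it.
const asReported = { type: "stt", inputs: sttInputs };

// Body with the config merged in.
const merged = buildSttRequest(sttInputs, config);

const lostAsReported = droppedOptions(config, asReported);
const lostAfterMerge = droppedOptions(config, merged);
```

In this sketch `lostAsReported` contains all three option names and `lostAfterMerge` is empty. A check of this kind could also be used to warn the user when a requested protection such as PII redaction is not being forwarded.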

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The understand flow claims transcript-aware speech understanding, but it actually sends a generic chat prompt containing the transcript ID and task description to a remote LLM endpoint. If the backend cannot securely resolve that ID in a controlled way, results may be fabricated, inconsistent, or based on unintended data access patterns while users believe they are invoking a specialized AssemblyAI capability.

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill explicitly supports uploading local audio, sending remote URLs to AssemblyAI, and using an LLM Gateway for downstream processing, but it does not clearly warn that potentially sensitive audio, video, transcripts, and derived structured data will leave the local environment and be processed by third-party services. In an agent context, this omission increases the risk of accidental disclosure of confidential meetings, personal data, or regulated content because users or calling agents may invoke the skill without understanding the privacy and data-governance implications.

Missing User Warnings

Medium

Confidence: 91%
Finding: The documentation explicitly states that transcript text is sent to AssemblyAI's remote LLM Gateway, but it does not pair that behavior with a clear user-facing privacy warning, consent requirement, or guidance about sensitive data handling. Because transcripts often contain PII, confidential business discussions, or regulated content, users or downstream agents may transmit sensitive material to a third-party model endpoint without realizing the privacy and compliance implications.

Missing User Warnings

Medium

Confidence: 94%
Finding: The workflow documentation instructs agents to send local audio files, URLs, transcripts, and transcript IDs to AssemblyAI transcription and LLM-processing commands without any privacy, consent, retention, or data-handling warnings. In an agent setting, this can lead to inadvertent transmission of sensitive conversations, PII, regulated data, or confidential business material to a third-party service and to downstream LLM processing, especially because the document labels one flow as the safest default.

Missing User Warnings

Medium

Confidence: 88%
Finding: The speech-understanding request sends transcript identifiers and requested task details to a remote LLM service without an explicit warning at the point of use. In a transcription context, transcript IDs and associated tasks can reveal sensitive meeting, medical, legal, or customer-service information, so undisclosed remote transmission is a meaningful privacy risk.

Missing User Warnings

Medium

Confidence: 93%
Finding: The llm command packages transcript content into a chat request and transmits it to the remote LLM endpoint, but there is no explicit privacy warning or consent gate near that operation. Because transcripts often contain sensitive spoken content, silent onward transfer to another service increases confidentiality and compliance risk.
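
A consent gate near the send operation might look roughly like the following sketch. It is not the skill's code: `prepareLlmRequest`, the `allowRemote` option, the request body shape, and the endpoint URL are all hypothetical and chosen only to show the idea of surfacing the destination and size of the transfer before anything is sent.

```javascript
// Hypothetical gate: builds the outbound chat request only with explicit consent.
function prepareLlmRequest(transcriptText, prompt, options) {
  const host = new URL(options.endpoint).hostname;
  const notice =
    `About to send ${transcriptText.length} characters of transcript text to ${host}.`;

  // Without consent, return the notice so the caller can show it and stop.
  if (!options.allowRemote) {
    return { ok: false, notice, body: null };
  }

  // With consent, package the transcript into a chat-style request body.
  return {
    ok: true,
    notice,
    body: { messages: [{ role: "user", content: `${prompt}\n\n${transcriptText}` }] },
  };
}

// Placeholder endpoint (not a real service URL).
const endpoint = "https://llm.example.invalid/chat";
const blocked = prepareLlmRequest("hello world", "Summarize", { endpoint, allowRemote: false });
const allowed = prepareLlmRequest("hello world", "Summarize", { endpoint, allowRemote: true });
```

Returning the notice in both cases lets an agent or script log what was disclosed even when the transfer is approved.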

Missing User Warnings

Medium

Confidence: 90%
Finding: The delete command performs irreversible remote deletion immediately when invoked, without a confirmation requirement or force flag. This makes accidental data loss much more likely, especially in agentic or scripted environments where command arguments may be generated automatically.
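
A confirmation requirement of the kind this finding calls for could be sketched as an argument check that runs before any remote call. The `--force` flag, the `planDelete` helper, and the argument layout are hypothetical; the skill's actual command-line parsing is not shown in this report.

```javascript
// Hypothetical pre-flight check: requires exactly one transcript ID plus `--force`.
function planDelete(argv) {
  const force = argv.includes("--force");
  const ids = argv.filter((arg) => !arg.startsWith("--"));

  if (ids.length !== 1) {
    return { proceed: false, reason: "expected exactly one transcript ID" };
  }
  if (!force) {
    return { proceed: false, reason: `refusing to delete ${ids[0]} without --force` };
  }
  return { proceed: true, transcriptId: ids[0] };
}

// Illustrative invocations with a made-up transcript ID.
const withoutForce = planDelete(["abc123"]);
const withForce = planDelete(["abc123", "--force"]);
const missingId = planDelete(["--force"]);
```

Requiring the flag on every invocation means an agent that generates arguments automatically has to opt in to deletion explicitly rather than reaching it by default.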

VirusTotal

65/65 vendors flagged this skill as clean.
