Speech to Text Transcription

Audited by ClawScan on May 1, 2026.

Overview

This transcription skill is purpose-aligned, but users should be aware that cloud transcription sends audio to third-party providers and the skill keeps local preference memory.

Before installing, decide whether your recordings are sensitive. Prefer local Whisper for private audio, confirm before using cloud transcription, keep API keys in environment variables, and periodically review the local ~/speech-to-text-transcription/ folder.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Medium

#ASI07: Insecure Inter-Agent Communication

What this means

If you choose a cloud provider, your audio may contain private conversations, meetings, or other sensitive content that leaves your device.

Why it was flagged

The skill clearly discloses that audio may be sent to external transcription providers when cloud APIs are used.

Skill content

Data that leaves your machine (if using APIs): Audio file sent to chosen provider (OpenAI, AssemblyAI, Deepgram)

Recommendation

Use local Whisper for sensitive audio, and only use cloud providers after confirming you trust the provider and its retention/privacy terms.
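The recommendation can be expressed as a small gating rule. The following is a minimal sketch, not part of the skill: the function name `choose_backend`, the `"local-whisper"` label, and the `user_confirmed` flag are hypothetical, and only the provider names come from the skill's disclosure.

```python
# Hypothetical helper: names and return values are illustrative assumptions,
# not the skill's actual interface.
CLOUD_PROVIDERS = {"openai", "assemblyai", "deepgram"}


def choose_backend(sensitive, provider=None, user_confirmed=False):
    """Pick a transcription backend, defaulting to local processing.

    Cloud is selected only when the audio is not sensitive, a known
    provider was requested, and the user explicitly confirmed the upload.
    """
    if sensitive or provider is None:
        return "local-whisper"

    name = provider.lower()
    if name not in CLOUD_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")

    # Without explicit confirmation, stay on the device.
    if not user_confirmed:
        return "local-whisper"
    return name
```

The design choice here is that every ambiguous case falls back to local transcription, so audio leaves the machine only after an explicit opt-in.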

Low

#ASI03: Identity and Privilege Abuse

What this means

If you supply cloud API keys, the skill can consume paid API quota and submit audio to the selected provider on your behalf.

Why it was flagged

The skill may use cloud transcription account credentials, but the need is disclosed and aligned with the provider integrations.

Skill content

Optional API keys (only if using cloud providers): OPENAI_API_KEY, ASSEMBLYAI_API_KEY, DEEPGRAM_API_KEY

Recommendation

Use environment variables as documented, avoid hardcoding keys in files, and prefer limited-scope or dedicated API keys where available.
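As an illustration of keeping keys out of files, the sketch below reads a key from the environment and fails loudly when it is missing. The variable names are the ones the skill documents; the helper `get_api_key`, the mapping `PROVIDER_ENV_VARS`, and the optional `env` parameter are assumptions made for this example.

```python
import os

# Environment variable names as listed in the skill; the mapping itself
# is a hypothetical convenience for this sketch.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def get_api_key(provider, env=None):
    """Return the API key for a provider from the environment.

    `env` defaults to os.environ; passing a dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    var = PROVIDER_ENV_VARS[provider.lower()]
    key = env.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set; export it in your shell instead of hardcoding it"
        )
    return key
```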

Low

#ASI06: Memory and Context Poisoning

What this means

Preference memory may reveal patterns such as common use cases, language, or output choices, even if full transcripts are not saved automatically.

Why it was flagged

The skill maintains persistent local memory and may learn preferences from transcription activity, while separately stating transcripts are saved only on request.

Skill content

Gather context from each transcription ... Save transcripts only when asked

Recommendation

Review or delete ~/speech-to-text-transcription/memory.md if you do not want transcription preferences retained.
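A short script is one way to do this review. This is a sketch under the assumption that the memory file is plain text at the path named above; `review_memory` and `delete_memory` are hypothetical helper names, not functions provided by the skill.

```python
from pathlib import Path

# Path taken from the recommendation above; its format is assumed to be text.
MEMORY_FILE = Path.home() / "speech-to-text-transcription" / "memory.md"


def review_memory(path=MEMORY_FILE):
    """Return the stored preference text, or an empty string if none exists."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def delete_memory(path=MEMORY_FILE):
    """Delete the memory file. Returns True if a file was removed."""
    if path.exists():
        path.unlink()
        return True
    return False
```

Calling `print(review_memory())` before `delete_memory()` lets you see what was retained before removing it.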

Low

#ASI05: Unexpected Code Execution

What this means

Running the documented install and conversion commands can install software and create or modify local media/transcript files.

Why it was flagged

The skill documents local package installation and shell commands for transcription and media conversion; these are central to the skill's purpose and appear user-directed.

Skill content

pip install openai-whisper ... ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav

Recommendation

Approve package installs and file-processing commands deliberately, and check input/output paths before running them.
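One way to check paths before running the conversion is to build the command in code and validate it first. The sketch below reuses the ffmpeg flags quoted from the skill and adds ffmpeg's standard `-n` (never overwrite) or `-y` (overwrite) option; the helper name `build_extract_command` and its `overwrite` parameter are assumptions. It only returns the argument list and does not execute anything.

```python
from pathlib import Path


def build_extract_command(src, dst, overwrite=False):
    """Build an ffmpeg command that extracts 16 kHz mono PCM audio.

    Raises before anything runs if the input is missing or the output
    would silently replace an existing file.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_file():
        raise FileNotFoundError(f"input not found: {src_path}")
    if dst_path.exists() and not overwrite:
        raise FileExistsError(f"refusing to overwrite: {dst_path}")

    return [
        "ffmpeg",
        "-y" if overwrite else "-n",  # explicit overwrite policy
        "-i", str(src_path),
        "-vn",                        # drop the video stream
        "-acodec", "pcm_s16le",       # 16-bit PCM
        "-ar", "16000",               # 16 kHz sample rate
        "-ac", "1",                   # mono
        str(dst_path),
    ]
```

Printing the returned list and reading it before passing it to `subprocess.run` keeps the approval step deliberate.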

Low

#ASI09: Human-Agent Trust Exploitation

What this means

A user might overestimate privacy guarantees for audio sent to third-party transcription services.

Why it was flagged

The skill also advises users to use only cloud providers they trust, but its statement that files are not retained on external servers should not be read as a guarantee about each provider's independent data-retention policy.

Skill content

This skill does NOT: ... Retain files on external servers after processing

Recommendation

Check the selected provider's retention policy before uploading sensitive recordings.