Gemini STT

Advisory

Audited by Static analysis on Apr 30, 2026.

Overview

No suspicious patterns detected.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If the agent is tricked into using an unsafe Vertex region value, a Google Cloud access token and the audio being transcribed could be sent to a non-Google host.

Why it was flagged

The user-controlled region value is interpolated into the URL host, and the same request carries a Google bearer token. A malicious or manipulated region containing a slash, such as 'attacker.example/', ends the hostname early, so the request, along with the token and audio payload, would go to that host instead of Google.

Skill content

url = f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:generateContent"
...
"Authorization": f"Bearer {access_token}",

Recommendation

Validate region against an allowlist or strict region-name regex before constructing the URL, and refuse values containing URL metacharacters such as '/', ':', '@', '?', or '#'. Require confirmation for non-default endpoints.
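
One way to apply this recommendation is sketched below. It is a minimal illustration, not the skill's actual code. The helper names (validate_region, build_vertex_url), the regex, and the allowlist contents are all assumptions made for the example. The regex is a guess at the usual shape of region names such as us-central1, not an official grammar.

```python
import re

# Assumption: region names look like "us-central1" or "europe-west4".
# This pattern is illustrative only and is not an official grammar.
_REGION_RE = re.compile(r"[a-z]+(?:-[a-z]+)+[0-9]+")

# Hypothetical allowlist; replace with the regions your project really uses.
ALLOWED_REGIONS = frozenset({"us-central1", "europe-west4"})


def validate_region(region: str) -> str:
    """Return region unchanged if it is safe to place in the URL host, else raise."""
    # fullmatch rejects '/', ':', '@', '?', '#' and anything else outside the pattern.
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"invalid region name: {region!r}")
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"region not in allowlist: {region!r}")
    return region


def build_vertex_url(region: str, project: str, model: str) -> str:
    """Build the generateContent URL only after the region has been validated."""
    region = validate_region(region)
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{region}/publishers/google/models/{model}:generateContent"
    )
```

Validating before the f-string is evaluated means the bearer token is never attached to a request whose host was shaped by untrusted input.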

What this means

The skill may use your Google API key or active Google Cloud account/project, which can affect billing and access permissions.

Why it was flagged

The skill uses either a Gemini API key from the environment or a local gcloud access token. This is expected for the stated Google transcription integration, but it is sensitive credential use.

Skill content

api_key = os.environ.get("GEMINI_API_KEY")
...
["gcloud", "auth", "print-access-token"]

Recommendation

Use a least-privileged API key or Google Cloud project, verify the active gcloud account and project before running, and declare these credential requirements in the skill metadata.
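
The "verify the active gcloud account and project" step could be done with a small preflight check like the sketch below. It assumes the gcloud CLI is installed and relies on the gcloud config get-value command. The function names and the expected_project parameter are hypothetical and not part of the skill.

```python
import subprocess
from typing import Callable, Dict, Sequence


def _run_gcloud(args: Sequence[str]) -> str:
    """Run a gcloud command and return its trimmed stdout."""
    result = subprocess.run(
        ["gcloud", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def active_gcloud_context(
    run: Callable[[Sequence[str]], str] = _run_gcloud,
) -> Dict[str, str]:
    """Report which account and project a gcloud access token would belong to."""
    return {
        "account": run(["config", "get-value", "account"]),
        "project": run(["config", "get-value", "project"]),
    }


def confirm_context(context: Dict[str, str], expected_project: str) -> None:
    """Refuse to continue when the active project is not the intended one."""
    if context["project"] != expected_project:
        raise RuntimeError(
            f"active project {context['project']!r} does not match "
            f"expected project {expected_project!r}"
        )
```

Passing the command runner as a parameter keeps the check testable without a real gcloud installation. In normal use the default runner is enough.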

What this means

Audio contents you transcribe are sent to Google for processing.

Why it was flagged

The selected local audio file is read, base64-encoded, and included in a request to Google's Gemini or Vertex AI API. This is purpose-aligned but crosses a privacy boundary.

Skill content

with open(file_path, "rb") as f:
    audio_data = f.read()
...
{"inline_data": {"mime_type": mime_type, "data": b64_data}}

Recommendation

Only use this skill for audio you are comfortable sending to Google, and review Google's Gemini or Vertex AI data handling terms for your account type.
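
To make the privacy boundary concrete, the sketch below reproduces the read, base64-encode, and embed steps named in this finding, and adds a simple confirmation prompt before anything is sent. It is an illustration under stated assumptions. The function names and the prompt wording are hypothetical, and the skill itself may structure this differently.

```python
import base64
from pathlib import Path
from typing import Callable


def build_inline_audio_part(file_path: str, mime_type: str) -> dict:
    """Read a local audio file and wrap it as a base64 inline_data part."""
    audio_data = Path(file_path).read_bytes()
    b64_data = base64.b64encode(audio_data).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": b64_data}}


def confirm_upload(file_path: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the user before the file contents leave the machine (default is no)."""
    size = Path(file_path).stat().st_size
    answer = ask(
        f"Send {file_path} ({size} bytes) to Google for transcription? [y/N] "
    )
    return answer.strip().lower() in {"y", "yes"}
```

The inline part contains the whole file, so anything recorded in the audio is included in the request body.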