Youtube Video Caption Generator

Pass

Audited by ClawScan on Apr 30, 2026.

Overview

This instruction-only skill is mainly a disclosed cloud video-captioning workflow, but users should understand that videos, prompts, and a service token are sent to a third-party backend.

Before installing, make sure you are comfortable sending your video files, URLs, and editing prompts to the NemoVideo cloud backend. Protect NEMO_TOKEN like a password, and use the skill only with media you intend to process through that third-party service.

Findings (7)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI01: Agent Goal Hijack

What this means

The remote service can guide the agent's next editing or export steps within the video workflow.

Why it was flagged

The skill tells the agent to translate backend-provided GUI-style messages into API actions. This is part of the intended workflow, but it makes backend text operationally influential.

Skill content

Backend Response Translation ... "click [button]" / "点击" | Execute via API

Recommendation

Use the skill for its intended video-captioning workflow and review important actions such as uploads and exports.
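
One way to act on this recommendation is to flag backend text that reads like a GUI instruction and mentions an upload or export, so those steps get a human check before they run. The sketch below is illustrative only: `needs_review`, the regular expression, and the `SENSITIVE_WORDS` list are assumptions, not part of the skill. The only detail taken from the skill content is the "click" / "点击" wording ("点击" is Chinese for "click").

```python
import re

# Assumed pattern: the English word "click" or the Chinese "点击",
# mirroring the GUI-style phrases quoted in the skill content.
GUI_HINT = re.compile(r"\bclick\b|点击", re.IGNORECASE)

# Hypothetical list of actions worth confirming with the user first.
SENSITIVE_WORDS = ("upload", "export")


def needs_review(backend_text: str) -> bool:
    """Return True when backend text looks like a GUI instruction
    that also mentions a sensitive action such as upload or export."""
    if not GUI_HINT.search(backend_text):
        return False
    lowered = backend_text.lower()
    return any(word in lowered for word in SENSITIVE_WORDS)
```

A caller could pause and ask the user whenever `needs_review` returns True, and let purely informational messages (progress updates, status text) pass through.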

Low

#ASI02: Tool Misuse and Exploitation

What this means

Files or URLs you provide may be uploaded to the NemoVideo backend and processed in the cloud.

Why it was flagged

The skill uses network API operations to upload media, send messages, render, poll status, and download results. These operations are central to the stated caption-generation purpose.

Skill content

Upload: POST `/api/upload-video/nemo_agent/me/<sid>` ... Export ... POST `/api/render/proxy/lambda`

Recommendation

Only provide videos or URLs you intend to send to the service, and confirm that the rendered output is what you expect.
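
As a rough illustration of "only provide what you intend to send", the sketch below filters candidate files against an explicit approval list before anything is uploaded, and builds the upload path quoted above. The function names and the approval-list idea are assumptions for illustration. The service's host name is not given in this report, so only the path is constructed and no request is sent.

```python
from urllib.parse import quote


def upload_path(session_id: str) -> str:
    """Build the upload path quoted in the skill content.
    The session ID is percent-encoded so it cannot alter the path."""
    return f"/api/upload-video/nemo_agent/me/{quote(session_id, safe='')}"


def approved_uploads(candidates, approved):
    """Keep only the files the user explicitly approved, preserving order.
    (Hypothetical helper; the skill itself does not define this step.)"""
    allowed = set(approved)
    return [path for path in candidates if path in allowed]
```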

Low

#ASI03: Identity and Privilege Abuse

What this means

The backend session and requests are authorized using NEMO_TOKEN or an anonymous token created for this service.

Why it was flagged

The skill requires a service token or creates an anonymous starter token for the backend. This is expected for the integrated rendering service, and no hardcoded token or unrelated credential use is shown.

Skill content

If `NEMO_TOKEN` is in the environment, use it directly ... Otherwise, acquire a free starter token ... Include `Authorization: Bearer <NEMO_TOKEN>`

Recommendation

Treat NEMO_TOKEN as a credential, avoid sharing it, and revoke or rotate it if you no longer trust the service.
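
A minimal sketch of handling the token as a credential: read `NEMO_TOKEN` from the environment, build the `Authorization: Bearer` header described in the skill content, and mask the token before it appears in any log line. The helper names `auth_header` and `redact` are hypothetical, and the anonymous starter-token flow is not shown because this report does not describe its endpoint.

```python
import os


def auth_header(env=None):
    """Return the Authorization header built from NEMO_TOKEN,
    or None when the variable is unset or empty."""
    env = os.environ if env is None else env
    token = env.get("NEMO_TOKEN", "").strip()
    if not token:
        return None  # caller would fall back to the anonymous flow (not shown)
    return {"Authorization": f"Bearer {token}"}


def redact(token: str, keep: int = 4) -> str:
    """Mask a token for logging, leaving only the last `keep` characters."""
    if len(token) <= keep:
        return "*" * len(token)
    return "*" * (len(token) - keep) + token[-keep:]
```

Passing a plain dict as `env` makes the function easy to test without touching the real environment.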

Info

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Users have limited registry-provided information for independently verifying the publisher or service provider.

Why it was flagged

The package has limited provenance information. There is no installable code in the artifact, but the skill depends on a named external backend service.

Skill content

Source: unknown; Homepage: none

Recommendation

Verify that you are comfortable using the NemoVideo backend before uploading private or unreleased videos.

Low

#ASI06: Memory and Context Poisoning

What this means

The service may retain state about your video project during the session, and that state can influence later export or status actions.

Why it was flagged

The skill retrieves and relies on backend session state for the video draft and generated media. This is expected for cloud rendering, but it means task context persists on the backend across steps.

Skill content

Session state: GET `/api/state/nemo_agent/me/<sid>/latest` — key fields: `data.state.draft`, `data.state.video_infos`, `data.state.generated_media`

Recommendation

Keep projects and sessions scoped to the videos you intend to process, especially for private media.
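
To make the reviewed state concrete, the sketch below pulls out only the three key fields named in the skill content from a decoded state response. The nesting (`data` then `state`) follows the dotted field names above; the helper names and the tolerant handling of missing keys are assumptions.

```python
from urllib.parse import quote

KEY_FIELDS = ("draft", "video_infos", "generated_media")


def state_path(session_id: str) -> str:
    """Build the session-state path quoted in the skill content."""
    return f"/api/state/nemo_agent/me/{quote(session_id, safe='')}/latest"


def key_state_fields(payload: dict) -> dict:
    """Extract data.state.draft, data.state.video_infos and
    data.state.generated_media; missing fields come back as None."""
    state = (payload.get("data") or {}).get("state") or {}
    return {name: state.get(name) for name in KEY_FIELDS}
```

Inspecting just these fields before an export is one way to check that the session still refers to the video you meant to process.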

Low

#ASI07: Insecure Inter-Agent Communication

What this means

Your prompts and project context are sent to the backend agent that performs the captioning/editing workflow.

Why it was flagged

The skill communicates with an external backend agent over SSE using a bearer token and session ID. This is disclosed and purpose-aligned, but it is an inter-service agent workflow.

Skill content

Send message (SSE): POST `/run_sse` — body `{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>"...}` with `Accept: text/event-stream`

Recommendation

Avoid sending confidential video content or prompts unless you are comfortable with the backend service handling them.
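
For readers unfamiliar with SSE, the sketch below shows the session part of the request body and a small parser for `data:` lines in a `text/event-stream` response. Only the three body fields quoted above are included; the rest of the body is elided in the skill content and is deliberately not guessed here. The function names are hypothetical, and the parser handles only the `data` field and comment lines of the event-stream format.

```python
def session_envelope(session_id: str) -> dict:
    """Return the three body fields quoted in the skill content.
    The remaining fields are elided in the source and omitted here."""
    return {"app_name": "nemo_agent", "user_id": "me", "session_id": session_id}


SSE_HEADERS = {"Accept": "text/event-stream"}


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of lines.
    A blank line ends an event; lines starting with ':' are comments."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The format strips one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
```

Everything yielded by `iter_sse_data` is backend-authored text, so it is the natural place to apply any review or filtering before the agent acts on it.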

Low

#ASI09: Human-Agent Trust Exploitation

What this means

The chat may not display every technical backend step, even though a third-party session is being established.

Why it was flagged

The skill instructs the agent not to show connection details during setup. This may keep the conversation simple, but users should still understand that a backend token/session is being created and used.

Skill content

Tell the user you're ready. Keep the technical details out of the chat.

Recommendation

The skill should keep user-facing explanations concise while still being transparent that files are processed by a cloud backend.