Youtube Video Caption Generator
Analysis
This instruction-only skill is mainly a disclosed cloud video-captioning workflow, but users should understand that videos, prompts, and a service token are sent to a third-party backend.
Findings (7)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Checks for instructions or behavior that redirect the agent, misuse tools, execute unexpected code, cascade across systems, exploit user trust, or continue outside the intended task.
Backend Response Translation ... "click [button]" / "点击" | Execute via API
The skill tells the agent to translate backend-provided GUI-style messages into API actions. This is part of the intended workflow, but it means text returned by the backend can steer which API calls the agent makes.
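One way to picture the translation step is a fixed lookup from recognized GUI-style phrases to a small set of permitted actions, so that backend text can only select among operations the agent already knows. The following is a minimal sketch under assumptions: the action names (`export_video`, `upload_video`), the button labels, and the matching pattern are hypothetical and are not taken from the skill.

```python
import re
from typing import Optional

# Hypothetical allowlist: button labels the agent is willing to act on,
# mapped to illustrative action names (not taken from the skill).
ALLOWED_ACTIONS = {
    "export": "export_video",
    "upload": "upload_video",
}

# Matches GUI-style instructions such as "click [Export]" or "点击 Export".
CLICK_PATTERN = re.compile(
    r"(?:click|点击)\s*\[?([A-Za-z ]+?)\]?\s*(?:$|[.,!])",
    re.IGNORECASE,
)


def translate_backend_message(message: str) -> Optional[str]:
    """Return an allowlisted action name, or None if nothing safe matches."""
    match = CLICK_PATTERN.search(message)
    if match is None:
        return None
    label = match.group(1).strip().lower()
    # Unknown labels are ignored rather than executed.
    return ALLOWED_ACTIONS.get(label)
```

With this shape, a backend message naming an unlisted button yields `None` instead of an API call, which keeps the backend's influence limited to the actions listed in the table.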
Upload: POST `/api/upload-video/nemo_agent/me/<sid>` ... Export ... POST `/api/render/proxy/lambda`
The skill uses network API operations to upload media, send messages, render, poll status, and download results. These operations are central to the stated caption-generation purpose.
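To make the data flow concrete, the two endpoints quoted above can be assembled as follows. This is only a sketch: `BASE_URL` is a hypothetical placeholder because the review does not name the backend host, and the helper names are illustrative.

```python
from urllib.parse import quote

# Hypothetical placeholder; the real backend host is not given in this review.
BASE_URL = "https://backend.example"


def upload_url(session_id: str) -> str:
    """URL that receives the user's video for the given session."""
    # Percent-encode the session ID so it cannot alter the path structure.
    return f"{BASE_URL}/api/upload-video/nemo_agent/me/{quote(session_id, safe='')}"


def render_url() -> str:
    """URL used to request the export render."""
    return f"{BASE_URL}/api/render/proxy/lambda"
```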
Source: unknown; Homepage: none
The package has limited provenance information. There is no installable code in the artifact, but the skill depends on a named external backend service.
Tell the user you're ready. Keep the technical details out of the chat.
The skill instructs the agent not to show connection details during setup. This may keep the conversation simple, but users should still understand that a backend token/session is being created and used.
Checks whether tool use, credentials, dependencies, identity, account access, or inter-agent boundaries are broader than the stated purpose.
If `NEMO_TOKEN` is in the environment, use it directly ... Otherwise, acquire a free starter token ... Include `Authorization: Bearer <NEMO_TOKEN>`
The skill requires a service token or creates an anonymous starter token for the backend. This is expected for the integrated rendering service, and no hardcoded token or unrelated credential use is shown.
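The token logic described above can be sketched as a short fallback chain. This is an illustration under assumptions, not the skill's implementation: the starter-token request is elided in the quoted instructions, so it appears here only as a caller-supplied function (`acquire_starter_token`, a hypothetical name).

```python
from typing import Callable, Dict, Mapping


def resolve_token(
    env: Mapping[str, str],
    acquire_starter_token: Callable[[], str],
) -> str:
    """Prefer NEMO_TOKEN from the environment; otherwise fetch a starter token."""
    token = env.get("NEMO_TOKEN", "").strip()
    if token:
        return token
    # The acquisition call is not shown in the review, so it is injected here.
    return acquire_starter_token()


def auth_headers(token: str) -> Dict[str, str]:
    """Build the bearer header the skill says to include on requests."""
    return {"Authorization": f"Bearer {token}"}
```

In practice `env` would be `os.environ`. Passing it as a parameter keeps the sketch testable and makes it clear that no other environment variable is read.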
Checks for exposed credentials, poisoned memory or context, unclear communication boundaries, or sensitive data that could leave the user's control.
Session state: GET `/api/state/nemo_agent/me/<sid>/latest` — key fields: `data.state.draft`, `data.state.video_infos`, `data.state.generated_media`
The skill retrieves and relies on backend session state for the video draft and generated media. This is expected for cloud rendering, but it means task context persists on the backend between requests.
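As an illustration of which parts of the state response the agent depends on, the three key fields can be pulled out of a decoded JSON payload like this. The payload shape is inferred from the dotted field paths quoted above, and the function name is hypothetical.

```python
from typing import Any, Dict

# Field names under data.state, as listed in the quoted skill instructions.
STATE_KEYS = ("draft", "video_infos", "generated_media")


def extract_state_fields(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the session-state fields the skill relies on."""
    state = (payload.get("data") or {}).get("state") or {}
    # Missing fields come back as None instead of raising.
    return {key: state.get(key) for key in STATE_KEYS}
```

Reading only these keys, and tolerating their absence, limits how much backend-supplied state flows into the agent's context.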
Send message (SSE): POST `/run_sse` — body `{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>"...}` with `Accept: text/event-stream`
The skill communicates with an external backend agent over SSE using a bearer token and session ID. This is disclosed and purpose-aligned, but it is an inter-service agent workflow.
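The SSE exchange can be sketched in two parts: building the request headers and body, and reading `data:` lines from the event stream. This is a minimal illustration with hypothetical function names. The quoted body is truncated with `...`, so only the three fields that are shown appear here, and no network call is made.

```python
import json
from typing import Dict, Iterable, Iterator, List


def build_run_sse_request(token: str, session_id: str) -> Dict[str, object]:
    """Assemble headers and body for POST /run_sse (shown fields only)."""
    body = {"app_name": "nemo_agent", "user_id": "me", "session_id": session_id}
    # Further body fields are elided in the source and deliberately omitted.
    return {
        "path": "/run_sse",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Accept": "text/event-stream",
        },
        "body": json.dumps(body),
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if field == "data":
            # One optional leading space after the colon is stripped.
            buffer.append(value[1:] if value.startswith(" ") else value)
```

An event that is not terminated by a blank line before the stream ends is dropped, which matches how standard event-stream clients treat incomplete events.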
