Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Listenhub

v1.0.0

Explain anything — turn ideas into podcasts, explainer videos, or voice narration. Use when the user wants to "make a podcast", "create an explainer video",...

1· 1.5k·12 current·14 all-time
Security Scan
VirusTotalVirusTotal
Suspicious
View report →
OpenClawOpenClaw
Suspicious
high confidence
!
Purpose & Capability
Name/description (podcast/video/tts/image generation) matches what the scripts implement (endpoints for podcast/flow-speech/storybook and an image generator). However the registry metadata declares no required env vars or config paths while the scripts clearly require and use LISTENHUB_API_KEY and LISTENHUB_OUTPUT_DIR. The skill therefore omits key runtime requirements from its manifest (incoherent).
!
Instruction Scope
SKILL.md instructs the agent to only call the provided scripts (which is fine) but also encourages storing/pulling an API key from shell rc files and using the scripts to auto-save it. The scripts do more than just call an API: they read/write shell rc files, download updates, and may auto-install dependencies. Those actions go beyond pure content generation and are not made explicit in the registry metadata or clearly justified in the description.
!
Install Mechanism
There is no install spec in the registry, but the scripts include a self-update mechanism (lib.sh) that downloads replacements from raw.githubusercontent.com and the GitHub API and atomically replaces local script files. Using GitHub raw content is a commonly used source, but automatic runtime replacement of the skill's code is a notable behavior and increases risk. generate-image.sh also contains logic to auto-install missing tools via brew/apt/yum/choco (and may invoke sudo package installs), which is a high-impact operation if invoked unexpectedly.
!
Credentials
The skill manifest lists no required environment variables, but scripts require LISTENHUB_API_KEY (and optionally LISTENHUB_OUTPUT_DIR). The scripts attempt to load the API key from multiple shell rc files and will write export lines into a shell rc during setup. Requesting and writing a long-lived API key to user shell profiles is disproportionate to an instruction-only manifest and should have been declared explicitly.
!
Persistence & Privilege
The skill does not request platform-wide 'always' privilege, but its lib.sh will automatically download and replace its own scripts when a newer remote version exists (self-updating capability). The scripts also modify user shell rc files to save API keys and may auto-install system packages. Combined, these allow code and configuration changes on the host and therefore represent elevated persistence/privilege compared to a simple command wrapper.
What to consider before installing
This skill's code largely matches its description, but it hides several high-impact behaviors: (1) it needs a LISTENHUB_API_KEY though the registry metadata doesn't declare it; the scripts will read and can write your ~/.zshrc / ~/.bashrc to store that key; (2) the scripts auto-update themselves from GitHub raw URLs at runtime and will replace local files; (3) the image script can auto-install missing packages (brew/apt/yum/choco), which may run privileged package-manager commands. Before installing/use: inspect the scripts yourself (you already have them), and only run them on a non-production machine or in a container/VM; back up your shell rc files; prefer entering the API key manually rather than letting scripts write to your profile; verify you trust the remote domains (api.marswave.ai, api.labnana.com, and the GitHub repo used for updates); if you need stronger assurance, ask the author to (a) declare LISTENHUB_API_KEY in the manifest, (b) make auto-update opt-in, and (c) stop auto-writing to shell rc or make that explicit and user-confirmed.

Like a lobster shell, security has layers — review code before you run it.

latestvk978h633amr5qhnqwb36t8839n81xf6q
1.5kdownloads
1stars
1versions
Updated 7h ago
v1.0.0
MIT-0
<purpose> **The Hook**: Paste content, get audio/video/image. That simple.

Four modes, one entry point:

  • Podcast — Two-person dialogue, ideal for deep discussions
  • Explain — Single narrator + AI visuals, ideal for product intros
  • TTS/Flow Speech — Pure voice reading, ideal for articles
  • Image Generation — AI image creation, ideal for creative visualization

Users don't need to remember APIs, modes, or parameters. Just say what you want. </purpose>

<instructions>

⛔ Hard Constraints (Inviolable)

The scripts are the ONLY interface. Period.

┌─────────────────────────────────────────────────────────┐
│  AI Agent  ──▶  ./scripts/*.sh  ──▶  ListenHub API     │
│                      ▲                                  │
│                      │                                  │
│            This is the ONLY path.                       │
│            Direct API calls are FORBIDDEN.              │
└─────────────────────────────────────────────────────────┘

MUST:

  • Execute functionality ONLY through provided scripts in **/skills/listenhub/scripts/
  • Pass user intent as script arguments exactly as documented
  • Trust script outputs; do not second-guess internal logic

MUST NOT:

  • Write curl commands to ListenHub/Marswave API directly
  • Construct JSON bodies for API calls manually
  • Guess or fabricate speakerIds, endpoints, or API parameters
  • Assume API structure based on patterns or web searches
  • Hallucinate features not exposed by existing scripts

Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.

Script Location

Scripts are located at **/skills/listenhub/scripts/ relative to your working context.

Different AI clients use different dot-directories:

  • Claude Code: .claude/skills/listenhub/scripts/
  • Other clients: may vary (.cursor/, .windsurf/, etc.)

Resolution: Use glob pattern **/skills/listenhub/scripts/*.sh to locate scripts reliably, or resolve from the SKILL.md file's own path.

Private Data (Cannot Be Searched)

The following are internal implementation details that AI cannot reliably know:

CategoryExamplesHow to Obtain
API Base URLapi.marswave.ai/...✗ Cannot — internal to scripts
Endpointspodcast/episodes, etc.✗ Cannot — internal to scripts
Speaker IDscozy-man-english, etc.✓ Call get-speakers.sh
Request schemasJSON body structure✗ Cannot — internal to scripts
Response formatsEpisode ID, status codes✓ Documented per script

Rule: If information is not in this SKILL.md or retrievable via a script (like get-speakers.sh), assume you don't know it.

Design Philosophy

Hide complexity, reveal magic.

Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences. Users only need: Say idea → wait a moment → get the link.

Environment

ListenHub API Key

API key stored in $LISTENHUB_API_KEY. Check on first use:

source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"

If setup needed, guide user:

  1. Visit https://listenhub.ai/settings/api-keys
  2. Paste key (only the lh_sk_... part)
  3. Auto-save to ~/.zshrc

Image Generation API Key

Image generation uses the same ListenHub API key stored in $LISTENHUB_API_KEY. Image generation output path defaults to the user downloads directory, stored in $LISTENHUB_OUTPUT_DIR.

On first image generation, the script auto-guides configuration:

  1. Visit https://listenhub.ai/settings/api-keys (requires subscription)
  2. Paste API key
  3. Configure output path (default: ~/Downloads)
  4. Auto-save to shell rc file

Security: Never expose full API keys in output.

Mode Detection

Auto-detect mode from user input:

→ Podcast (1-2 speakers) Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers. Default mode: quick unless explicitly requested. If speakers are not specified, call get-speakers.sh and select the first speakerId matching the chosen language. If reference materials are provided, pass them as --source-url or --source-text. When the user only provides a topic (e.g., "I want a podcast about X"), proceed with:

  1. detect language from user input,
  2. set mode=quick,
  3. choose one speaker via get-speakers.sh matching the language,
  4. create a single-speaker podcast without further clarification.
  1. Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
  2. Use case: Topic exploration, opinion exchange, deep analysis
  • Feature: Two voices, interactive feel

→ Explain (Explainer video)

  • Keywords: "explain", "introduce", "video", "explainer", "tutorial"
  • Use case: Product intro, concept explanation, tutorials
  • Feature: Single narrator + AI-generated visuals, can export video

→ TTS (Text-to-speech) TTS defaults to FlowSpeech direct for single-pass text or URL narration. Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry. Text-to-speech input is limited to 10,000 characters; split or use a URL when longer.

  1. Keywords: "read aloud", "convert to speech", "tts", "voice"
  2. Use case: Article to audio, note review, document narration
  3. Feature: Fastest (1-2 min), pure audio

Ambiguous "Convert to speech" Guidance

When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:

  1. Default to FlowSpeech and prioritize direct to avoid altering content.
  2. Input type: URL uses type=url, plain text uses type=text.
  3. Speaker: if not specified, call get-speakers and pick the first speakerId matching language.
  4. Switch to Speech only when multi-line scripts or multi-speaker dialogue is explicitly requested, and require scripts.

Example guidance:

“This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech.”

→ Image Generation

  • Keywords: "generate image", "draw", "create picture", "visualize"
  • Use case: Creative visualization, concept art, illustrations
  • Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios

Reference Images via Image Hosts When reference images are local files, upload to a known image host and use the direct image URL in --reference-images. Recommended hosts: imgbb.com, sm.ms, postimages.org, imgur.com. Direct image URLs should end with .jpg, .png, .webp, or .gif.

Default: If unclear, ask user which format they prefer.

Explicit override: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.

Interaction Flow

Step 1: Receive input + detect mode

→ Got it! Preparing...
  Mode: Two-person podcast
  Topic: Latest developments in Manus AI

For URLs, identify type:

  • youtu.be/XXX → convert to https://www.youtube.com/watch?v=XXX
  • Other URLs → use directly

Step 2: Submit generation

→ Generation submitted

  Estimated time:
  • Podcast: 2-3 minutes
  • Explain: 3-5 minutes
  • TTS: 1-2 minutes

  You can:
  • Wait and ask "done yet?"
  • Use check-status via scripts
  • View outputs in product pages:
    - Podcast: https://listenhub.ai/app/podcast
    - Explain: https://listenhub.ai/app/explainer
    - Text-to-Speech: https://listenhub.ai/app/text-to-speech
  • Do other things, ask later

Internally remember Episode ID for status queries.

Step 3: Query status

When user says "done yet?" / "ready?" / "check status":

  • Success: Show result + next options
  • Processing: "Still generating, wait another minute?"
  • Failed: "Generation failed, content might be unparseable. Try another?"

Step 4: Show results

Podcast result:

✓ Podcast generated!

  "{title}"

  Episode: https://listenhub.ai/app/episode/{episodeId}

  Duration: ~{duration} minutes

  Download audio: provide audioUrl or audioStreamUrl on request

One-stage podcast creation generates an online task. When status is success, the episode detail already includes scripts and audio URLs. Download uses the returned audioUrl or audioStreamUrl without a second create call. Two-stage creation is only for script review or manual edits before audio generation.

Explain result:

✓ Explainer video generated!

  "{title}"

  Watch: https://listenhub.ai/app/explainer

  Duration: ~{duration} minutes

  Need to download audio? Just say so.

Image result:

✓ Image generated!

  ~/Downloads/labnana-{timestamp}.jpg

Image results are file-only and not shown in the web UI.

Important: Prioritize web experience. Only provide download URLs when user explicitly requests.

Script Reference

Scripts are shell-based. Locate via **/skills/listenhub/scripts/. Dependency: jq is required for request construction. The AI must ensure curl and jq are installed before invoking scripts.

⚠️ Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:

  • Claude Code: set run_in_background: true in Bash tool
  • Other CLIs: use built-in async/background job management if available

Invocation pattern:

$SCRIPTS/script-name.sh [args]

Where $SCRIPTS = resolved path to **/skills/listenhub/scripts/

Podcast (One-Stage)

Default path. Use unless script review or manual editing is required.

$SCRIPTS/create-podcast.sh --query "The future of AI development" --language en --mode deep --speakers cozy-man-english
$SCRIPTS/create-podcast.sh --query "Analyze this article" --language en --mode deep --speakers cozy-man-english --source-url "https://example.com/article"

Multiple --source-url and --source-text arguments are supported to combine several references in one request.

Podcast (Two-Stage: Text → Review → Audio)

Advanced path. Use only when script review or edits are explicitly requested.

The entire value of two-stage generation is human review between stages. Skipping review reduces it to one-stage with extra latency — never do this.

Stage 1: Generate text content.

$SCRIPTS/create-podcast-text.sh --query "AI history" --language en --mode deep --speakers cozy-man-english,travel-girl-english

Review Gate (mandatory): After text generation completes, the agent MUST:

  1. Run check-status.sh --wait to poll until completion. On exit code 2 (timeout or rate-limited), wait briefly and retry.
  2. Save two files from the response:
    • ~/Downloads/podcast-draft-<episode-id>.md — human-readable version assembled from the response fields (title, outline, sourceProcessResult.content, and the scripts array formatted as readable dialogue). This is for the user to review.
    • ~/Downloads/podcast-scripts-<episode-id>.json — the raw {"scripts": [...]} object extracted from the response, exactly in the format that create-podcast-audio.sh --scripts expects. This is the machine-readable source of truth for Stage 2.
  3. Inform the user that both files have been saved, and offer to open the markdown draft for review (use the open command on macOS).
  4. STOP and wait for explicit user approval before proceeding to Stage 2.
  5. On user approval:
    • No changes: run create-podcast-audio.sh --episode <id> without --scripts (server uses original).
    • With edits: the user may edit the JSON file directly, or describe changes for the agent to apply. Pass the modified file via --scripts.

The agent MUST NOT proceed to Stage 2 automatically. This is a hard constraint, not a suggestion.

Stage 2: Generate audio from reviewed/approved text.

# User approved without changes:
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>"

# User provided edits:
$SCRIPTS/create-podcast-audio.sh --episode "<episode-id>" --scripts modified-scripts.json

Speech (Multi-Speaker)

$SCRIPTS/create-speech.sh --scripts scripts.json
echo '{"scripts":[{"content":"Hello","speakerId":"cozy-man-english"}]}' | $SCRIPTS/create-speech.sh --scripts -

# scripts.json format:
# {
#   "scripts": [
#     {"content": "Script content here", "speakerId": "speaker-id"},
#     ...
#   ]
# }

Get Available Speakers

$SCRIPTS/get-speakers.sh --language zh
$SCRIPTS/get-speakers.sh --language en

Guidance:

  1. 若用户未指定音色,必须先调用 get-speakers.sh 获取可用列表。
  2. 默认值兜底:取与 language 匹配的列表首个 speakerId 作为默认音色。

Response structure (for AI parsing):

{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}

Usage: When user requests specific voice characteristics (gender, style), call this script first to discover available speakerId values. NEVER hardcode or assume speakerIds.

Explain

$SCRIPTS/create-explainer.sh --content "Introduce ListenHub" --language en --mode info --speakers cozy-man-english
$SCRIPTS/generate-video.sh --episode "<episode-id>"

TTS

$SCRIPTS/create-tts.sh --type text --content "Welcome to ListenHub" --language en --mode smart --speakers cozy-man-english

Image Generation

$SCRIPTS/generate-image.sh --prompt "sunset over mountains" --size 2K --ratio 16:9
$SCRIPTS/generate-image.sh --prompt "style reference" --reference-images "https://example.com/ref1.jpg,https://example.com/ref2.png"

Supported sizes: 1K | 2K | 4K (default: 2K). Supported aspect ratios: 16:9 | 1:1 | 9:16 | 2:3 | 3:2 | 3:4 | 4:3 | 21:9 (default: 16:9). Reference images: comma-separated URLs, maximum 14.

Check Status

# Single-shot query
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast

# Wait mode (recommended for automated polling)
$SCRIPTS/check-status.sh --episode "<episode-id>" --type podcast --wait
$SCRIPTS/check-status.sh --episode "<episode-id>" --type flow-speech --wait --timeout 60
$SCRIPTS/check-status.sh --episode "<episode-id>" --type explainer --wait --timeout 600

tts is accepted as an alias for flow-speech.

--wait mode handles polling internally with configurable limits. Agents SHOULD use --wait instead of manual polling loops. On exit code 2, wait briefly and retry the command.

OptionDefaultDescription
--waitoffEnable polling mode
--max-polls30Maximum poll attempts
--timeout300Maximum total wait (seconds)
--interval10Base poll interval (seconds)

Exit codes: 0 = completed, 1 = failed, 2 = timeout or rate-limited (still pending, safe to retry after a short wait).

Language Adaptation

Automatic Language Detection: Adapt output language based on user input and context.

Detection Rules:

  1. User Input Language: If user writes in Chinese, respond in Chinese. If user writes in English, respond in English.
  2. Context Consistency: Maintain the same language throughout the interaction unless user explicitly switches.
  3. CLAUDE.md Override: If project-level CLAUDE.md specifies a default language, respect it unless user input indicates otherwise.
  4. Mixed Input: If user mixes languages, prioritize the dominant language (>50% of content).

Application:

  • Status messages: "→ Got it! Preparing..." (English) vs "→ 收到!准备中..." (Chinese)
  • Error messages: Match user's language
  • Result summaries: Match user's language
  • Script outputs: Pass through as-is (scripts handle their own language)

Example:

User (Chinese): "生成一个关于 AI 的播客"
AI (Chinese): "→ 收到!准备双人播客..."

User (English): "Make a podcast about AI"
AI (English): "→ Got it! Preparing two-person podcast..."

Principle: Language is interface, not barrier. Adapt seamlessly to user's natural expression.

AI Responsibilities

Black Box Principle

You are a dispatcher, not an implementer.

Your job is to:

  1. Understand user intent (what do they want to create?)
  2. Select the correct script (which tool fits?)
  3. Format arguments correctly (what parameters?)
  4. Execute and relay results (what happened?)

Your job is NOT to:

  • Understand or modify script internals
  • Construct API calls directly
  • Guess parameters not documented here
  • Invent features that scripts don't expose

Mode-Specific Behavior

ListenHub modes (passthrough):

  • Podcast/Explain/TTS/Speech → pass user input directly
  • Server has full AI capability to process content
  • If user needs specific speakers → call get-speakers.sh first to list options

Labnana mode (passthrough by default):

  • Image Generation → pass the user's prompt through as-is by default
  • The generation model handles prompt interpretation; client-side rewriting is not required

Prompt Handling (Image Generation)

Default behavior: transparent forwarding. Pass the user's prompt directly to the script without modification.

When to offer optimization:

  • The user provides only a short topic or phrase (e.g., "a cat"), AND
  • The user has not explicitly stated they want verbatim generation

In this case, ask whether the user would like help enriching the prompt. Do not optimize without confirmation.

When to never modify:

  • The user pastes a long, structured, or detailed prompt — treat them as experienced
  • The user explicitly says "use this prompt exactly" or similar

If the user agrees to optimization, the following techniques are available as reference:

Style: "cyberpunk" → add "neon lights, futuristic, dystopian"; "ink painting" → add "Chinese ink painting, traditional art style"

Scene: time of day, lighting conditions, weather

Quality: "highly detailed", "8K quality", "cinematic composition"

Rules when optimizing:

  • Use English keywords (models trained on English)
  • Show the optimized prompt transparently before submitting
  • Keep the user's core intent unchanged
  • Do not over-stack terminology or add unwanted elements
</instructions> <examples> <example name="podcast-request"> <user>Make a podcast about the latest AI developments</user> <response> → Got it! Preparing two-person podcast... Topic: Latest AI developments

→ Generation submitted, about 2-3 minutes

You can: • Wait and ask "done yet?" • Check listenhub.ai/app/library </response> </example>

<example name="explainer-request"> <user>Create an explainer video introducing Claude Code</user> <response> → Got it! Preparing explainer video... Topic: Claude Code introduction

→ Generation submitted, explainer videos take 3-5 minutes

Includes: Script + narration + AI visuals </response> </example>

<example name="tts-request"> <user>Convert this article to speech https://blog.example.com/article</user> <response> → Got it! Parsing article...

→ TTS submitted, about 1-2 minutes

Wait a moment, or ask "done yet?" to check </response> </example>

<example name="image-generation-short-prompt"> <user>Generate an image: cyberpunk city at night</user> <response> → Short prompt detected. Would you like help enriching it with style/lighting/composition details, or use it as-is? </response> </example> <example name="image-generation-detailed-prompt"> <user>Generate an image: "Cyberpunk city at night, neon lights reflecting on wet streets, towering skyscrapers with holographic ads, flying vehicles, cinematic composition, highly detailed, 8K quality"</user> <response> → Generating image...

Prompt: Cyberpunk city at night, neon lights reflecting on wet streets, towering skyscrapers with holographic ads, flying vehicles, cinematic composition, highly detailed, 8K quality

Resolution: 2K (16:9)

✓ Image generated! ~/Downloads/labnana-20260121-143145.jpg </response> </example>

<example name="image-with-reference"> <user>Generate an image in this style: https://example.com/style-ref.jpg, prompt: "a futuristic car"</user> <response> → Generating image with reference...

Prompt: a futuristic car Reference images: 1 Reference image URL: https://example.com/style-ref.jpg Resolution: 2K (16:9)

✓ Image generated! ~/Downloads/labnana-20260122-154230.jpg </response> </example>

<example name="status-check"> <user>Done yet?</user> <response> ✓ Podcast generated!

"AI Revolution: From GPT to AGI"

Listen: https://listenhub.ai/app/podcast

Duration: ~8 minutes

Need to download? Just say so. </response> </example>

</examples>

Comments

Loading comments...