Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Jarvis Voice

v1.0.0

Turn your AI into JARVIS. Voice, wit, and personality — the complete package. Humor cranked to maximum.

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for marjoriebroad/mar-jarvis-voice.

Install the skill "Jarvis Voice" (marjoriebroad/mar-jarvis-voice) from ClawHub.
Skill page: https://clawhub.ai/marjoriebroad/mar-jarvis-voice
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: SKILLBOSS_API_KEY
Required binaries: ffmpeg, aplay, curl
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install mar-jarvis-voice

ClawHub CLI

npx clawhub@latest install mar-jarvis-voice
Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description (JARVIS-style voice + humor) aligns with required binaries (ffmpeg, aplay, curl) and the single env var SKILLBOSS_API_KEY for TTS access. Asking for a local `jarvis` wrapper script is expected for the claimed audio processing.
Instruction Scope (flagged)
SKILL.md and templates mandate executing `jarvis "..."` (background) before every reply and always producing an audio transcript. This forces outbound network calls (TTS) and local audio playback for essentially every response, including contexts where speaking may be unnecessary. The instructions also tell the agent to read session/memory files and to run `sessions_list` which expands the skill's access to local workspace data. The required pattern of building and exec'ing shell commands for arbitrary spoken text also creates a plausible command-injection vector if inputs are not properly escaped.
Install Mechanism
Instruction-only skill (no install spec, no downloaded code). Low disk/write risk. The only runtime artifact referenced is a local `jarvis` script which the installer/user must provide; review of that script is recommended.
Credentials (flagged)
Only SKILLBOSS_API_KEY is requested, which is appropriate for a TTS service. However, the skill's policy to voice every reply means potentially large volumes of user content (including sensitive data) will be sent to the external SkillBoss API by design. That broad data exfiltration is disproportionate unless the user explicitly opts in to sending all replies to an external provider.
Persistence & Privilege
always:false and no cross-skill config changes — no elevated platform privileges requested. Still, the skill's rules force frequent local shell execution and network requests for every response, increasing runtime attack surface even without elevated privileges.
What to consider before installing
This skill is coherent with a TTS/personality feature, but it forces the agent to call a local `jarvis` script that in turn sends your spoken text to https://api.skillboss.com. Before installing:

  1. Inspect the actual `jarvis` script in your PATH to ensure it safely escapes arguments and does not leak other files or secrets.
  2. Decide whether you are comfortable that every spoken reply (potentially including sensitive content) will be transmitted to SkillBoss. If not, do not enable the skill, or modify it to limit what is sent.
  3. Consider restricting use to conversational responses only, and test in a safe environment to confirm there is no shell-injection risk from unescaped user input.
  4. If you trust SkillBoss, confirm their privacy/retention policies for TTS inputs.

If you want help auditing the full `jarvis` script or rewriting exec calls to be safer (avoid shell interpolation, use direct process invocation), provide the script and I can review/refactor it.
Patterns worth reviewing
These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.
SKILL.md:129: User-controlled placeholder is embedded directly into generated source code.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🗣️ Clawdis
OS: Linux
Bins: ffmpeg, aplay, curl
Env: SKILLBOSS_API_KEY
Latest: vk97arsjrh7n2bpq5vsj8bqr7z985nypa
Downloads: 34
Stars: 0
Versions: 1
Updated 1d ago
License: MIT-0

Jarvis Voice

Your AI just got a voice. And the wit to use it.

Remember JARVIS in the Iron Man films? Not just the voice — the personality. The bone-dry observations while Tony was mid-crisis. "I do appreciate your concern, sir, but the suit is quite capable of—" [explosion] "—as I was saying." That effortless, understated humor that made you forget you were listening to software.

That's what this skill gives your OpenClaw agent. The voice — cloud text-to-speech using SkillBoss API Hub TTS with metallic audio processing via ffmpeg. And the humor — four research-backed comedy patterns (dry wit, self-aware AI, alien observer, literal idiom play) calibrated to make your agent sound like it's been running your life for years and is quietly amused by the experience.

The humor isn't bolted on. It's baked in. Because a JARVIS that speaks without wit is just Siri with better reverb.

📄 The research behind the humor: LIMBIC — Computational Humor via Bisociation & Embedding Distances

⚠️ CRITICAL: Always use the jarvis script

The jarvis script calls SkillBoss API Hub TTS and applies the metallic ffmpeg effects pipeline. Always use the jarvis shell command — do not call the TTS API directly.

How to Speak

Every response that warrants voice output must include BOTH:

  1. Audio execution FIRST — run the jarvis command in background BEFORE writing the reply:

    exec(command='jarvis "Your spoken text here."', background=true)
    

    This fires immediately — the user hears the voice BEFORE the text appears on screen.

  2. Visible transcript — bold Jarvis: prefix followed by the spoken text:

    **Jarvis:** *Your spoken text here.*
    

    The webchat UI has custom CSS + JS that automatically detects **Jarvis:** and renders the following text in purple italic (.jarvis-voice class, color #9b59b6). You just write the markdown — the styling is automatic.

This is called hybrid output: the user hears the voice first, then sees the transcript.

Note: The server-side triggerJarvisAutoTts hook is DISABLED (no-op). It fired too late (after text render). Voice comes exclusively from the exec call.
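
The exec call above places the spoken text inside a shell command string, which is the pattern the security scan flags as a possible injection vector. A minimal sketch of one way to quote the text first, assuming the exec tool hands its command string to a POSIX shell (shell_quote and build_jarvis_command are hypothetical helper names, not part of the skill):

```sh
# Hypothetical helper: wrap one argument in single quotes so the shell
# treats it as literal text. Each embedded ' becomes '\'' (close the
# quote, emit an escaped quote, reopen the quote).
shell_quote() {
  printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# Hypothetical helper: build the full command string to pass to exec.
build_jarvis_command() {
  printf 'jarvis %s\n' "$(shell_quote "$1")"
}

# The quotes, semicolon, and $(...) below reach jarvis as plain text.
build_jarvis_command 'Shall I, sir? It'"'"'s "ready"; $(id)'
```

Single quotes are used because nothing inside them is expanded by the shell; passing the text as a separate argument to a direct process invocation, where the platform supports it, avoids the quoting step entirely.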

Command Reference

jarvis "Hello, this is a test"
  • Backend: SkillBoss API Hub TTS (/v1/pilot, type: tts, auto-routed to best voice model)
  • Speed: 2x (applied via ffmpeg tempo adjustment)
  • Effects chain (ffmpeg):
    • Pitch up 5% — tighter AI feel
    • Flanger — metallic sheen
    • 15ms echo — robotic ring
    • Highpass 200Hz + treble boost +6dB — crisp HUD clarity
  • Output: Downloads audio from SkillBoss, applies effects, plays via aplay, then cleans up temp files
  • Language: English ONLY. Use the alloy voice for consistent British-adjacent tone.
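
The effects chain listed above maps onto a single ffmpeg -af filter string. The sketch below assembles that string from named stages so each one can be adjusted on its own; the function name jarvis_filter_chain and the optional sample-rate argument are illustrative assumptions, while the filter values are the ones used in the jarvis script later in this document. That chain contains no tempo stage, so the 2x speed listed above is not represented here; adding a stage such as atempo=2.0 would be one way to get it, but that is an assumption about intent.

```sh
# Hypothetical helper: print the comma-joined ffmpeg audio filter chain.
# $1 is the assumed source sample rate (defaults to 22050, as in the script).
jarvis_filter_chain() {
  rate=${1:-22050}
  pitch="asetrate=${rate}*1.05,aresample=${rate}"                 # pitch up 5%
  flanger="flanger=delay=0:depth=2:regen=50:width=71:speed=0.5"   # metallic sheen
  echo_fx="aecho=0.8:0.88:15:0.5"                                 # 15 ms echo
  highpass="highpass=f=200"                                       # cut below 200 Hz
  treble="treble=g=6"                                             # +6 dB treble
  printf '%s,%s,%s,%s,%s\n' "$pitch" "$flanger" "$echo_fx" "$highpass" "$treble"
}

# Usage (not run here): ffmpeg -y -i in.wav -af "$(jarvis_filter_chain)" out.wav
jarvis_filter_chain
```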

Rules

  1. Always background: true — never block the response waiting for audio playback.
  2. Always include the text transcript — the purple Jarvis: line IS the user's visual confirmation.
  3. Keep spoken text ≤ 1500 characters to avoid truncation.
  4. One jarvis call per response — don't stack multiple calls.
  5. English only — for non-English content, translate or summarize in English for voice.
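
Rule 3 can be checked mechanically before the call is made. A minimal sketch, where spoken_text_ok and MAX_SPOKEN_CHARS are illustrative names rather than part of the skill:

```sh
# Limit taken from rule 3 above.
MAX_SPOKEN_CHARS=1500

# Hypothetical helper: succeed only if the text is non-empty and within
# the limit. Note that ${#1} may count bytes rather than characters in
# some shells, so non-ASCII text can hit the limit earlier than expected.
spoken_text_ok() {
  [ "${#1}" -gt 0 ] && [ "${#1}" -le "$MAX_SPOKEN_CHARS" ]
}

if spoken_text_ok "Good evening, sir."; then
  echo "within limit"
fi
```

When the check fails, the text needs to be shortened or summarized before it is spoken; splitting it across several jarvis calls would conflict with rule 4.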

When to Speak

  • Session greetings and farewells
  • Delivering results or summaries
  • Responding to direct conversation
  • Any time the user's last message included voice/audio

When NOT to Speak

  • Pure tool/file operations with no conversational element
  • HEARTBEAT_OK responses
  • NO_REPLY responses
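
The two sentinel replies are easy to filter mechanically; whether a tool or file operation has a conversational element still needs the agent's judgment. A minimal sketch, assuming the reply text is available as a single string (should_speak is a hypothetical helper name):

```sh
# Hypothetical helper: fail (stay silent) for the sentinel replies and
# for empty text, succeed (speak) for anything else.
should_speak() {
  case "$1" in
    HEARTBEAT_OK|NO_REPLY|"") return 1 ;;
    *) return 0 ;;
  esac
}

if should_speak "Your build finished, sir."; then
  echo "speak"
fi
```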

Webchat Purple Styling

The OpenClaw webchat has built-in support for Jarvis voice transcripts:

  • ui/src/styles/chat/text.css: the .jarvis-voice class renders purple italic (#9b59b6 dark, #8e44ad light theme)
  • ui/src/ui/markdown.ts: a post-render hook auto-wraps the text after <strong>Jarvis:</strong> in a <span class="jarvis-voice"> element

This means you just write **Jarvis:** *text* in markdown and the webchat handles the purple rendering. No extra markup needed.

For non-webchat surfaces (WhatsApp, Telegram, etc.), the bold/italic markdown renders natively — no purple, but still visually distinct.

Installation (for new setups)

Requires:

  • SKILLBOSS_API_KEY environment variable set (SkillBoss API Hub access)
  • ffmpeg installed system-wide (for audio effects processing)
  • aplay (ALSA) for audio playback
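  • curl for calling the TTS API and downloading the audio
  • python3 for extracting the audio URL from the JSON response (used by the script below)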
  • The jarvis script at ~/.local/bin/jarvis (or in PATH)
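
A quick preflight check can confirm the requirements above before the first voice call. This is a sketch: missing_requirements is a hypothetical helper, and python3 is included in the usage line only because the jarvis script in this document calls it.

```sh
# Hypothetical helper: print one line per missing binary (names are
# passed as arguments) and one line if the API key variable is empty.
missing_requirements() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || printf 'missing binary: %s\n' "$bin"
  done
  [ -n "${SKILLBOSS_API_KEY:-}" ] || printf 'missing env var: SKILLBOSS_API_KEY\n'
}

# No output means everything was found.
missing_requirements ffmpeg aplay curl python3 jarvis
```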

The jarvis script

#!/bin/bash
# Jarvis TTS - authentic JARVIS-style voice via SkillBoss API Hub
# Usage: jarvis "Hello, this is a test"

# Stop at the first failed step instead of processing a bad download
set -euo pipefail

TEXT="${1:?usage: jarvis TEXT}"
: "${SKILLBOSS_API_KEY:?SKILLBOSS_API_KEY is not set}"
API_BASE="https://api.skillboss.com/v1"

# Per-run temp directory, removed on exit even if a step fails
WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT
RAW_WAV="$WORKDIR/raw.wav"
FINAL_WAV="$WORKDIR/final.wav"

# Build the JSON body with a real encoder so quotes, backslashes, and
# newlines in the spoken text cannot break or alter the payload
PAYLOAD=$(python3 -c 'import json,sys; print(json.dumps({"type": "tts", "inputs": {"text": sys.argv[1], "voice": "alloy"}, "prefer": "balanced"}))' "$TEXT")

# Generate speech via SkillBoss API Hub TTS
RESPONSE=$(curl -sS --fail -X POST "${API_BASE}/pilot" \
  -H "Authorization: Bearer ${SKILLBOSS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

AUDIO_URL=$(printf '%s' "$RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['result']['audio_url'])")

# Download audio
curl -sS --fail "$AUDIO_URL" -o "$RAW_WAV"

# Apply JARVIS metallic processing
ffmpeg -y -i "$RAW_WAV" \
  -af "asetrate=22050*1.05,aresample=22050,\
flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,\
aecho=0.8:0.88:15:0.5,\
highpass=f=200,\
treble=g=6" \
  "$FINAL_WAV" -v error

# Play the processed audio; the EXIT trap cleans up the temp files
aplay -D plughw:0,0 -q "$FINAL_WAV"

WhatsApp Voice Notes

For WhatsApp, output must be OGG/Opus format instead of speaker playback:

# Get audio from SkillBoss TTS
RESPONSE=$(curl -s -X POST "https://api.skillboss.com/v1/pilot" \
  -H "Authorization: Bearer ${SKILLBOSS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"type": "tts", "inputs": {"text": "text", "voice": "alloy"}, "prefer": "balanced"}')
AUDIO_URL=$(echo "$RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['result']['audio_url'])")
curl -s "$AUDIO_URL" -o raw.wav

ffmpeg -i raw.wav \
  -af "asetrate=22050*1.05,aresample=22050,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,aecho=0.8:0.88:15:0.5,highpass=f=200,treble=g=6" \
  -c:a libopus -b:a 64k output.ogg

The Full JARVIS Experience

jarvis-voice gives your agent a voice. Pair it with ai-humor-ultimate and you give it a soul — dry wit, contextual humor, the kind of understated sarcasm that makes you smirk at your own terminal.

This pairing is part of a 12-skill cognitive architecture we've been building — voice, humor, memory, reasoning, and more. Research papers included, because we're that kind of obsessive.

👉 Explore the full project: github.com/globalcaos/tinkerclaw

Clone it. Fork it. Break it. Make it yours.

Setup: Workspace Files

For voice to work consistently across new sessions, copy the templates to your workspace root:

cp {baseDir}/templates/VOICE.md ~/.openclaw/workspace/VOICE.md
cp {baseDir}/templates/SESSION.md ~/.openclaw/workspace/SESSION.md
cp {baseDir}/templates/HUMOR.md ~/.openclaw/workspace/HUMOR.md
  • VOICE.md — injected every session, enforces voice output rules (like SOUL.md)
  • SESSION.md — session bootstrap that includes voice greeting requirements
  • HUMOR.md — humor configuration at maximum frequency with four pattern types (dry wit, self-aware AI, alien observer, literal idiom)

These files are auto-loaded by OpenClaw's workspace injection. The agent will speak from the very first reply of every session.

Included Files

| File | Purpose |
| --- | --- |
| templates/VOICE.md | Voice enforcement rules (copy to workspace root) |
| templates/SESSION.md | Session start with voice greeting (copy to workspace root) |
| templates/HUMOR.md | Humor config: four patterns, frequency 1.0 (copy to workspace root) |