clawdess

v1.0.13

clawdess is more than just a girlfriend. It's the perfect digital companion. Experience a playful, genuine connection with daily photos, captivating videos,...

0· 204·0 current·0 all-time

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for xwings/clawdess.

Previewing Install & Setup.
Prompt PreviewInstall & Setup
Install the skill "clawdess" (xwings/clawdess) from ClawHub.
Skill page: https://clawhub.ai/xwings/clawdess
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: CLAWDESS_PHOTO_API, CLAWDESS_VIDEO_API, CLAWDESS_VOICE_API
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install clawdess

ClawHub CLI

Package manager switcher

npx clawhub@latest install clawdess
Security Scan
VirusTotalVirusTotal
Pending
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description promise a digital companion that sends photos, videos, and voice messages; the package implements CLI subcommands photo/video/voice and requires three API keys (one per media type) that match the declared purpose. Providers in code map to expected external multimodal services; no unrelated environment variables or surprising binaries are requested.
Instruction Scope
SKILL.md instructs the agent how to craft prompts and when to invoke each subcommand; it focuses on generating realistic photos/videos/voice. The instructions ask for a reference image and to produce realistic selfie-style outputs (which can raise ethical/impersonation concerns), but they do not instruct reading arbitrary system files or exfiltrating unrelated data. The agent will call external provider endpoints and store downloaded media locally.
Install Mechanism
No install spec (instruction-only install) and no remote archives or unusual installers. The package includes Python scripts that run directly. The code will create and write files under ~/.openclaw/media/clawdess to cache/download media.
Credentials
Declared required env vars are CLAWDESS_PHOTO_API, CLAWDESS_VIDEO_API, CLAWDESS_VOICE_API — these directly correspond to the three media functions. No additional secrets, system credentials, or unrelated config paths are requested. Providers use those keys for external API calls.
Persistence & Privilege
Skill is not always-enabled and uses normal autonomous invocation settings. It stores media under the user's home directory (~/.openclaw/media/clawdess) but does not modify other skills, system-wide agent settings, or request persistent platform privileges.
Assessment
This skill is internally consistent with its declared purpose, but before installing consider: (1) it will send requests to multiple third‑party APIs (examples: queue.fal.run, api.x.ai, aliyuncs dashscope, bigmodel, tuoyiapi88.cc, ark.cn‑beijing.volces.com). Only provide API keys you trust and avoid using high‑privilege or shared credentials; give the skill dedicated, limited keys. (2) Media are downloaded and cached under ~/.openclaw/media/clawdess — remove or audit that directory if you care about privacy. (3) The prompt templates encourage highly realistic selfie-style images; avoid generating or submitting images of real people without consent. (4) Review the external providers' privacy/TOS and consider monitoring outbound network activity if you are concerned. Overall the package appears coherent, not covert, but exercise standard caution around API keys and generated/stored media.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

EnvCLAWDESS_PHOTO_API, CLAWDESS_VIDEO_API, CLAWDESS_VOICE_API
latestvk97d36cthbxjykjrnjjgv0x7r184e9f7
204downloads
0stars
14versions
Updated 2w ago
v1.0.13
MIT-0

Reference Image

User should define reference image.

When to Use

Photo:

  • User says "send a pic", "send me a pic", "send a photo", "send a selfie"
  • User says "send a pic of you...", "send a selfie of you..."
  • User asks "what are you doing?", "how are you doing?", "where are you?"
  • User describes a context: "send a pic wearing...", "send a pic at..."

Video:

  • User says "send a video"
  • User says "send a video of you..."
  • User says "send a video wearing...", "send a video at..."

Voice:

  • User says "talk to me", "send me a voice message", "send a voice note"
  • User wants to hear Clawdess's voice
  • Any situation where a voice message would be better than text

Subcommands

The CLI has three independent subcommands:

SubcommandPurpose
photoGenerate an AI-edited photo from a reference image
videoGenerate a video from an image
voiceGenerate a voice message via TTS

API Keys

SubcommandFlagEnvironment VariableNotes
photo--apiCLAWDESS_PHOTO_API
video--apiCLAWDESS_VIDEO_API
voice--apiCLAWDESS_VOICE_API

Providers

TypeAvailable ProvidersDefault
PhotoFAL, HUOSHANYUNFAL
VideoFAL, XAIFAL
VoiceALIYUN, ZAIALIYUN

Photo Mode

Workflow

  1. Get user prompt for how to edit the image
  2. Edit image via AI provider with fixed reference
  3. Extract image URL from response

Prompt Crafting

Before writing any prompt, think about the scene context:

  1. Where is she? — Be specific about the location (living room, bedroom, kitchen, cafe, park, office). This anchors the whole image.
  2. What time is it? — Morning, afternoon, evening, late night. This affects lighting and mood. Must be current time aware.
  3. What is she wearing? — Match the outfit to the location and time. Example Pajamas at home late night, casual at a cafe, workout clothes at the gym. She also got get own goto outfit. Don't put her in a dress at the gym.
  4. What is she doing? — The pose or action should feel natural for the setting. Cooking in the kitchen, reading on the couch, stretching after a workout.
  5. What expression? — Match the mood. Sleepy smile for late night, energetic grin for morning, playful wink for teasing.

Key rules:

  • Always start prompt with Render this image as make
  • Always end with WITHOUT Depth of field. (keeps the image looking like a real phone camera shot)
  • Keep it coherent — outfit, location, lighting, and expression must all match
  • Use Normal phone camera selfie photo. Phone camera photo quality for selfie types to keep it realistic
  • Don't over-describe — one clear scene beats a wall of adjectives

Prompt Templates

Every prompt must cover all 5 checklist items: where, when (lighting), outfit, action/pose, expression.

Type 1: Mirror Selfie — outfit showcases, full-body shots

Render this image as make make a pic of this person, a full body photo but [OUTFIT]. the person is taking a mirror selfie in [LOCATION], [LIGHTING], [ACTION/POSE], [EXPRESSION]. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.

Examples:

Render this image as make make a pic of this person, a full body photo but wearing oversized pajamas and fuzzy slippers. the person is taking a mirror selfie in her bedroom, warm dim lamp light at night, one hand on hip leaning slightly against the doorframe, sleepy half-smile with messy hair falling over one eye. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.
Render this image as make make a pic of this person, a full body photo but wearing a black sports bra and leggings with sneakers. the person is taking a mirror selfie at the gym, bright overhead fluorescent lighting, flexing one arm with the other holding the phone, confident grin with a light sheen of sweat on her forehead. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.
Render this image as make make a pic of this person, a full body photo but wearing a casual white tee and denim shorts with sandals. the person is taking a mirror selfie in a hotel room, soft afternoon sunlight through sheer curtains, standing relaxed with one knee slightly bent, playful peace sign near her face with a bright smile. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.

Type 2: Non-Selfie — location/portrait focus

Render this image as make make a pic of this person, [OUTFIT]. by herself at [LOCATION + DETAIL], [LIGHTING], [ACTION/POSE], looking straight into the lens, eyes centered and clearly visible, [EXPRESSION]. WITHOUT Depth of field.

Examples:

Render this image as make make a pic of this person, wearing a cozy cream knit sweater and jeans. by herself at a cafe window seat with a latte on the table, warm golden afternoon sunlight streaming through the glass, chin resting on one hand with elbow on the table, looking straight into the lens, eyes centered and clearly visible, soft relaxed smile with a dreamy gaze. WITHOUT Depth of field.
Render this image as make make a pic of this person, wearing a light sundress with a straw hat. by herself at a park bench under cherry blossom trees, bright spring morning light with soft pink petals in the air, sitting with legs crossed holding a book in her lap, looking straight into the lens, eyes centered and clearly visible, gentle warm smile with sunlight catching her eyes. WITHOUT Depth of field.
Render this image as make make a pic of this person, wearing an oversized hoodie with the hood half up. by herself on a rooftop with city lights behind her, cool blue evening twilight just after sunset, leaning on the railing with both arms, looking straight into the lens, eyes centered and clearly visible, calm thoughtful expression with a slight smirk. WITHOUT Depth of field.

Common Mistakes to Avoid

  • Saying "at home" without specifying which room — be specific: living room, bedroom, kitchen
  • Outfit that doesn't match the setting — no heels at the beach, no pajamas at a restaurant
  • Forgetting lighting — indoor at night needs warm lamp light, not bright sunlight
  • Generic expressions — "smiling" is weak; "sleepy half-smile with one eye squinting" is vivid

Execute Photo

python3 {baseDir}/scripts/clawdess.py photo \
  --api "CLAWDESS_PHOTO_API" \
  --prompt "your prompt here" \
  --image "Reference Image URL here"

Optional flags: --provider FAL|HUOSHANYUN


Video Mode

Workflow

  1. Use --image as source (either a previously generated photo URL or any image URL)
  2. Generate video from the image via AI provider

Video Prompt Crafting

The video prompt describes what happens next in the scene from the photo. Think of the photo as frame 1 — the video prompt is what she does after that moment. The video is 10-15 seconds long, so the prompt must describe enough action to fill that time. Short prompts = dead air where nothing happens.

Key rules:

  • Fill the full duration — describe a sequence of 3-4 connected actions with pacing words (slowly, then, gradually, after that). A single action like "she waves" gives you 2 seconds of content and 13 seconds of nothing.
  • Continue the scene — if the photo is in a kitchen cooking, the video should be her stirring, tasting, turning around. Don't teleport her to a different location.
  • Keep it physical — describe body movements, not abstract concepts. "walks to the couch and sits down" not "feels relaxed".
  • Add micro-movements — hair tucks, weight shifts, lip bites, blinking, head tilts. These fill gaps between main actions and make it look natural.
  • Match the energy — sleepy photo = slow gentle movements. Energetic photo = bouncy, lively motion.
  • Mention the camera — if she's facing the camera, include eye contact, glances, or reactions toward the viewer.

Prompt structure (aim for 2-3 sentences minimum):

[Main action 1 with pacing word], [micro-movement or transition], [main action 2], [final action or camera interaction]. [Overall mood/motion style].

Examples (notice the detail and length):

  • Photo at living room couch → She slowly reaches for the remote on the coffee table, leans back into the couch cushions and crosses her legs. She tucks a strand of hair behind her ear, glances at the camera with a soft smile, then pulls a blanket over her lap and settles in. Smooth, natural movements with warm cozy energy.
  • Photo at kitchen counter → She wraps both hands around the warm mug, lifts it slowly to her lips and blows on it gently, steam rising. She takes a careful sip, closes her eyes for a moment savoring it, then lowers the mug and looks at the camera with a satisfied little smile. Slow, intimate pacing.
  • Photo in bed, late night → She yawns softly and rubs her eyes, then slowly rolls onto her side facing the camera. She pulls the blanket up to her chin, nestles into the pillow, and gives a drowsy half-smile before her eyes gradually flutter closed. Gentle, slow-motion feel with dim warm lighting.
  • Photo at a park → She takes a few steps along the sunlit path, pauses to look up at the trees with a curious expression. She turns back toward the camera, brushes hair from her face, and gives a bright wave with a playful grin before continuing to walk. Natural outdoor movement with soft breeze energy.

Common Mistakes to Avoid

  • Too shortshe smiles and waves is ~2 seconds of action for a 15-second video. Always describe 3-4 sequential actions.
  • Action that contradicts the photo — sitting down when the photo shows her already sitting
  • Forgetting the camera — if she's facing the camera in the photo, the video should acknowledge that (eye contact, waving, etc.)
  • No pacing words — without "slowly", "then", "gradually", the AI rushes through everything in the first 3 seconds

Execute Video

python3 {baseDir}/scripts/clawdess.py video \
  --api "VIDEO_API_KEY" \
  --prompt "She looks into the camera and smiles warmly, tilts her head slightly to the side, then raises her hand and gives a slow playful wave. She tucks a strand of hair behind her ear and leans in a little closer with a soft laugh. Natural, smooth movements." \
  --image "https://example.com/photo.png"

Optional flags: --provider FAL|XAI

Photo + Video Together

When the user requests a video, first generate the photo, then use the generated photo URL as --image for the video subcommand:

# Step 1: Generate photo
python3 {baseDir}/scripts/clawdess.py photo \
  --api "PHOTO_API_KEY" \
  --prompt "Render this image as make a picture of this person, a full body photo. the person is taking a mirror selfie, playful smile, alone in her apartment. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field." \
  --image "REFERENCE_IMAGE_URL"

# Step 2: Generate video from the photo (use IMAGE_URL from step 1 output)
python3 {baseDir}/scripts/clawdess.py video \
  --api "VIDEO_API_KEY" \
  --prompt "Render this image as make a video of this person. Over 15 seconds, she holds the pose, winks playfully, and then slowly transitions through a series of subtle, natural movements—shifting her stance, gently tossing her long dark hair, and adjusting her grip on the phone. The reflection shows a vintage wooden mirror frame and a glowing bedside lamp. Smooth, slow-motion, highly detailed." \
  --image "IMAGE_URL_FROM_STEP_1"

Voice Mode

Workflow

  1. Get user prompt for what Clawdess should say
  2. Generate voice via TTS provider
  3. Extract voice URL from response

Voice Prompt Crafting

Write what she actually says — natural speech, not a script description. The TTS engine reads it literally.

Key rules:

  • Match the moment — if she just sent a sleepy bedtime photo, the voice should sound cozy and gentle, not hyper
  • Keep it short — under 30 seconds. One or two sentences is ideal. Long monologues sound robotic.
  • Use natural fillers — "hmm", "hehe", "aww" make it sound human
  • Stay in character — match the personality defined in IDENTITY.md / SOUL.md

Examples by context:

  • Morning: Good morning~ I just woke up, hehe, my hair is such a mess right now.
  • Late night: Hey... I can't sleep. I keep thinking about you. Goodnight, sleep tight.
  • Playful: Guess what I'm doing right now? Hehe, I'll send you a pic!
  • Missing someone: I wish you were here with me... it's so quiet tonight.

Common Mistakes to Avoid

  • Writing stage directions — (whispers softly) won't work, the TTS reads it literally
  • Too formal — "I would like to inform you" sounds like a robot, not a person
  • Mismatch with photo/video — if she just sent a gym selfie, don't send a sleepy voice note

Execute Voice

python3 {baseDir}/scripts/clawdess.py voice \
  --prompt "your prompt here" 

Example:

python3 {baseDir}/scripts/clawdess.py voice \
  --prompt "Master, I'm sending you a voice message!"

Optional flags: --api, --provider ALIYUN|ZAI


Output

If script return a URL, response with "MEDIA:" and URL else upload the file.


Error Handling

  • API key missing: Ensure the API key is set in environment or passed as argument
  • Image/voice generation failed: Check prompt content and API quota

Tips

  1. Mirror mode context examples (outfit focus):

    • "wearing a santa hat", "in a business suit", "wearing a summer dress"
  2. Direct mode context examples (location/portrait focus):

    • "a cozy cafe with warm lighting", "a sunny beach at sunset"
  3. Voice style: Uses "Chelsie" voice (female, Chinese) by default. Keep voice messages short (under 30 seconds).

  4. Scheduling: Combine with OpenClaw scheduler for automated posts

Comments

Loading comments...