Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Directoryahu

v1.0.0

End-to-end pipeline for creating faceless Islamic story TikTok videos. Orchestrates multiple specialized agents: story research, scriptwriting, image generat...

0 stars · 432 downloads · 1 current · 1 all-time
by Mohamed Zeidan (@mohamedzeidan2021)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for mohamedzeidan2021/director.

Prompt Preview: Install & Setup
Install the skill "Directoryahu" (mohamedzeidan2021/director) from ClawHub.
Skill page: https://clawhub.ai/mohamedzeidan2021/director
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install director

ClawHub CLI


npx clawhub@latest install director
Security Scan

VirusTotal: Suspicious (view report →)
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's name/description (end-to-end video pipeline) matches the included instructions and code: it orchestrates story research, script writing, image generation, TTS, face-detection gating, FFmpeg assembly, and optional publishing. However, the SKILL.md and code expect external services (image-gen provider 'flux', ElevenLabs TTS, Google Vision / AWS Rekognition or similar) and local tools (FFmpeg) even though the registry metadata lists no required environment variables, binaries, or config paths. That mismatch is unexpected and should be resolved.
Instruction Scope
The SKILL.md provides detailed, bounded instructions for the pipeline: what each agent does, quality gates (e.g., fail on detected faces), and explicit external tools to attach. It does not instruct the agent to read unrelated system files, exfiltrate data, or contact unexpected endpoints beyond the named providers. It does, however, require web search/file-read for story verification and cloud APIs for image/TTS/face detection — reasonable for the stated purpose but worth noting because they require credentials and network access.
Install Mechanism
There is no install spec (instruction-only + included Python orchestrator). Nothing in the manifest downloads or executes arbitrary third-party archives. The single code file (orchestrator.py) is local and readable; install risk is low compared to a binary download, but running it will create logs and output files on disk.
Credentials
Although the runtime documentation references multiple external services (image gen 'flux', ElevenLabs/OpenAI TTS, optional cloud face detection like Google Vision or AWS Rekognition) and local binaries (FFmpeg), the registry metadata declares no required environment variables or primary credential. That is a discrepancy: to operate, the pipeline will require API keys and credentials which are not declared. Users should be aware the skill expects secrets and network access and should avoid blindly providing high-privilege credentials.
Persistence & Privilege
always:false (good). The orchestrator persists state to disk (output dirs, per-video state JSONs, pipeline.log) and creates output files; this is normal for a media pipeline. It does not request elevated system privileges or modify other skills. If installed, it will write files to the agent host and may invoke other agents/tools — run in an environment where file writes and external API calls are acceptable.
What to consider before installing
This skill appears to be what it claims (an end-to-end, multi-agent pipeline to produce faceless Islamic story videos), but there are practical mismatches you should address before installing or running it:

  • Credentials & APIs: SKILL.md and config reference image-generation (flux/SDXL/Midjourney), TTS (ElevenLabs/OpenAI), and optional cloud face detection (Google Vision/AWS Rekognition). The registry metadata does not declare any required env vars, so the skill will expect you to supply API keys at runtime. Only provide minimal-scope keys (create limited-service keys) and never share long-lived, high-privilege credentials (e.g., root AWS keys).
  • Local tools & files: The pipeline expects FFmpeg and will write logs (pipeline.log), per-video state files, and output directories. Run it in an isolated workspace or container so these files don't mix with sensitive data.
  • Config path mismatch: orchestrator.py defaults to loading config from 'config/global_config.json', but the repo contains 'global_config.json' at root. Verify your config path before running to avoid using unsafe defaults.
  • Privacy & content: The pipeline will perform web searches and may call cloud APIs that process your text/images. If you plan to use real user data or unpublished material, confirm the provider privacy policies and that you are comfortable with those services processing the content.
  • Face-detection gating: The visual agent enforces strict 'no faces' rules and may call cloud face-detection services. Decide whether you want to use local detection (MTCNN/RetinaFace) or cloud (Vision/AWS) based on privacy and credential scope.
  • Test first: Run the orchestrator in a sandbox with dummy API keys or with local-only tools (e.g., local face detector, mocked image/TTS outputs) to confirm behaviour before giving real API credentials or enabling autonomous execution.

If you want, I can:

  • List the exact environment variables and tool binaries you should create/configure to run this pipeline safely (minimal-permission suggestions),
  • Suggest a safe sandbox/docker run command to test the orchestrator without exposing host data,
  • Or scan orchestrator.py and the other files for any strings or endpoints you might want to whitelist/inspect further.

Like a lobster shell, security has layers — review code before you run it.

latest: vk977qccjge5q53fj8kqxkdch1s81yx4q
432 downloads
0 stars
1 version
Updated 15h ago
v1.0.0
MIT-0

Islamic TikTok Stories — Multi-Agent Pipeline

How This Skill System Works

This is a multi-agent pipeline. There is ONE top-level orchestrator agent and FIVE specialist agents beneath it. Here's how they connect:

┌─────────────────────────────────────────────────────┐
│              ORCHESTRATOR AGENT                      │
│  (You feed THIS file — SKILL.md — to this agent)    │
│                                                      │
│  This agent reads SKILL.md, understands the full     │
│  pipeline, and delegates to specialist agents.       │
│  It is the "director" — it calls each agent in       │
│  order, passes outputs between them, and handles     │
│  errors.                                             │
└──────────┬──────────────────────────────────────────┘
           │ delegates to:
           │
    ┌──────┴──────┐
    │             │
    ▼             ▼
┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│ Story  │→ │ Script │→ │ Visual │  │ Voice  │→ │Assembly│
│ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │
│        │  │        │  │   ↓    │  │   ↓    │  │        │
│ skill: │  │ skill: │  │(parallel)─(parallel)│  │ skill: │
│story.md│  │script  │  │ skill: │  │ skill: │  │assembly│
│        │  │  .md   │  │visual  │  │voice.md│  │  .md   │
│        │  │        │  │  .md   │  │        │  │        │
└────────┘  └────────┘  └────────┘  └────────┘  └────────┘

How to set this up in OpenClaw:

  1. Create the Orchestrator Agent

    • Attach THIS file (SKILL.md) as its skill
    • This agent needs the ability to call/invoke other agents
    • It reads the pipeline flow below and executes step by step
  2. Create each Specialist Agent

    • Each gets its own skill file from the agents/ folder
    • Story Agent → agents/story_agent.md
    • Script Agent → agents/script_agent.md
    • Visual Agent → agents/visual_agent.md
    • Voice Agent → agents/voice_agent.md
    • Assembly Agent → agents/assembly_agent.md
  3. Attach tools to each agent

    • Story Agent: web search (for hadith verification), file read
    • Script Agent: no external tools needed (pure LLM reasoning)
    • Visual Agent: image generation API (Flux/SDXL), face detection tool
    • Voice Agent: ElevenLabs / OpenAI TTS API
    • Assembly Agent: FFmpeg, file system access
  4. The Orchestrator calls agents in sequence:

    orchestrator receives: "Make a video about Prophet Nuh"
      → calls Story Agent → gets story JSON
      → calls Script Agent with story JSON → gets script JSON
      → calls Visual Agent with script JSON → gets image paths (parallel)
      → calls Voice Agent with script JSON → gets audio paths  (parallel)
      → calls Assembly Agent with images + audio + script → gets final video
    

Visual Approach: Narrator + Story Scenes

Each video uses TWO types of visuals:

1. Narrator Scenes (brand anchor — same across ALL videos)

The faceless man in traditional Arabic clothing. Used for:

  • Opening shot (first 2-3 seconds) — establishes the "host"
  • Closing shot (last 2-3 seconds) — delivers the lesson / CTA

The narrator character is your brand identity. Viewers recognize him across all your videos. He always looks the same:

  • White thobe, red-checkered keffiyeh
  • NEVER shows face (back shots, silhouettes, over-shoulder, hands)
  • Same painterly art style, same color palette

2. Story Scenes (UNIQUE per video)

The middle 80% of each video shows what is actually happening in the story. These are completely unique to each video.

Example — Story of Prophet Nuh (Noah):

Beat | Visual
Nuh preaching | Wide shot: lone figure on raised ground addressing a crowd in ancient city, people turning away
Building the ark | Close-up: weathered hands hammering wood, massive wooden frame in background
The mockery | Crowd of silhouettes laughing and pointing at the ship in the desert
The flood begins | Dark storm clouds, rain hammering earth, water rising rapidly
Animals boarding | Pairs of animal silhouettes walking toward a massive wooden ship at dawn
The flood | Enormous waves, the ark riding the storm, lightning illuminating the scene
Waters recede | Ark resting on a mountaintop, olive branch, golden light breaking through

Example — Story of Prophet Yusuf (Joseph):

Beat | Visual
The dream | A boy looking up at night sky with 11 stars, sun, and moon arranged in a pattern
The well | Dark stone well in desert, rope descending into darkness
The caravan | Camels in a line crossing desert dunes, golden dust in air
The palace | Ornate Egyptian palace interior, golden columns, silk drapes
Prison | Dim stone cell, single beam of light through a high window
The reunion | Two silhouetted figures embracing in a field at sunset

Critical Rules for Story Scenes:

  • NO faces on any Prophets, Angels, or Sahaba — same faceless techniques
  • Story scenes CAN show: landscapes, architecture, objects, animals, weather, hands/feet, silhouettes, wide shots where figures are tiny
  • Art style must stay consistent WITHIN a single video
  • Story scenes should match the emotional mood of the narration beat

How the Script Agent marks scene types:

{
  "scenes": [
    {
      "scene_number": 1,
      "scene_category": "narrator_opening",
      "narration_text": "What happens when an entire world turns against one man?",
      "visual_direction": {
        "description": "Back shot of narrator on cliff overlooking vast ocean, wind in his thobe",
        "character_type": "narrator"
      }
    },
    {
      "scene_number": 2,
      "scene_category": "story",
      "narration_text": "Prophet Nuh, alayhi as-salam, called his people to Allah for 950 years...",
      "visual_direction": {
        "description": "Ancient city. A lone silhouetted figure stands on raised ground, arms raised to the sky, addressing a crowd below. The crowd turns away. Hot dusty afternoon.",
        "character_type": "story_figure",
        "story_element": "Nuh preaching to his people"
      }
    },
    {
      "scene_number": 9,
      "scene_category": "narrator_closing",
      "narration_text": "And that is why patience is never wasted with Allah...",
      "visual_direction": {
        "description": "Narrator sitting on rock at sunset, calm sea, same cliff from opening but now peaceful golden light",
        "character_type": "narrator"
      }
    }
  ]
}

The Visual Agent uses DIFFERENT prompt strategies:

  • narrator_opening / narrator_closing → strict narrator character prompt (brand consistency)
  • story → story-specific prompt, unique imagery, faceless constraint on human figures only
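
A minimal sketch of that branching in Python, assuming the orchestrator (or a Visual Agent wrapper) picks a template from the scene JSON above. The template wording and the build_image_prompt helper are illustrative; only scene_category, visual_direction, and the brand rules come from this skill:

# Hypothetical prompt selection by scene_category. The template text paraphrases
# the brand rules above; adjust to match references/visual_style_guide.md.
NARRATOR_TEMPLATE = (
    "{description}. Faceless man in white thobe and red-checkered keffiyeh, never "
    "showing his face (back shot, silhouette, or over-shoulder). Painterly cinematic "
    "concept art, warm golden-hour light, 9:16."
)
STORY_TEMPLATE = (
    "{description}. All human figures faceless (silhouettes, hands, wide shots). "
    "Painterly cinematic concept art, consistent with the rest of the video, 9:16."
)

def build_image_prompt(scene: dict) -> str:
    """Return the text-to-image prompt for one scene from the Script Agent output."""
    description = scene["visual_direction"]["description"]
    if scene["scene_category"] in ("narrator_opening", "narrator_closing"):
        return NARRATOR_TEMPLATE.format(description=description)
    return STORY_TEMPLATE.format(description=description)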

Global Config

Before any agent runs, this config is loaded and shared with all agents.

{
  "brand": {
    "channel_name": "{{CHANNEL_NAME}}",
    "narrator_character": "A faceless man wearing traditional white thobe and red-checkered keffiyeh/shemagh. Never show face — use back shots, silhouettes, over-shoulder angles, hands close-ups, or wide shots. Dignified, contemplative, wise.",
    "visual_style": "Cinematic, warm golden-hour lighting, painterly digital art style — NOT photorealistic, NOT cartoon. Epic film concept art quality.",
    "color_palette": ["#C8956C", "#2C1810", "#F5E6D0", "#1A3A4A", "#D4A853"],
    "aspect_ratio": "9:16",
    "resolution": "1080x1920"
  },
  "content_guidelines": {
    "sensitivity_rules": [
      "NEVER depict faces of any Prophet, Angel, or Sahabi",
      "NEVER depict faces on the narrator character",
      "All human figures in story scenes must also be faceless",
      "Use nasheeds or ambient sound only — no musical instruments",
      "Always include proper Islamic honorifics",
      "Only use Sahih or Hasan grade hadith",
      "Cite Surah name and verse number for all Quran references"
    ]
  },
  "target_languages": ["en", "ar", "fr", "ur", "tr", "id"],
  "default_language": "en"
}
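
If you wire this up yourself, note the path discrepancy the security scan flags (config/global_config.json in orchestrator.py vs. global_config.json at the repo root). A small hedged loader, assuming nothing beyond those two documented locations, makes the lookup explicit instead of silently falling back to defaults:

import json
from pathlib import Path

# Hypothetical loader: try the path orchestrator.py documents first, then the
# root-level file the repo ships, and fail loudly rather than running with
# missing or default config values.
CONFIG_CANDIDATES = [Path("config/global_config.json"), Path("global_config.json")]

def load_global_config() -> dict:
    for path in CONFIG_CANDIDATES:
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                return json.load(f)
    raise FileNotFoundError(
        "No global config found; looked in: "
        + ", ".join(str(p) for p in CONFIG_CANDIDATES)
    )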

Pipeline Execution (Orchestrator Playbook)

When the orchestrator receives a request:

Step 1: Call Story Agent

Input:  { "topic": "Prophet Nuh", "target_duration_seconds": 60 }
Output: Story JSON with sources, emotional arc, key visual moments

Step 2: Call Script Agent

Input:  Story Agent output
Output: Scene-by-scene script with:
        - narration text per scene
        - scene_category (narrator_opening / story / narrator_closing)
        - visual directions unique to the story
        - timing and subtitle text

Step 3: Call Visual Agent + Voice Agent (PARALLEL)

Visual: Script scenes → generates images per scene
        (narrator prompt template for opening/closing)
        (story-specific prompt for middle scenes)
Voice:  Script narration → generates audio per scene + word timestamps

Step 4: Call Assembly Agent

Input:  Images + motion configs + audio + subtitle text + brand assets
Output: Final MP4, thumbnail, SRT file
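
One plausible shape for the per-scene assembly, assuming FFmpeg is on PATH and each scene has exactly one still image and one narration clip. The helper name, file names, and flag choices are illustrative rather than taken from agents/assembly_agent.md:

import subprocess

def render_scene_segment(image_path: str, audio_path: str, out_path: str) -> None:
    """Render one 9:16 video segment from a still image and its narration audio."""
    cmd = [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,   # hold the still image for the clip duration
        "-i", audio_path,                 # scene narration from the Voice Agent
        "-vf", "scale=1080:1920,format=yuv420p",
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",                      # stop when the narration ends
        out_path,
    ]
    subprocess.run(cmd, check=True)

# Example (hypothetical file names):
# render_scene_segment("scenes/scene_01.png", "audio/scene_01.mp3", "segments/scene_01.mp4")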

Step 5: Quality Checks (Orchestrator does this itself)

  • Verify no faces in any generated image (call face detection)
  • Verify audio/visual sync (check durations match)
  • Verify file size under 50MB
  • If fail → retry failing agent up to 3 times
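
The face check above can be run locally before (or instead of) a cloud call. A rough sketch using OpenCV's bundled Haar cascade, which is a coarse frontal-face detector; the helper name and detection thresholds are assumptions:

import cv2

# Hypothetical local face gate: flag any generated image in which OpenCV's
# frontal-face Haar cascade finds at least one face. Coarser than a cloud API,
# but it keeps image data on the host.
_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def has_faces(image_path: str) -> bool:
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0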

Step 6: Output

  • Save video to output directory
  • Log to content calendar
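
Taken together, Steps 1 through 4 plus the retry rule from Step 5 could be driven by a loop like the sketch below. The agents mapping stands in for however OpenClaw exposes agent invocation, so every callable passed in is a placeholder, not part of the shipped orchestrator.py:

from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 3  # mirrors the "retry failing agent up to 3 times" rule in Step 5

def run_with_retries(step_name, fn, *args):
    """Call one agent, retrying on failure up to MAX_RETRIES attempts."""
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return fn(*args)
        except Exception as err:
            last_error = err
            print(f"{step_name} failed (attempt {attempt}/{MAX_RETRIES}): {err}")
    raise RuntimeError(f"{step_name} failed after {MAX_RETRIES} attempts") from last_error

def make_video(agents: dict, topic: str, duration_s: int = 60):
    """Drive Steps 1-4; `agents` maps step names to whatever callables invoke each agent."""
    story = run_with_retries("story", agents["story"],
                             {"topic": topic, "target_duration_seconds": duration_s})
    script = run_with_retries("script", agents["script"], story)

    # Step 3: visuals and voice are independent, so run them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        images = pool.submit(run_with_retries, "visual", agents["visual"], script)
        audio = pool.submit(run_with_retries, "voice", agents["voice"], script)
        images, audio = images.result(), audio.result()

    # Step 4: assembly; the Step 5 quality checks then run on the returned paths.
    return run_with_retries("assembly", agents["assembly"], images, audio, script)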

Agent Skill Files

Each agent has its own detailed skill file:

Agent | Skill File | Purpose | Tools Needed
Story Research | agents/story_agent.md | Find + validate Islamic stories | Web search, file read
Script Writer | agents/script_agent.md | Story → TikTok script with scene directions | None (pure LLM)
Visual Generation | agents/visual_agent.md | Generate narrator + story scene images | Image gen API, face detection
Voice Narration | agents/voice_agent.md | TTS with Arabic pronunciation | ElevenLabs/OpenAI TTS
Video Assembly | agents/assembly_agent.md | Images + audio → final video | FFmpeg/Remotion

Read each agent's skill file for full input/output JSON schemas, prompt templates, and quality gates.


References

File | Purpose
references/visual_style_guide.md | Faceless techniques, environments, lighting, Ken Burns, prompt templates
config/global_config.json | Full config template with API settings, brand, pipeline params
