# Content Engine

Cross-platform content deconstruction + generation + knowledge graph. The bottom layer is a graph that grows over time; the top layer hosts multiple modes across multiple platforms.

Chinese version: see `~/.agents/skills/content-engine/`
## Mode Roadmap

| Mode | Status | Description |
|---|---|---|
| deconstruct | ✅ v1 | Reference link → 18-field deconstruction card; feeds the graph along the way |
| generate | ✅ v2.2 | Deconstruction + graph + your brand → script / caption / cover / desc / tags / reference frames. v0.3.0: real video generation via Volcengine Ark Seedance 2.0 (sequential per-shot generation + ffmpeg auto-concat into final-video.mp4; partial-video.md tracks failed shots so you can re-run them). v0.2.1: built-in validator + auto-fallback v1. |
| evaluate | 🔜 v3 | Finished content → 8-dimension weighted scoring |
## Platform Roadmap

| Platform | Status | Coverage / Plan |
|---|---|---|
| Xiaohongshu (XHS / RedNote) | ✅ v1 | Video posts + image posts |
| Douyin | 🔜 v1.1 | Short-form video (planned via TikHub Douyin API) |
| WeChat Channels (视频号) | 🔜 v1.2 | Short-form video |
| Bilibili | 🔜 v2 | Short + long-form video |
| TikTok / Instagram | 🔜 Exploring | International platforms |
Current state: this document covers Xiaohongshu deconstruct (v1) + generate (v2.2 text+image+video). v0.3.0 ships real video via Seedance 2.0. Future platforms reuse the same architecture (extract_{platform}.py / generate_{platform}.py + content_engine/{platform}/ submodules); the graph is shared cross-platform.
Platform compatibility: This skill runs in OpenClaw (as a personal agent skill) and Claude Code. Scripts use Python 3.10+ stdlib (no external deps); the only system command needed is ffmpeg. File I/O assumes your agent has Read / Write tools (in OpenClaw these map to apply_patch / Exec / Web browser).
## Architecture: graph/ is the brain

```
content-engine-en/
├── SKILL.md                     ← what you're reading; the agent's entry point
├── graph/                       ← knowledge graph (shared "memory / soul / context" across modes)
│   ├── index.md                 brand briefing; the agent reads this first
│   ├── brand/{brand-voice,brand-story}.md
│   ├── platforms/xiaohongshu.md XHS playbook (only platform in v1)
│   │                            (v1.1+ will add douyin.md / wechat-channels.md / ...)
│   ├── audience/segments.md     audience segmentation
│   └── engine/{hooks,style-tags,taboo}.md
├── references/{output-template,example-video,example-image}.md
└── scripts/
    ├── extract_xhs.py           v1 deconstruct CLI: link → workspace
    ├── generate_xhs.py          v2 generate CLI: link → script + images + copy
    │                            (v1.1+ will add extract_douyin.py / generate_douyin.py)
    └── content_engine/          Python package (zero deps)
        ├── client.py            TikhubClient (v1)
        ├── parsers.py           NoteData / Comment parsing (v1)
        ├── linkresolve.py       short link → note_id (shared)
        ├── video.py             download + ffmpeg frame extraction (v1)
        ├── images.py            image-post downloader (v1)
        ├── llm.py               Ofox LLM client (v2)
        ├── nano_banana.py       Ofox image generation (v2, Nano Banana Pro)
        ├── lookup.py            link → card lookup + freshness (v2)
        ├── prompts.py           5 text prompt templates (v2)
        ├── generate.py          generate-mode orchestration (v2)
        ├── preflight.py         environment self-check (v1+v2)
        └── models.py            dataclass definitions
```
Two ironclad rules:

- graph/ files can be empty templates — deconstruction still runs, just falls back to "objective deconstruction" mode
- New hooks / style words discovered during deconstruction are auto-written back to graph/engine/. The graph grows.
## When to trigger
- User gives an XHS link and says "deconstruct this" / "study this" / "why did this go viral?"
- Competitive analysis during content planning
- The "research first" step before generating new content
## Input

| Required | Field | Notes |
|---|---|---|
| ✅ | Reference link | XHS short link / long link / 24-char hex note_id / share text — all accepted |
| | Task ID | Defaults to AIC-{YYMMDD}-{seq} |
| | Content goal | e.g., "drive in-store traffic" / "DM acquisition" — affects the "Takeaways" field |
## Output
Markdown file → docs/deconstructions/{id}-{slug}.md.
Full field definitions in references/output-template.md. Examples in references/example-video.md / example-image.md.
## Workflow

### Step 0: Detect graph state → choose mode
Use your agent's native tools directly (avoid bash globstar / realpath compat issues):

- Locate the skill root (where SKILL.md lives)
- Use the Read or Grep tools to scan graph/**/*.md for `# TODO:` markers

Simplest is a single Grep call:

```
Grep pattern: "^# TODO:" path: <skill_root>/graph/ output_mode: files_with_matches
```
| Files matched | Mode | Behavior |
|---|---|---|
| 0 | Brand-aware | Step 5's "Target audience" / "Takeaways" must be generated from graph content; the field-fill stage must read the relevant graph nodes |
| ≥1 | Objective | Skip brand-aware fields; append to output: "⚠️ graph/ not yet populated — recommend filling {list of TODO files}" |
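For environments where a Python check is handier than a Grep tool, the same detection can be sketched with the stdlib. This helper is illustrative, not part of the shipped scripts, and assumes it is run from the skill root:

```python
# Illustrative stdlib sketch of the Step 0 check (not a shipped script).
from pathlib import Path

def todo_files(skill_root: str) -> list[Path]:
    """Return graph/ markdown files that still carry a '# TODO:' marker."""
    root = Path(skill_root) / "graph"
    return [
        p for p in root.rglob("*.md")
        if any(line.startswith("# TODO:")
               for line in p.read_text(encoding="utf-8").splitlines())
    ]

# 0 matches -> brand-aware mode; >=1 -> objective mode (see table above)
mode = "brand-aware" if not todo_files(".") else "objective"
```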
Mode A vs B differences in the "Takeaways" field: see the dual-version comparison at the end of references/example-video.md.
Wikilink convention [[brand/brand-voice]] between graph files: this is Obsidian-style, pointing to graph/brand/brand-voice.md (no .md suffix). When you see [[X]], Read the corresponding file to load context.
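A minimal sketch of how an agent-side helper could resolve those wikilinks, assuming every link is relative to graph/ and omits the .md suffix (the helper name is hypothetical):

```python
# Hypothetical helper: resolve Obsidian-style [[X]] links to graph/ paths.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

def resolve_wikilinks(text: str, graph_root: Path) -> list[Path]:
    """'see [[brand/brand-voice]]' -> [graph_root / 'brand/brand-voice.md']"""
    return [graph_root / f"{name}.md" for name in WIKILINK.findall(text)]
```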
### Steps 1-3: One-shot data fetch → workspace
One command does it all: link resolution / metadata fetch / comments fetch / video download + frame extraction / image download for image posts.
⚠️ v1 supports Xiaohongshu only. Douyin / WeChat Channels / Bilibili etc. are on the roadmap (v1.1+) with corresponding extract_douyin.py / extract_wechat_channels.py scripts.
```bash
python3 scripts/extract_xhs.py "<XHS link / note_id / share text>"
# Default workspace: {tempdir}/content-engine/{note_id}/
# Custom: --out /your/path
```
First-run environment check:

```bash
python3 scripts/extract_xhs.py --check
```

(Checks Python version / ffmpeg / TIKHUB_API_TOKEN / network / workspace writability. See the "Setup" section below.)
Workspace artifacts (default {tempdir}/content-engine/{note_id}/, cross-platform):

| File | Content | How the agent uses it |
|---|---|---|
| note.json | Parsed NoteData dataclass (all fields pre-extracted) | Read directly; maps to the Step 5 field table |
| comments.json | Parsed Comment list (with is_pinned heuristic flag) | You (the agent) read the raw text in Step 5c and classify semantically — better than regex |
| {note_id}.mp4 | Original video file (CDN direct download) | Used by Step 4 frame extraction |
| frames/frame_NNN.png | Extracted frames (auto fps based on duration: short <10s → 1.0, mid → 0.5, long >60s → 0.25) | Step 4 reads frame by frame |
| images/image_NNN.jpg | All images for image posts (numbered in order) | Step 4 reads image by image |
Error handling:

- API 401/403 → non-zero exit; tell the user and stop
- Comments API failure → comments.json is written as {"_error": "..."}; the "comment keywords" field becomes "⚠️ Not retrieved"
- Non-XHS link → tell the user "v1 only supports Xiaohongshu" and stop
Common flags:

- `--no-video`: skip video download (metadata-only mode)
- `--no-comments`: skip comments
- `--fps 1.0`: force the frame rate (default auto-adapts to duration)
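The auto-adaptive frame rate follows the thresholds listed in the workspace table above. A one-glance sketch (the real logic lives in content_engine/video.py and may differ at the boundaries, which are assumptions here):

```python
# Sketch of the duration-adaptive fps rule; illustrative only.
def auto_fps(duration_s: float) -> float:
    if duration_s < 10:      # short clips: one frame per second
        return 1.0
    if duration_s <= 60:     # mid-length: one frame every 2 seconds
        return 0.5
    return 0.25              # long (>60s): one frame every 4 seconds
```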
### Step 4: Multi-modal deconstruction

#### Step 4a · Required reading before deconstruction (graph hard gate)
Always Read first:

- graph/platforms/xiaohongshu.md: sections "What to focus on when deconstructing" + "Platform viral formulas" + "Taboos"
- graph/engine/style-tags.md: full style dictionary (the Step 5 style-tag field uses this)
- graph/engine/hooks.md: full hook library (the Step 5 emotion-hook field uses this)

If in brand-aware mode, also Read graph/brand/brand-voice.md + graph/brand/brand-story.md + graph/audience/segments.md.
#### Step 4b · Video branch (type == "video")
- Read the frames in order (frame_001.png ...). Mental-note each frame: shot type / subject / action / background / props / camera direction. Don't output N rows of stream-of-consciousness — accumulate material for aggregation in the next step.
- Aggregate into time segments for the "Reference content deconstruction" field. Core rule:

| ✅ Good (aggregated + dense) | ❌ Bad (stream of consciousness, or empty adjectives) |
|---|---|
| 7-12s \| Camera: locked → slow push \| Shot: close-up → extreme close-up<br>Visual: emerald-green collar and placket of vest, jade buttons + white beaded geometric embroidery, paired with white jade pendant necklace as a styling demo | 7s \| close-up \| collar<br>8s \| close-up \| collar<br>9s \| close-up \| button<br>... |
| | 7-12s \| Visual: very pretty clothing detail, exquisite craftsmanship |
Merge rules:
- 2+ consecutive frames with same subject/shot type → merge into one segment
- Subject/shot change → start new segment
- Single-frame holds <2s usually don't get their own segment
- Use specific nouns (emerald green, beadwork, jade button) not adjective stacking (high-end, exquisite, beautiful)
- Voiceover/subtitle text:
  - Combine on-screen captions + note.json.desc
  - Pure visual + no captions → write "No voiceover/subtitle, pure visual storytelling" + list bottom-watermark info
- Voiceover logic analysis: write in layers, each layer with a timestamp + a one-line function:
  - Example: "Layer 1 · Establish contrast and curiosity (0-12s): the '75-born + 2000m² store' numeric contrast triggers curiosity"
  - Common structures:
    - Hook open → scene immersion → product/USP → identity elevation → CTA
    - Contrast open (number/conflict) → story setup → values → CTA
    - Craft close-up → cultural meaning → emotional resonance → tag elevation
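The merge rules above are mechanical enough to sketch. The per-frame note shape below is hypothetical, since the agent does this aggregation mentally rather than via a shipped script:

```python
# Illustrative sketch of the merge rules; the note dict shape is hypothetical.
def merge_segments(notes: list[dict]) -> list[dict]:
    """Fold consecutive same-subject / same-shot frames into one time segment."""
    segments: list[dict] = []
    for n in notes:  # n = {"t": 7, "shot": "close-up", "subject": "collar"}
        last = segments[-1] if segments else None
        if last and (last["shot"], last["subject"]) == (n["shot"], n["subject"]):
            last["end"] = n["t"]          # 2+ consecutive matching frames: extend
        else:                             # subject/shot change: new segment
            segments.append({"start": n["t"], "end": n["t"],
                             "shot": n["shot"], "subject": n["subject"]})
    # per the rules above, a single-frame hold (<2 s) usually doesn't get its
    # own row; the agent folds it into a neighbouring segment when writing up
    return segments
```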
#### Step 4c · Image branch (type == "normal")

- Read the images in order (images/image_NNN.jpg)
- Each image: composition / elements / style / role in the set (cover / detail / outfit / scene)
- Aggregate into "Reference content deconstruction" by image order: "Image 1 (cover): ... / Image 2: ..."
### Step 5: Extract remaining fields

#### Step 5a · Field-fill table (graph influence)
| Field | Source | graph required reading |
|---|---|---|
| Platform | Link source | — |
| Target audience | note.json.desc + hashtags + comment behavior | Brand-aware mode: must cross-reference graph/audience/segments.md and explicitly mark which segment was hit |
| Viral theme | note.json.desc + title + deconstruction | — |
| Style tags | Visuals + copy | Must cross-reference graph/engine/style-tags.md — mark "existing" if hit, "new" if not (and queue for Step 6 writeback) |
| Scene tags | Visuals | — |
| Emotion hook | Opening + hook lines | Must cross-reference graph/engine/hooks.md patterns; explicitly mark which class was hit |
| Comment keywords | comments.json (you classify yourself) | See Step 5c — the agent reads raw comments and classifies semantically; more accurate than regex |
| Voiceover logic analysis | Copy structure | — |
| Reference hashtags | note.json.hashtags (parser already cleaned the [话题] markers) | Parser pre-extracted; just join with #; don't grep desc |
| Takeaways | Global summary | Strong dependency: brand-aware mode writes "how we'd do the same theme"; objective mode writes general principles |
#### Step 5b · Quality bar for subjective fields
See "Field definitions + Anti-Pattern" section in references/output-template.md. Core principles:
- Viral theme explains "why it went viral" (mechanism), not "what it is" (description)
- Emotion hook writes "what technique elicits what emotion" (two-layer), not a single isolated word
- Style vs Scene vs Emotion-hook: style = "how it looks/feels", scene = "where it happens", emotion-hook = "what it stirs in the user's mind" — never confuse the three
#### Step 5c · Comment keyword semantic classification (you do it, no regex)
Why no regex: language has infinite variations ("how do I buy" / "what's the price" / "is it pricey" / "how much"), regex always misses; regex also can't handle semantics ("price isn't a problem" isn't an inquiry; "isn't this silk?" isn't an objection). You (agent) have full language understanding — do this directly, you're 100x better than regex at this.
Data source: comments.json already filtered by parser — is_pinned=True (merchant-pinned / anti-scam) is auto-flagged and skippable; the rest are real user comments.
Four classes (by "what the user is doing"):

| Class | What to capture | Examples |
|---|---|---|
| ask | Asking about purchase path / price / address / hours / channels (pre-conversion info) | "how do I buy" / "how much" / "where's the store" / "open hours" / "available online?" |
| request | Active need (strong intent) | "need WeChat" / "still in stock?" / "size out?" / "need contact" |
| praise | Resonance / specific likes | "so beautiful" / "want it" / "elegant" / "love it" / "tempting" |
| objection | Correction / disagreement (not neutral questions) | "please don't call this X" / "this is A not B" / "shouldn't be this expensive" |
Output format (mandatory evidence):

- {keyword label} ({N} raw comments: "text 1" "text 2" "text 3") — {one-line interpretation / conversion-signal judgment}

Hard anti-fabrication rules:

- Each keyword must be backed by 1-3 original comment texts (copied directly from comments.json, no rewriting)
- Keywords without raw-text evidence are not allowed — no fabrication
- Questions are not objections: "isn't this silk?" is a neutral question (goes to ask); "please don't call this 新中式" is an objection
- The same comment can fall into multiple classes — "how do I buy this love the green" is both ask and praise; quote it under both
- comments.json is [] or has _error → write "⚠️ Comment data not retrieved"; don't infer from desc
Good vs Bad:

✅ Good (with evidence + interpretation):

- how-do-i-buy (5 raw comments: "how do I buy the green pants" "how to purchase, online?" "how do I buy this love the green") — highest-frequency conversion signal
- how-much / pricing (2 raw comments: "how do you sell this" "what's the price for this set") — another way of asking about pricing
- objection-traditional-attire (1 raw comment: "this is Manchu attire, please don't call it 新中式" 👍 1) — only 1 comment but liked; signals a tag-usage edge case

❌ Bad (no evidence / fabricated):

- how-do-i-buy, how-much, need-link (just listing words, no raw text — forbidden)

❌ Bad (misclassifying questions as objections):

- objection (comment: "is this silk?") ← this is a question, not an objection
### Step 6: Write back to graph (the system gets smarter)
After deconstruction, proactively review and write back. Strict writeback location rules:

| Type | File | Insert location | Format |
|---|---|---|---|
| New hook | graph/engine/hooks.md | End of the ## Pending classification section | ### {emotion-class \| pattern-name} H3 + bullets (pattern / applicability / example / source) |
| New style tag | graph/engine/style-tags.md | End of the ## Pending table | New table row |
| Platform observation | graph/platforms/xiaohongshu.md | Top of ## Observation log (newest first) | ### {YYYY-MM-DD} · {one-line topic} + bullets (source / observation / data / inference) |
Writeback principles:
- Append-only, never overwrite
- Every entry must include "source = task ID" + "date / data"
- If conflict with existing graph entries → don't write; emit ⚠️ in output for human resolution
- Hit existing hook/tag → don't duplicate; just mark "reuses existing graph entry" in deconstruction card
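As an illustration of the append-only discipline, a minimal sketch of a style-tag writeback. The Pending-table layout of style-tags.md is assumed here, and the column set is hypothetical:

```python
# Append-only writeback sketch; assumes the ## Pending table sits at the
# end of style-tags.md, so a plain file append lands in the right place.
from datetime import date
from pathlib import Path

def append_style_tag(skill_root: Path, tag: str, task_id: str) -> None:
    path = skill_root / "graph" / "engine" / "style-tags.md"
    row = f"| {tag} | source: {task_id} | {date.today().isoformat()} |\n"
    with path.open("a", encoding="utf-8") as f:   # append, never overwrite
        f.write(row)
```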
### Step 6.5: Pre-output self-check (mandatory checklist)
Every line must ✓; failing one means you don't proceed to Step 7:
□ All 18 Excel fields filled, no skips
□ All 5 metadata items (author/time/engagement/note_id/type) pulled live from API, not fabricated
□ Style / Scene / Emotion-hook are not confused (see output-template.md)
□ Reference body copy is desc original (with emojis + line breaks), not paraphrased or trimmed
□ Each comment keyword backed by raw text from comments.json; if comments.json has `_error`, write "⚠️ Not retrieved"
□ Voiceover logic analysis is written in layers (hook/setup/elevation/CTA), not a single paragraph
□ Style tags hitting graph dictionary are marked "existing"; new ones marked "new"
□ Emotion hook hitting existing graph pattern is explicitly noted; new patterns queued for Step 6 writeback
□ Reference hashtags pulled directly from note.json.hashtags (parser already cleaned [话题])
□ Step 6 writeback: explicitly state "N items" or "none"; each item has source/date
□ Objective mode: append "graph/ not populated" notice at the end
### Step 7: Publish the deconstruction card

#### Step 7a · Generate slug
From the title, generate a filename / doc-name slug:

```python
import re

slug = re.sub(r"[^\w一-龥\-_·]+", "-", title)[:30].strip("-") or "untitled"
# e.g., "Shenzhen 新中式|what does wearing 江南春色 feel like" → "Shenzhen-..."
```
Final naming: {id}-{slug} (e.g., AIC-260426-001-Shenzhen-deep-dive).
#### Step 7b · Output (branches on agent environment)

Preferred: Feishu (Lark) Docx (when running in OpenClaw with the Lark official plugin)

The OpenClaw Lark plugin gives the agent native tools for creating cloud documents. In OpenClaw:

- Use the Lark plugin's "create cloud doc" tool (the exact tool name varies by plugin version), passing the full markdown content
- The title is {id}-{slug}
- Get the Feishu doc URL; record it for Step 7c
Fallback: local markdown (Claude Code / no Lark plugin / Lark tool failed)

```
# Use the Write tool to write to:
docs/deconstructions/{id}-{slug}.md
```
Decision logic:

- Agent self-check: do I have a "create cloud doc" / Lark-document tool in my current session?
  - Yes → publish to Feishu; don't also write locally
  - No → write locally directly
Note: This skill does NOT wrap Feishu API. The OpenClaw Lark plugin handles auth / upload / conversion; the agent only needs to call the plugin's tools. Claude Code users who want Feishu publishing must manually copy the markdown into a Feishu doc.
#### Step 7c · User summary (fixed 4 lines)
1. Subject: {title / author / duration / engagement} — one line
2. Strongest insight: {1 core hook or counter-intuitive finding} ({data evidence, e.g., "save-to-like ratio 70%"})
3. Published to: {Feishu URL or local absolute path}
4. Graph writeback: {N items; list top 3, abbreviate rest; if 0, explicitly state "none"}
## v2 Generate Mode (v0.2.0+)
Use the v1 deconstruction card as a competitor reference + your brand info → generate your own version of script / copy / reference frames / cover / tags.
### When to trigger
User says something like:
- "Based on https://xhslink.com/o/xxx, generate a same-theme video for our brand"
- "Learn from this post and produce 8 image cards for us"
- "This viral post has a great hook — make a version of it for our brand"
### Input

| Required | Field | Notes |
|---|---|---|
| ✅ | XHS link | User passes only the link; the agent doesn't ask for a filename |
| ✅ | --type | video / image / script |
| ✅ | --count | 1-N (image count / video count) |
| | --product-imgs | Path to product images (dir or single file) — text-only in v2.0; image-to-image in v2.1 |
| | --product-usp | Free-text USP / material / craft description |
| | --fresh | Force re-deconstruct v1 (bypass cache) |
### Output workspace

```
docs/deconstructions/AIC-260426-001-xxx-generated/   ← v1 card name + "-generated"
└── GEN-260427-001-image/        ← one GEN-N per generate run
    ├── script.md                ← full script (image plan / video shots / shoot brief)
    ├── caption.txt              ← (video type) on-screen captions
    ├── cover.png + cover.txt    ← cover image (with overlay text) + text backup
    ├── frames/frame_NNN.png     ← N reference images (Nano Banana, vertical 9:16)
    ├── desc.txt                 ← XHS post body
    ├── tags.txt                 ← hashtags (10-15)
    ├── seedance-prompt.md       ← (video type) Seedance cinema-style prompt
    ├── shots/shot_NN.mp4        ← (video, v0.3.0) per-shot real videos from Seedance
    ├── final-video.mp4          ← (video, v0.3.0) ffmpeg-concatenated final video
    └── partial-video.md         ← (video, v0.3.0) per-shot status + failed-shot prompts
```
### Workflow (10 steps)

#### Step 0: Preflight + mode select
- Check OFOX_API_KEY (required) + TIKHUB_API_TOKEN (for fallback v1 deconstruct)
- Check graph state (brand-aware vs objective mode)
#### Step 1: Link → deconstruction card

- Resolve link → note_id (reuse v1 linkresolve)
- Grep docs/deconstructions/ for the note_id
  - Found (≤7 days) → use directly
  - Found (>7 days) → ask the user "reuse / re-deconstruct?"
  - Not found → auto-fallback (since v0.2.1): transparently runs extract_xhs.py to fetch note.json + comments.json + frames, then writes a stub deconstruction card (text fields populated; visual fields marked ⚠️ AUTO-STUB for the agent to complete by reading frames/)
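The ≤7-day freshness rule can be sketched like so; the real lookup lives in content_engine/lookup.py, and the mtime-based freshness check here is an assumption for illustration:

```python
# Illustrative cache lookup: find a card mentioning note_id and judge
# freshness by file mtime (lookup.py may track freshness differently).
import time
from pathlib import Path

FRESH_SECONDS = 7 * 24 * 3600  # <=7 days -> reuse without asking

def find_card(note_id: str,
              root: Path = Path("docs/deconstructions")) -> tuple[Path | None, bool]:
    for card in sorted(root.glob("*.md")):
        if note_id in card.read_text(encoding="utf-8"):
            fresh = (time.time() - card.stat().st_mtime) <= FRESH_SECONDS
            return card, fresh
    return None, False  # not found -> auto-fallback deconstruct (v0.2.1+)
```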
#### Step 2: Read graph context

- brand-voice / brand-story / segments / taboo / hooks / style-tags / xiaohongshu

#### Step 3: Collect input args

- type / count / product-imgs / product-usp

#### Step 4: Generate script (core)

- Prompt template: deconstruction + brand-voice + hooks library + USP
- Output: script.md
- Critical constraint for the image type: each frame MUST be a single isolated subject (prevents downstream image gen from producing collages)
#### Step 5: Generate 4 ancillary texts in parallel

- caption.txt (video only)
- cover.txt
- desc.txt
- tags.txt (depends on desc; runs after it)
#### Step 6: Image generation (image / video types)

- N frames (per --count for image; 1 key frame for video) + 1 cover
- Nano Banana three-path constraint prompt:
  - Layout reference ← from the script.md single-frame description
  - Brand style anchors ← from graph/brand/brand-voice
  - Product description ← from --product-usp + --product-imgs
- Hard rules baked into build_prompt (illustrative sketch below):
  - STRICTLY VERTICAL 9:16 portrait
  - SINGLE IMAGE only, NO collage / grid / multi-panel
  - frame: ABSOLUTELY NO text; cover: text overlay allowed
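A hedged sketch of how the three-path constraint plus hard rules could be assembled into a single prompt; the real build_prompt lives in content_engine/nano_banana.py, and its exact shape is not shown here:

```python
# Illustrative assembly of the three-path constraint prompt (not the shipped
# build_prompt). Any missing path degrades gracefully instead of blocking.
def build_prompt(frame_desc: str, brand_anchors: str,
                 product_usp: str | None, is_cover: bool = False) -> str:
    parts = [
        f"Layout reference: {frame_desc}",        # from script.md
        f"Brand style anchors: {brand_anchors}",  # from graph/brand/
        f"Product: {product_usp}" if product_usp else
        "NOTE: user supplied no visual reference; do not invent colors/materials",
        # hard rules, baked in regardless of inputs:
        "STRICTLY VERTICAL 9:16 portrait",
        "SINGLE IMAGE only, NO collage / grid / multi-panel",
        "Text overlay allowed" if is_cover else "ABSOLUTELY NO text in frame",
    ]
    return "\n".join(parts)
```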
#### Step 7: seedance-prompt.md (video type only)

- The LLM translates the script into a Seedance cinema-style prompt (5-6 shots, 4-7s each)
- From v0.3.0, this file is also auto-fed into the Seedance API (unless --no-real-video)
#### Step 7.5: Real video generation (v0.3.0+, video type, default on)

- Parse seedance-prompt.md into N shots
- Print a cost estimate + a 3-second Ctrl+C countdown (1 shot × 5s ≈ $0.20, 5 shots ≈ $1)
- Submit shots sequentially to Volcengine Ark Seedance 2.0 (async task + polling, ~1-3 min per shot)
- Failed shots don't block the others; partial-video.md records which failed + their prompts for manual re-run
- Successful shots are stitched with ffmpeg concat into final-video.mp4 (sketch below)
- Flags: --no-real-video (prompt-only, skip API) / --async (submit only, return task_ids) / --no-confirm (skip the 3s countdown)
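The concat step is plain ffmpeg. A self-contained sketch of what the stitch amounts to (stream-copy assumes all shots share codec and resolution, which uniform per-shot Seedance output should):

```python
# Sketch of the ffmpeg concat-demuxer stitch into final-video.mp4.
import subprocess
from pathlib import Path

def concat_shots(shots_dir: Path, out: Path) -> None:
    listing = shots_dir / "concat.txt"
    listing.write_text("".join(
        f"file '{p.resolve()}'\n" for p in sorted(shots_dir.glob("shot_*.mp4"))
    ))
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", str(listing), "-c", "copy", str(out)],  # -c copy: no re-encode
        check=True,
    )
```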
#### Step 8: Validator (built-in since v0.2.1)

- Hard errors → auto-retry the relevant step (max 1 attempt): taboo-word hits / empty file / tags < 5 / image too small (suspected generation failure)
- Soft errors → emit quality_report.md for the user to decide: desc length anomaly / too many emoji / multi-line cover
- The taboo dictionary is auto-extracted from graph/engine/taboo.md, layered on top of default extreme/marketing words
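As an example of the hard-error side, a sketch of the taboo check; the bullet-list format of taboo.md is an assumption, and the shipped validator may parse it differently:

```python
# Hedged sketch of the taboo-word hard check (not the shipped validator).
from pathlib import Path

DEFAULT_TABOO = {"best ever", "guaranteed"}  # placeholder extreme/marketing words

def load_taboo(graph_root: Path) -> set[str]:
    words = set(DEFAULT_TABOO)
    taboo_md = graph_root / "engine" / "taboo.md"
    if taboo_md.exists():  # assume one taboo word per "- " bullet
        for line in taboo_md.read_text(encoding="utf-8").splitlines():
            if line.startswith("- "):
                words.add(line[2:].strip())
    return words

def taboo_hits(text: str, taboo: set[str]) -> list[str]:
    return [w for w in taboo if w and w in text]  # non-empty -> hard error, retry
```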
#### Step 9: Publish

- Reuses v1 Step 7: Feishu first + local fallback
- Returns the GEN-xxx directory path as the deliverable
#### Step 10: 4-line summary
1. Generated: based on {card} + {brand}, {type} ({count} items)
2. Outputs: script.md + cover + N frames + desc + tags + (seedance-prompt)
3. Workspace: {absolute path}
4. Time / calls: {seconds} / {LLM calls} + {image calls}
### CLI usage

```bash
# 8 reference images for an image post
python3 scripts/generate_xhs.py "<XHS link>" --type image --count 8 \
  --product-usp "Premium knitwear: silk vest + embroidered shirt" \
  --product-imgs ~/photos/spring-2026/

# 1 video (v0.3.0+: real video gen by default, ~$1, Ctrl+C to cancel)
python3 scripts/generate_xhs.py "<XHS link>" --type video --count 1

# 1 video, prompt only (script + cover + 1 key frame, no Seedance API call)
python3 scripts/generate_xhs.py "<XHS link>" --type video --count 1 --no-real-video

# Async: submit Seedance tasks and return immediately with task_ids
python3 scripts/generate_xhs.py "<XHS link>" --type video --count 1 --async

# Shoot brief only
python3 scripts/generate_xhs.py "<XHS link>" --type script --count 1

# Force re-deconstruct (bypass cache)
python3 scripts/generate_xhs.py "<XHS link>" --type image --count 8 --fresh

# Environment check (includes OFOX_API_KEY)
python3 scripts/generate_xhs.py --check
```
### Known limitations / roadmap

| Limitation | Solution direction | Plan |
|---|---|---|
| Video not actually generated (prompt only) | Integrate Volcengine Ark Seedance 2.0 API | ✅ v0.3.0 |
| No hook variants (1 set per run) | LLM multi-round with N hook directions | v0.4.x |
| Weak character consistency (different faces) | IP-Adapter / InstantID | v0.5.x |
| No automatic QA | validator.py with hard + soft error detection | ✅ v0.2.1 |
| Fallback v1 deconstruct is manual | Auto-trigger built in | ✅ v0.2.1 |
| Stub-card visual fields filled by the agent manually | Vision-LLM auto-completion | v0.4.0 |
### Boundaries (generate mode)
- No fabricated product info: if user provides no product images / USPs → prompt explicitly notes "user did not supply visual reference"; LLM avoids inventing concrete colors/materials
- No copy-paste from competitor: script must not contain reference video's specific proper nouns (brand / founder / location)
- Images are reference, not finals: v2.0's image generation is a mood board / shooting reference, not direct-publish assets (see spec §1)
- Brand consistency uses three-path constraint: product image + brand-voice prompt + deconstruction layout — any path missing is OK (degrades but doesn't block)
- Ofox calls are metered: each generate run makes ~4-7 LLM calls + N+1 image calls; recommend --count 1 first to verify before scaling up
## v1 boundaries (deconstruct mode)
- No fabrication: API failure / video download failure / unrecognizable subtitles → mark "Not retrieved"
- Deconstruction is observation, not commentary: factual fields write "the visual shows X", not "this looks great". Subjective judgment only in three fields: emotion-hook / viral theme / takeaways
- Graph is append-only: writebacks don't overwrite; conflicts get ⚠️ for human resolution
- Token control: when video frames > 30, aggregate by time segments (1 representative per 5s) before detailed description
- No content generation here: deconstruct mode only outputs cards + graph writeback (v2 generate handles generation)
## Setup

### System requirements
| Dependency | Purpose | Install |
|---|---|---|
| Python ≥ 3.10 | Run all scripts | macOS: brew install python@3.12<br>Linux: apt install python3.12 or pyenv<br>Windows: python.org |
| ffmpeg | Video frame extraction (optional for image-only) | macOS: brew install ffmpeg<br>Linux: apt install ffmpeg (or dnf / pacman)<br>Windows: choco install ffmpeg |
No pip dependencies — scripts use Python stdlib only.
### API tokens

- v1 deconstruct needs TIKHUB_API_TOKEN (required for deconstruct)
- v2 generate needs OFOX_API_KEY (required for generate; covers LLM + Nano Banana image gen)
- v0.3.0+ real video needs ARK_API_KEY (required when the video type runs in default mode; --no-real-video bypasses it)
```bash
# v1 deconstruct: TikHub
mkdir -p ~/.config/content-engine
echo 'TIKHUB_API_TOKEN=your_tikhub_token' >> ~/.config/content-engine/.env

# v2 generate: Ofox (LLM text + Nano Banana images)
echo 'OFOX_API_KEY=ofox-your_key' >> ~/.config/content-engine/.env

# v0.3.0+ video generation: Volcengine Ark (Seedance 2.0)
echo 'ARK_API_KEY=your_ark_key' >> ~/.config/content-engine/.env
```
| Token | Sign up | Use | Required? |
|---|---|---|---|
| TIKHUB_API_TOKEN | tikhub.io | XHS API (raw deconstruct data) | v1 deconstruct |
| OFOX_API_KEY | ofox.ai | LLM + Nano Banana images | v2 generate |
| ARK_API_KEY | Volcengine Ark Console | Seedance 2.0 video generation | v0.3.0+ real video; --no-real-video bypasses |
| OPENROUTER_API_KEY | openrouter.ai | (optional) alternate LLM provider | optional |
⚠️ The ARK API Key is not the same as a Volcengine IAM AK/SK (both are UUID-shaped but use different auth). Create it under "API Key Management" in the Ark console, then enable Doubao-Seedance-2.0-fast under "Activation → Vision Models" (default 5M tokens free).
Switch model: `export ARK_VIDEO_MODEL=doubao-seedance-1-5-pro-251215` (or any other Ark model id).
Token lookup order (first found wins):

1. The corresponding env var (TIKHUB_API_TOKEN / OFOX_API_KEY / ARK_API_KEY / OPENROUTER_API_KEY)
2. $CWD/.env
3. ~/.config/content-engine/.env (XDG standard)
4. Skill-root .env
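A sketch of that lookup order, assuming a simple KEY=VALUE .env format (the shipped loader in content_engine may handle quoting or comments differently):

```python
# Illustrative token lookup: env var first, then the documented .env chain.
import os
from pathlib import Path

def find_token(name: str, skill_root: Path) -> str | None:
    if os.environ.get(name):                      # 1. environment variable
        return os.environ[name]
    for env_file in (Path.cwd() / ".env",         # 2. $CWD/.env
                     Path.home() / ".config/content-engine/.env",  # 3. XDG
                     skill_root / ".env"):        # 4. skill-root .env
        if env_file.exists():
            for line in env_file.read_text(encoding="utf-8").splitlines():
                key, _, val = line.partition("=")
                if key.strip() == name and val.strip():
                    return val.strip()
    return None
```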
### Verify

```bash
python3 scripts/extract_xhs.py --check
```

Reports each check as ✅/❌/⚠️ with fix instructions.
### Mainland China users

- The main domain api.tikhub.io requires a proxy from inside China
- Mirror: api.tikhub.dev (no proxy needed) — set TIKHUB_BASE_URL=https://api.tikhub.dev in .env
### Feishu / Lark publishing (OpenClaw users only)

This skill does not bundle Feishu API code. To enable auto-publishing of deconstruction cards to Feishu Docx, install the OpenClaw Lark official plugin:

```bash
npx -y @larksuite/openclaw-lark install
```

Details: OpenClaw Lark official plugin docs
Once installed:
- Agent in OpenClaw gains native "create cloud doc / read cloud doc / update cloud doc" tools
- SKILL.md Step 7 will direct the agent to use those tools for publishing
- Credentials are managed by the plugin; this skill needs zero Feishu config
Claude Code or other environments: deconstruction cards save to local docs/deconstructions/; copy to Feishu manually if needed.
## Related / Credits

- analyze-xhs skill: account-level analysis (not single-post)
- Architecture inspiration: Ronin · How To Build Own Content Engine (the Skill Graph idea: use .md + wikilinks as the agent's "memory / soul")
- Chinese version: ~/.agents/skills/content-engine/