Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Family Soul Analyzer

v0.1.0

Distill digital personas from family group-chat histories (WeChat / WhatsApp / others). Outputs soul.md (the collective persona) plus one persona file per member, ready to use as the personality foundation for an AI agent. Keywords: group-chat analysis, family persona, soul, persona, digital persona, chat history, WeChat export, persona distillation.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for zengury/family-soul-analyzer.

Prompt Preview: Install & Setup
Install the skill "Family Soul Analyzer" (zengury/family-soul-analyzer) from ClawHub.
Skill page: https://clawhub.ai/zengury/family-soul-analyzer
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install family-soul-analyzer

ClawHub CLI


npx clawhub@latest install family-soul-analyzer
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
high confidence
Purpose & Capability
Overall the code and SKILL.md align with the described purpose (WeChat/CSV parsing → denoise → LLM extraction → synthesis). However the repo includes multiple alternative LLM backends (Anthropic/Claude, Kimi/Moonshot/OpenClaw) beyond the single provider named in SKILL.md, which is plausible but expands the skill's network footprint. Also the package bundles an actual exported chat JSON in data/raw/, which is unexpected for a reusable skill and raises privacy concerns.
Instruction Scope
SKILL.md instructs the agent to run the included pipeline scripts and to confirm ANTHROPIC_API_KEY; the scripts will read user-provided chat files and then transmit chunked chat text to third‑party LLM APIs. This is consistent with its purpose but the instructions (and code) will upload sensitive family chat content to external services — an explicit privacy/data-exfiltration risk the user must accept. The skill's runtime also writes cached API responses to disk (raw_cache.jsonl), increasing persistence of derived sensitive data.
Install Mechanism
There is no install spec beyond a requirements.txt and SKILL.md instructions — the skill is instruction-plus-code. That is lower risk than arbitrary remote downloads, but the presence of runnable Python scripts (and a requirements file) means the agent will execute code from this bundle. No external install URL was used, which avoids direct supply-chain download risks, but users should still vet requirements and run in an isolated environment.
Credentials
Registry metadata lists no required env vars, but the SKILL.md and multiple scripts expect ANTHROPIC_API_KEY (Claude) and optionally KIMI_API_KEY / MOONSHOT_API_KEY. Worse: several scripts contain a hard-coded API key string (e.g. 'sk-kimi-Sgsy7YYJPrk...') and hard-coded base URLs. Hard-coded credentials in published code are a major red flag (they may be leaked/stale/unauthorized) and the mismatch between declared requirements and actual env requirements is an incoherence to surface to users.
Persistence & Privilege
The skill writes outputs and caches to disk (soul.md, persona_*.md, data/observations/raw_cache.jsonl, observations.jsonl). That behavior is expected for a pipeline but means sensitive inputs and raw LLM responses are stored locally in the skill directory by default. The skill does not request elevated system privileges or set always:true, but the combination of autonomous invocation (normal default) plus persistent caches increases the blast radius if misconfigured.
Scan Findings in Context
[hardcoded_api_key] unexpected: Multiple files include a hard-coded API key string (pipeline/03_extract_kimi_openclaw.py, pipeline/03_extract_kimi_v2.py, pipeline/03_extract_simple.py contain 'sk-kimi-...'). Even if these are placeholders, embedding secrets in code is unsafe and not necessary for the stated purpose.
[undeclared_env_vars] unexpected: Registry metadata lists no required environment variables, but SKILL.md and code require ANTHROPIC_API_KEY and optionally KIMI_API_KEY/MOONSHOT_API_KEY. This mismatch is an incoherence between declared requirements and actual runtime needs.
[sensitive_sample_data] unexpected: The repo bundles a real chat export under data/raw/ (群聊_修身,齐家,尝小烹@深圳.json). Packaging identifiable family chat transcripts with the skill is unexpected and introduces a privacy liability.
[third_party_api_calls] expected: The skill legitimately calls remote LLM APIs (Anthropic/Claude, Kimi/Moonshot/OpenAI-compatible endpoints) to perform extraction and synthesis; this is expected for its function but has privacy/cost implications.
What to consider before installing
Key things to consider before installing or running this skill:

  • Privacy: the skill uploads chat content to external LLM APIs. Only run it on data you own or have explicit consent to process. The package itself includes an example exported chat JSON (data/raw/...), which may contain real people's messages; remove or inspect it before use.
  • Credentials: SKILL.md and the scripts expect ANTHROPIC_API_KEY and optionally KIMI/MOONSHOT keys, but the registry entry lists none. Do not use any hard-coded API key found in the code. Replace or remove hard-coded keys and set your own keys as environment variables.
  • Hard-coded secrets: the repo contains embedded API-key-like strings. Treat them as compromised/unauthorized; remove them and audit where keys are used. Do not rely on those keys for production.
  • Data persistence: the pipeline caches raw API responses (raw_cache.jsonl) and writes outputs to the skill directory. If you run it, run in an isolated directory or container and clean caches after use if you do not want local persistence.
  • Run safely: review requirements.txt and the Python scripts before execution. If possible, run first on synthetic/dummy data to verify behavior and network calls. Consider running in a sandboxed environment (container) and monitor outbound network requests.
  • Consent and legality: extracting "personas" from family chat may implicate privacy laws or consent obligations; ensure you have permission from chat participants.

If you want, I can: (1) list the exact files and lines where hard-coded keys appear; (2) suggest minimal code edits to remove embedded keys and stop caching raw responses; or (3) provide a safe checklist to run the skill in a sandbox.
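To act on item (1) yourself, a minimal sketch of a key scanner (the `sk-` prefix pattern and the `.py`-only scope are assumptions; the real embedded strings should be verified by reading the files):

```python
import re
from pathlib import Path

# Matches common API-key shapes seen in the scan report (sk-ant-..., sk-kimi-...).
KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{10,}\b")

def scan_for_keys(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, match) for every key-like string under root."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for match in KEY_PATTERN.findall(line):
                hits.append((str(path), lineno, match))
    return hits
```

Running this over the installed skill directory before first use surfaces exactly which files would need the embedded keys stripped.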

Like a lobster shell, security has layers — review code before you run it.

latest: vk972g4fdjwc7c6kg7605j79xcd83g51c
114 downloads · 0 stars · 1 version
Updated 1mo ago
v0.1.0
MIT-0

SKILL: Soul Forge - Family Digital Persona Distillation

This skill turns a family group-chat history into persona files usable by an AI agent. It follows a digital-ethnography methodology: AI carries out the full "fieldwork" → "persona synthesis" workflow.


Trigger Conditions

This skill is triggered when:

  • The user says "analyze my chat history", "generate a soul file", or "distill our family persona"
  • The user provides a .json chat export file
  • The user says "run soul-forge" or "start persona distillation"
  • The user asks "how do I generate a persona from chat history"

Execution Flow

Step 1: Confirm inputs

Ask the user for:

  1. The chat-history file path (JSON exported from WeChat via WeFlow is supported)
  2. The output directory (default: ~/soul-forge-output/)
  3. The family-member role configuration (default: a three-person dad/mom/child structure)

Confirm that ANTHROPIC_API_KEY is set (the pipeline calls the Claude API).
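A minimal pre-flight check along these lines (the function name and the `sk-ant-` prefix check are illustrative, not part of the skill):

```python
import os
import sys

def check_api_key() -> str:
    """Fail fast with a helpful message if the Claude key is missing."""
    key = os.environ.get("ANTHROPIC_API_KEY", "")
    if not key.startswith("sk-ant-"):
        sys.exit("ANTHROPIC_API_KEY is not set (expected a key starting with "
                 "'sk-ant-'). Run: export ANTHROPIC_API_KEY='sk-ant-...'")
    return key
```

Checking before stage 1 avoids discovering a missing key only after the parse and denoise stages have already run.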

Step 2: Run the pipeline in the background

Invoke:

python3 {SKILL_DIR}/scripts/run_forge.py --file {user-provided file path}

Four stages, which the agent advances in order:

Stage  Script            Description                                                  Est. time
1      01_parse.py       Parse the raw chat JSON into normalized messages             30 s
2      02_denoise.py     Denoise and chunk messages by time                           1 min
3      03_extract.py     Claude Haiku batch-extracts behavior patterns (Batches API)  10-30 min
4      04_synthesize.py  Claude Opus synthesizes soul.md + the persona files          5-15 min

Note on stage 3: it uses the Batches API for asynchronous processing, which keeps costs low and caches progress automatically. If interrupted, it can be resumed with --resume without being billed twice.
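The resume-without-rebilling behavior implies a chunk-keyed append-only cache. A hypothetical sketch (the record shape and `chunk_id` field are assumptions; only the raw_cache.jsonl path comes from the scan report):

```python
import json
from pathlib import Path

CACHE = Path("data/observations/raw_cache.jsonl")

def load_done(cache: Path = CACHE) -> dict[str, dict]:
    """Map chunk_id -> cached API response, so --resume can skip finished chunks."""
    done = {}
    if cache.exists():
        for line in cache.read_text().splitlines():
            rec = json.loads(line)
            done[rec["chunk_id"]] = rec["response"]
    return done

def cache_response(chunk_id: str, response: dict, cache: Path = CACHE) -> None:
    """Append one result; appends survive interruption, so no chunk is billed twice."""
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("a") as f:
        f.write(json.dumps({"chunk_id": chunk_id, "response": response},
                           ensure_ascii=False) + "\n")
```

Note this is also the persistence surface flagged by the security scan: raw LLM responses over family chat text stay on disk until the cache is deleted.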

Step 3: Progress reporting

Parse the marker output of run_forge.py:

  • [STAGE:N:START] → tell the user "stage N in progress"
  • [STAGE:N:DONE] → tell the user "stage N complete"
  • [PROGRESS:N/M] → show a progress bar
  • [OUTPUT:path] → list the generated files
  • [ERROR:msg] → report the error and suggest how to handle it
  • [DONE] → announce completion and show all output files
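An agent consuming these markers could split each line into a kind and a payload. A sketch, assuming the marker grammar is exactly as listed above:

```python
import re

# One alternative per marker kind documented by run_forge.py's output contract.
MARKER = re.compile(
    r"^\[(STAGE:\d+:(?:START|DONE)|PROGRESS:\d+/\d+|OUTPUT:[^\]]+|ERROR:[^\]]+|DONE)\]$"
)

def parse_marker(line: str):
    """Split one output line into (kind, payload), or None for ordinary text."""
    m = MARKER.match(line.strip())
    if not m:
        return None
    kind, _, payload = m.group(1).partition(":")
    return kind, payload
```

For example, `[PROGRESS:2/4]` yields `("PROGRESS", "2/4")` and any non-marker log line yields `None`, so the agent can pass plain output through unchanged.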

Step 4: After completion

Output files:

soul-forge-output/
├── soul.md          ← collective persona; can be used directly as an AI agent's SOUL.md
├── persona_dad.md   ← dad's individual persona
├── persona_mom.md   ← mom's individual persona
└── persona_child.md ← the child's persona

Ask the user whether to:

  • Install soul.md as the current agent's SOUL.md
  • Create a separate agent for each persona

Advanced Usage

Update only the soul, without regenerating the personas

Tell the agent: "soul-forge: update only the soul, skip the personas"

Internally: python3 run_forge.py --file {path} --soul-only

Regenerate only the personas (soul already exists)

Tell the agent: "soul-forge: refresh only the personas"

Internally: python3 run_forge.py --file {path} --persona-only

Resume from an interruption

Tell the agent: "soul-forge: continue the previous run"

Internally: python3 run_forge.py --resume

Check current progress

Tell the agent: "soul-forge: status"

Internally: python3 run_forge.py --status


Supported Input Formats

Format              Source                       Notes
WeChat WeFlow JSON  exported by the WeFlow tool  fully supported
Standard CSV        custom export                must contain sender/timestamp/content columns

To export from WeChat: use WeFlow (Mac) → select the group chat → export as JSON.
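For the CSV path, a minimal reader illustrating the required columns (the `Message` dataclass is illustrative; the pipeline's actual internal message format is not documented here):

```python
import csv
from dataclasses import dataclass
from io import StringIO

@dataclass
class Message:
    sender: str
    timestamp: str
    content: str

def parse_csv(text: str) -> list[Message]:
    """Read a custom CSV export; only sender/timestamp/content columns are required."""
    reader = csv.DictReader(StringIO(text))
    return [Message(row["sender"], row["timestamp"], row["content"]) for row in reader]
```

Extra columns in the export are simply ignored, so any CSV containing the three required headers should pass this stage.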


Cost Estimate

For a family group chat spanning 2-3 years (~500 conversation chunks):

  • Stage 3 (Haiku Batches): roughly $0.5-1.0
  • Stage 4 (Opus): roughly $2-5
  • Total: roughly $3-6, one-time

FAQ

Q: Stage 3 is slow. Is that normal? A: The Batches API typically takes 10-30 minutes; this is expected. The agent keeps polling the status, so no manual intervention is needed.

Q: What if the run is interrupted? A: Say "soul-forge: continue"; the script resumes from the checkpoint, and completed stages are not re-run.

Q: Where do I set the API key? A: export ANTHROPIC_API_KEY='sk-ant-...', or configure it in OpenClaw's environment-variable settings.

Q: How many chat participants are supported? A: Three by default (dad/mom/child); edit the role configuration in pipeline/config.py to change this.
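The actual contents of pipeline/config.py are not shown here; a plausible sketch of what a role mapping might look like (the names, structure, and helper function are all assumptions):

```python
# Hypothetical shape of the role configuration in pipeline/config.py.
# Map each chat display name to a role slug; each slug becomes a persona_<role>.md file.
ROLES = {
    "老爸": "dad",
    "老妈": "mom",
    "小明": "child",
}

def persona_filename(display_name: str) -> str:
    """Resolve a chat display name to its output persona file, with a generic fallback."""
    role = ROLES.get(display_name, "member")
    return f"persona_{role}.md"
```

Adding a fourth entry (e.g. a grandparent) would then produce a fourth persona file, matching the FAQ's claim that the three-person default is configurable.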


Methodology Background

Based on digital ethnography:

  • Stages 1-2: organizing the field record (denoising, structuring)
  • Stage 3: systematic observation (Haiku extracts behavior patterns along five dimensions)
  • Stage 4: ethnographic analysis (Opus synthesizes a "thick description")

Clifford Geertz: "Thin description records behavior; thick description interprets meaning."

soul.md is the product of thick description: not an inventory of behaviors, but the interpretive framework needed to understand this family.
