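Dream Self-improving: Nightly Memory Distillation and Self-Evolution
🧠 v4.x integrates Long-Term RAG: a MetaGPT-style merge of short-term and long-term memory, in which aged entries are automatically promoted to the RAG layer
Current Status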
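Phase 1 (✅ implemented): the OpenClaw hook hippocampus listens to every message and writes it to memory/logs/ in real time
Phase 2 (✅ implemented): dream.py v4.x scheduled distillation + M-FLOW Bundle Search retrieval
Phase 3 (✅ implemented): dream.py v4.x Long-Term RAG layer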
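Core Upgrade: Long-Term RAG
Modeled on MetaGPT's RoleZeroLongTermMemory, v4.x adds a mechanism that merges short-term and long-term memory:
short-term-recall.json ← active recall entries (capped at 200)
memory/.rag/longterm.jsonl ← RAG store for aged entries
Promotion conditions:
- Entry age > 30 days (measured from the last recall time)
- and recallCount < 3 (not recalled frequently)
Recall flow:
- Before distillation, extract keywords from the day's high-weight entries
- Query the RAG store with those keywords to recall related old memories
- Inject the old memories into the distillation context so the AI knows what came before
Effect: each distillation builds on earlier memories instead of starting from scratch.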
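End-to-End Pipeline
User conversation
↓
OpenClaw Hook: message:preprocessed
↓
Thalamus filtering → Amygdala tagging → Hippocampus storage (memory/logs/)
↓
cron trigger (7 a.m. / 10 p.m.)
↓
dream.py v4.x
↓
[4.5] RAG query: extract keywords from the day's entries → query memory/.rag/longterm.jsonl → inject into the distillation context
↓
Bundle Search retrieval (replaces plain grep)
↓
Amygdala tag fusion → Auditor review → analysis-cortex pattern recognition → prefrontal distillation planning
↓
[4.6] RAG promotion: entries older than 30 days with fewer than 3 recalls → written to longterm.jsonl
↓
Archive → truth-file write-back → dream report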
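M-FLOW Core Architecture
Inverted Cone Knowledge Graph
All memories are organized into a four-layer directed graph that forms an inverted cone:
Cone tip (easy to hit precisely)
↓
┌─────────────────────────┐
│ L4 Entity │ ← entity nodes such as users, projects, and systems
│ L3 FacetPoint │ ← specific attributes, features, and tags
│ L2 Facet │ ← a group of related features
│ L1 Episode (cone base) │ ← the knowledge unit that is finally returned
└─────────────────────────┘
Cone base (returned to the user)
Search logic (Bundle Search):
- Cast a wide net at the cone tip: once the query is vectorized, all 4 layers are searched at the same time, and each collection returns at most 100 candidates
- Project into the graph: hits serve as entry points, and the surrounding subgraph (edges, neighbors, and connections) is extracted
- Cost propagation: costs propagate along edges from the cone tip toward the cone base; an Episode's score is the minimum cost over all paths that reach it
Three core design principles:
| Principle | Description | Effect |
|---|---|---|
| Edges carry semantics | Each edge has a natural-language description that takes part in retrieval | Edges act as active semantic filters rather than passive links |
| Minimum path cost | A single strong chain of evidence is enough to establish relevance | Scores are not diluted by irrelevant paths |
| Penalize direct Episode hits | A direct match on an Episode summary incurs a penalty | Favors precise anchor paths and prevents overly broad matches |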
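The cost-propagation step and the direct-hit penalty can be illustrated with a small multi-source shortest-path search. This is only a sketch under stated assumptions: the function name `bundle_scores`, the `DIRECT_HIT_PENALTY` value, the node ids, and the convention that lower cost means a closer match are all hypothetical and are not taken from bundle-search.py.

```python
import heapq
from collections import defaultdict

# Hypothetical penalty added when the query matches an Episode summary directly.
DIRECT_HIT_PENALTY = 0.3


def bundle_scores(hits, edges, episodes):
    """Return {episode_id: minimum path cost} reachable from the vector hits.

    hits:     {node_id: entry cost}, lower means a closer match to the query
    edges:    [(src, dst, cost)], directed from the cone tip toward Episodes
    episodes: set of node ids that are L1 Episodes
    """
    graph = defaultdict(list)
    for src, dst, cost in edges:
        graph[src].append((dst, cost))

    # Seed the frontier with every hit; direct Episode hits are penalized.
    best = {}
    heap = []
    for node, cost in hits.items():
        start = cost + (DIRECT_HIT_PENALTY if node in episodes else 0.0)
        if start < best.get(node, float("inf")):
            best[node] = start
            heapq.heappush(heap, (start, node))

    # Multi-source Dijkstra: each node keeps only its cheapest incoming path.
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, edge_cost in graph[node]:
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))

    return {n: c for n, c in best.items() if n in episodes}


hits = {"entity:user": 0.1, "fp:error.timeout": 0.2, "ep:2": 0.25}
edges = [
    ("entity:user", "fp:error.timeout", 0.5),
    ("fp:error.timeout", "facet:errors", 0.1),
    ("facet:errors", "ep:1", 0.1),
]
scores = bundle_scores(hits, edges, {"ep:1", "ep:2"})
```

In this toy graph, `ep:1` is reached through a precise FacetPoint anchor and ends up cheaper than `ep:2`, whose summary matched the query directly but carries the penalty.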
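Brain-Region Collaboration Architecture
① Thalamus: attention gating
Filters out pure greetings and simple confirmations so that only meaningful events are recorded
Tag types: event / decision / correction / completed / insight / error
② Amygdala: emotional tagging
correction/error/decision/completed/insight entries carry HIGH weight and are distilled first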
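As a rough illustration of the weighting rule, the sketch below maps a tag type to a weight and orders entries so that HIGH-weight ones are distilled first. The `NORMAL` label, the function names, and the entry shape (a dict with a `type` key) are assumptions made for this example; only the HIGH set comes from the text above.

```python
# Tag types that carry HIGH weight, as listed in this document.
HIGH_WEIGHT_TYPES = {"correction", "error", "decision", "completed", "insight"}


def amygdala_weight(entry_type):
    """Return "HIGH" for high-priority tag types, otherwise "NORMAL" (assumed label)."""
    return "HIGH" if entry_type in HIGH_WEIGHT_TYPES else "NORMAL"


def distillation_order(entries):
    """Put HIGH-weight entries first while keeping the original order within each group."""
    return sorted(entries, key=lambda e: amygdala_weight(e["type"]) != "HIGH")


entries = [
    {"type": "event", "text": "a"},
    {"type": "error", "text": "b"},
    {"type": "event", "text": "c"},
    {"type": "decision", "text": "d"},
]
ordered = distillation_order(entries)
```

Because Python's sort is stable, entries with the same weight stay in log order.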
③ Hippocampus: M-FLOW graph storage + RAG
Phase 1: append logs to memory/logs/ (Episode layer)
Phase 2: build the M-FLOW graph structure:
Episode (L1) ← daily log / topic file
↓ semantic edge
Facet (L2) ← grouping: correction_group, project_xxx
↓ semantic edge
FacetPoint (L3) ← specific tag: error.timeout, user.pref
↓ semantic edge
Entity (L4) ← user, project, tool, skill
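FacetPoint = a vector descriptor built from type + topic + keywords (vectorized so that it can take part in Bundle Search)
Semantic edge description = a natural-language explanation of why this FacetPoint belongs to this Episode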
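One possible in-memory shape for these two records is sketched below. The class and field names (`FacetPoint`, `SemanticEdge`, `descriptor`, `src`, `dst`) are illustrative assumptions; the actual JSON layout of graph/facetpoints.json and graph/edges.json is not shown in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FacetPoint:
    """L3 node: type + topic + keywords form the text that gets vectorized."""
    type: str
    topic: str
    keywords: List[str] = field(default_factory=list)

    def descriptor(self) -> str:
        # Text that would be handed to the vectorizer for Bundle Search.
        return " ".join([self.type, self.topic, *self.keywords])


@dataclass
class SemanticEdge:
    """Edge whose description explains why src belongs to dst."""
    src: str
    dst: str
    description: str


fp = FacetPoint("error", "timeout", ["cron", "retry"])
edge = SemanticEdge(
    src="fp:error.timeout",
    dst="ep:2025-01-05",
    description="The cron run on this day failed with a timeout and was retried.",
)
```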
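④ Prefrontal Cortex: Bundle Search + RAG recall + distillation planning
Bundle Search retrieval replaces plain grep:
query → vectorize → 4-layer cone search → cost propagation → minimum-path Episode
RAG recall (new in v4.x):
day's keywords → query longterm.jsonl → recall related old memories → inject into the distillation context
⑤ Locus Coeruleus: alertness and freshness signal
freshness score: recently mentioned memories receive higher weight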
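The document does not give a formula for the freshness score. One common way to make recently mentioned memories weigh more is exponential decay, sketched here purely as an assumption; `HALF_LIFE_DAYS` and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical half-life: the score halves every 14 days without a mention.
HALF_LIFE_DAYS = 14.0


def freshness(last_mentioned, now, half_life_days=HALF_LIFE_DAYS):
    """Return a score in (0, 1]; 1.0 means the memory was mentioned just now."""
    age_days = max(0.0, (now - last_mentioned).total_seconds() / 86400)
    return 0.5 ** (age_days / half_life_days)


now = datetime(2025, 1, 15)
today_score = freshness(now, now)
two_weeks_score = freshness(now - timedelta(days=14), now)
```

A memory mentioned in the future relative to `now` (for example, because of clock skew) is clamped to an age of zero rather than scoring above 1.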
Long-Term RAG Layer in Detail
Storage Layout
memory/
├── .dreams/
│   └── short-term-recall.json # active recall entries (capped at 200)
└── .rag/
    └── longterm.jsonl # RAG store for aged entries (JSONL format)
Promotion Mechanism
# Promotion condition
if age_days > 30 and recall_count < 3:
    promote_to_longterm_rag(entry)
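A minimal sketch of how the promotion pass might look end to end, assuming each short-term entry is a dict with an ISO-8601 `lastRecalledAt` timestamp and a `recallCount` counter. The `lastRecalledAt` field name and the `should_promote` / `promote` helpers are hypothetical; only the thresholds and `recallCount` come from this document.

```python
import json
from datetime import datetime
from pathlib import Path

MAX_AGE_DAYS = 30  # age threshold from the promotion condition
MIN_RECALLS = 3    # entries recalled fewer times than this are eligible


def should_promote(entry, now):
    """True when the entry is older than 30 days and was recalled fewer than 3 times."""
    last = datetime.fromisoformat(entry["lastRecalledAt"])  # hypothetical field name
    age_days = (now - last).total_seconds() / 86400
    return age_days > MAX_AGE_DAYS and entry.get("recallCount", 0) < MIN_RECALLS


def promote(entries, longterm_path, now):
    """Append eligible entries to the JSONL store and return the ones that stay short-term."""
    keep, promoted = [], []
    for entry in entries:
        (promoted if should_promote(entry, now) else keep).append(entry)
    if promoted:
        longterm_path.parent.mkdir(parents=True, exist_ok=True)
        with longterm_path.open("a", encoding="utf-8") as fh:
            for entry in promoted:
                fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return keep
```

Calling `promote(entries, Path("memory/.rag/longterm.jsonl"), datetime.now())` would return the entries that remain in short-term recall; appending rather than rewriting keeps the long-term file append-only.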
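Recall Mechanism
# Before distillation
keywords = [v['snippet'][:100] for v in tagged.values()][:20]
query = ' '.join(keywords[:5])
rag_results = query_longterm_rag(query, k=5)
# Inject the recalled results into the distillation context
rag_text = '\n'.join(f"- {r}" for r in rag_results)  # assumption: one bullet per recalled entry
learnings['LEARNINGS.md'] += f"\n\n## Long-Term Memory (RAG)\n{rag_text}"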
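For readers curious what a keyword-matching lookup over a JSONL file could look like, here is a bag-of-words sketch. It is not the body of the real `query_longterm_rag` in longterm_rag.py, which this document does not show; the `path` parameter, the `tokenize` helper, and the assumption that each stored line has a `snippet` field are all hypothetical.

```python
import json
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase word tokens for Latin text, single characters for CJK text."""
    return re.findall(r"[a-z0-9_]+|[\u4e00-\u9fff]", text.lower())


def query_longterm_rag(query, k=5, path=Path("memory/.rag/longterm.jsonl")):
    """Return up to k stored entries ranked by token overlap with the query."""
    if not path.exists():
        return []
    query_bag = Counter(tokenize(query))
    scored = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            entry = json.loads(line)
            doc_bag = Counter(tokenize(entry.get("snippet", "")))
            overlap = sum((query_bag & doc_bag).values())  # multiset intersection
            if overlap:
                scored.append((overlap, entry))
    # Sort on the score only, so entries (dicts) are never compared with each other.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]
```

Entries with no shared tokens are dropped, so an unrelated query returns an empty list rather than arbitrary old memories.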
Manual Commands
# Show short-term / long-term memory status
python skills/dream-selfimproving/scripts/longterm_rag.py --status
# Manually promote aged entries
python skills/dream-selfimproving/scripts/longterm_rag.py --promote
# Search long-term memory
python skills/dream-selfimproving/scripts/longterm_rag.py --query "keyword"
Pattern Library
Patterns are reusable response templates extracted from recurring learnings:
memory/patterns/
└── p-xxx.md # Pattern files with trigger + response
Pattern format (with M-FLOW metadata):
---
name: pattern name
trigger: when the pattern fires
response: how to respond
examples: [example 1, example 2]
created: YYYY-MM-DD
updated: YYYY-MM-DD
# M-FLOW metadata
entity: pattern # L4 Entity
facets: [tag1, tag2] # L3 FacetPoints
episode_id: p-xxx # L1 Episode
---
Memory Taxonomy & M-FLOW Mapping
| Memory Type | L4 Entity | L3 FacetPoints | L1 Episode |
|---|---|---|---|
| user | user.luyi | role, pref, goal, communication_style | topics/user_*.md |
| feedback | feedback | correction, error, insight, confirmation | topics/feedback_*.md |
| project | project.{name} | decision, tool, deadline, context | topics/project_*.md |
| reference | reference | credential, link, skill, system | topics/reference_*.md |
| longterm | (RAG) | aged, promoted | .rag/longterm.jsonl |
Directory Structure (v4.x)
memory/
├── graph/ # M-FLOW knowledge graph
│ ├── entities.json # L4 Entity node list
│ ├── facetpoints.json # L3 FacetPoint node list
│ ├── facets.json # L2 Facet node list
│ ├── episodes.json # L1 Episode node list
│ ├── edges.json # semantic edges (with description text)
│ └── index.json # graph index + vector anchors
├── logs/
│ └── YYYY/MM/YYYY-MM-DD.md # Daily append-only logs (Episode)
├── topics/ # Distilled topic memories
│ ├── user_xxx.md
│ ├── feedback_xxx.md
│ ├── project_xxx.md
│ └── reference_xxx.md
├── patterns/ # Pattern Library
│ └── p-xxx.md
├── episodes/ # Project narratives
├── .dreams/
│ └── short-term-recall.json # active recall entries (capped at 200)
├── .rag/
│ └── longterm.jsonl # Long-Term RAG (new in v4.x)
├── procedures.md # Workflow preferences
├── archive.md # Compressed old entries
├── dream-log.md # Dream cycle reports
└── MEMORY.md # INDEX only
.learnings/ # self-improving-agent
├── LEARNINGS.md
├── ERRORS.md
└── FEATURE_REQUESTS.md
Health Score (v4.x)
| Metric | Weight | Formula |
|---|---|---|
| Freshness | 0.20 | entries_referenced_last_30_days / total |
| Coverage | 0.20 | categories_updated_last_14_days / 10 |
| Coherence | 0.20 | entries_with_semantic_edges / total |
| Graph Connectivity | 0.20 | connected_components_ratio |
| Efficiency | 0.10 | max(0, 1 - line_count/500) |
| Reachability | 0.10 | Bundle Search path coverage |
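Assuming the overall score is the weighted sum of the six metrics (the table gives the weights but does not state how they are combined), the computation could be sketched as follows. The dictionary keys and function names are hypothetical.

```python
# Weights copied from the Health Score table; they sum to 1.0.
WEIGHTS = {
    "freshness": 0.20,
    "coverage": 0.20,
    "coherence": 0.20,
    "graph_connectivity": 0.20,
    "efficiency": 0.10,
    "reachability": 0.10,
}


def efficiency(line_count):
    """Efficiency formula from the table: max(0, 1 - line_count/500)."""
    return max(0.0, 1 - line_count / 500)


def health_score(metrics):
    """Weighted sum of the six metrics, each clamped to the range [0, 1]."""
    return sum(
        weight * min(1.0, max(0.0, metrics[name]))
        for name, weight in WEIGHTS.items()
    )


example = {
    "freshness": 0.8,
    "coverage": 0.5,
    "coherence": 0.9,
    "graph_connectivity": 0.7,
    "efficiency": efficiency(250),
    "reachability": 0.6,
}
score = health_score(example)
```

Clamping each metric guards against a ratio such as Coverage exceeding 1 when more than 10 categories were updated.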
Dream Distillation Steps (v4.x)
When cron triggers:
- Bundle Search warm-up: build a temporary graph from today's log to quickly verify graph connectivity
- Read memory/logs/{date}.md
- Read .learnings/LEARNINGS.md, .learnings/ERRORS.md, .learnings/FEATURE_REQUESTS.md
- Read MEMORY.md, topic files, graph/index.json, procedures.md for context
- Snapshot BEFORE: count entries, decisions, lessons, procedures
- [4.5] RAG recall: extract keywords from the day's entries → query longterm.jsonl → inject into the distillation context
- Graph-enhanced retrieval: run Bundle Search for each learnings entry to find related Episodes
- Distillation Agent: run a sub-agent on raw entries + learnings + RAG results → produce:
  - 3-5 genuine insights ("I learned that...")
  - 1-3 action items for tomorrow
  - 0-3 topic files to write to memory/topics/
  - Health metric interpretation
- [4.6] RAG promotion: entries older than 30 days with fewer than 3 recalls → written to longterm.jsonl
- Update the graph structure:
  - Write new Episodes to graph/episodes.json
  - Write new FacetPoints to graph/facetpoints.json
  - Write new edges to graph/edges.json (with semantic descriptions)
- Write topic files (from Distillation Agent output)
- Update truth files (user_state.md, pending.md)
- Update graph/index.json entry metadata and recompute the vector anchors
- Compute health metrics → update graph/index.json stats
- Archive eligible entries → append to archive.md
- Update MEMORY.md index (max 200 lines)
- Snapshot AFTER: calculate deltas
- Write dream report to memory/dreams/{date}.md and dream-log.md
- [Optional SwarmRecall]: if an API key is configured, run the cloud graph sync
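The BEFORE/AFTER snapshot steps amount to counting entries twice and subtracting. A small sketch, assuming entries are dicts with a `type` key and that the tracked type names are `decision`, `lesson`, and `procedure` (both assumptions; dream.py's real counters are not shown here):

```python
from collections import Counter

# Hypothetical type names for the counters mentioned in the snapshot step.
TRACKED = ("decision", "lesson", "procedure")


def snapshot(entries):
    """Count all entries plus each tracked type."""
    counts = Counter(e["type"] for e in entries if e["type"] in TRACKED)
    return {"entries": len(entries), **{t: counts[t] for t in TRACKED}}


def deltas(before, after):
    """Per-counter difference between two snapshots."""
    return {key: after[key] - before[key] for key in before}


def format_deltas(d):
    """Render deltas with explicit signs, e.g. for the dream report."""
    return ", ".join(f"{key} {value:+d}" for key, value in d.items())


before = snapshot([{"type": "decision"}, {"type": "event"}])
after = snapshot([
    {"type": "decision"},
    {"type": "event"},
    {"type": "lesson"},
    {"type": "decision"},
])
summary = format_deltas(deltas(before, after))
```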
Dream Report Format (v4.x)
# 🌙 Dream Report — {date}
## M-FLOW Graph Status
- Entities: N | FacetPoints: N | Episodes: N | Edges: N
- Graph Connectivity: {score}% | Avg Path Cost: {cost:.3f}
## RAG Status
- Short-term recall: N entries | Long-term: M entries
- Promoted this cycle: N entries
## Health Insights
- {insight based on graph connectivity / Bundle Search coverage}
## Insights ("I Learned")
- {genuine insight 1}
- {genuine insight 2}
## Tomorrow's Focus
- {actionable item 1}
- {actionable item 2}
## Topic Files Written
- {filename}: {title}
## Graph Updates
- New episodes: N
- New semantic edges: N
- Pruned nodes: N
## Analysis
- Recurring errors found: {list}
- Root causes identified: {analysis}
- Bundle Search paths evaluated: {count}
## Patterns Updated
- {pattern_name}: {change}
User Prompts
- "dream report" / "梦境报告" → read and display latest dream report
- "dream" / "做梦" → run distillation now
- "/dream status" → show M-FLOW graph stats, health score, pattern count
- "/dream search {query}" → run Bundle Search and show top results
- "/dream rag status" → show RAG status (from longterm_rag.py)
Scripts
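dream.py: Phase 2 distillation script (v4.x, M-FLOW Bundle Search + RAG recall/promotion)
update-cron-date.py: daily cron date injection
graph-builder.py: builds the M-FLOW graph structure from logs
bundle-search.py: Bundle Search retrieval implementation
longterm_rag.py: Long-Term RAG management script (new in v4.x)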
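Enabling Phase 1 (hippocampus hook)
Hook directory: ~/.openclaw/hooks/hippocampus/
Configured: hooks.internal.entries.hippocampus: enabled: true in openclaw.json
Function: listens for the message:preprocessed event and automatically records conversations to memory/logs/YYYY/MM/YYYY-MM-DD.md
Thalamus filter rules:
- Pure greetings / simple confirmations (< 20 characters) are not recorded
- High-weight tags: correction / error / decision / completed / insight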
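The filter rule and the daily log path can be sketched as below. The greeting and confirmation word lists, the function names, and the punctuation stripping are assumptions made for illustration; the hook's actual rules are not reproduced in this document.

```python
from datetime import date
from pathlib import Path

# Hypothetical word lists; the real hook may use different ones.
GREETINGS = ("hi", "hello", "hey", "你好", "早上好")
CONFIRMATIONS = ("ok", "okay", "yes", "thanks", "好的", "收到")


def should_record(message: str) -> bool:
    """Skip short messages that are only a greeting or a simple confirmation."""
    text = message.strip().lower()
    if not text:
        return False
    bare = text.rstrip("!.。!~ ")
    if len(text) < 20 and bare in GREETINGS + CONFIRMATIONS:
        return False
    return True


def log_path(day: date, root: Path = Path("memory/logs")) -> Path:
    """Build memory/logs/YYYY/MM/YYYY-MM-DD.md for the given day."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%Y-%m-%d}.md"


target = log_path(date(2025, 1, 5))
```

A short but meaningful message such as "fix the cron bug" is still recorded, because it does not match either word list.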
Takes effect after the gateway is restarted:
schtasks /run /tn "OpenClaw Gateway"
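M-FLOW vs. the Old Architecture
| Dimension | Old architecture (flat retrieval) | M-FLOW (inverted-cone graph routing) |
|---|---|---|
| Storage structure | Flat file list | Four-layer directed graph |
| Retrieval | grep / vector similarity | Bundle Search cost propagation |
| Relationship representation | Simple link references | Edges with semantic descriptions |
| Short-/long-term memory | No tiers | Entries older than 30 days are promoted to RAG |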
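Comparison with MetaGPT
| Dimension | MetaGPT RoleZeroLongTermMemory | Dream Long-Term RAG |
|---|---|---|
| RAG engine | Chroma + LLMRanker | JSONL + keyword matching |
| Recall trigger | memory_k overflow or user request | Before every distillation |
| Promotion condition | count > memory_k | age > 30 days and recallCount < 3 |
| Vectorization | Embedding model | Bag-of-words model (simplified) |
| Complexity | Depends on Chroma / llama-index | Pure Python, no external dependencies |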