Prompt Hardening

v1.0.0

Hardens agent prompts, system prompts, SOUL.md, AGENTS.md, and cron prompts so LLMs follow instructions reliably. Trigger phrases: agent disobeys, ignores rules, bypasses constraints, prompt optimization, instruction compliance, rule reinforcement, prompt hardening, LLM non-compliance, model violations, creative circumve...

0 stars · 83 downloads · 1 current · 1 all-time
by lanyasheng

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lanyasheng/prompt-hardening.

Prompt Preview: Install & Setup
Install the skill "Prompt Hardening" (lanyasheng/prompt-hardening) from ClawHub.
Skill page: https://clawhub.ai/lanyasheng/prompt-hardening
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install prompt-hardening

ClawHub CLI


npx clawhub@latest install prompt-hardening
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name and description (prompt hardening) match the provided artifacts: SKILL.md documents 16 hardening patterns, references, a simple audit script, and a smoke test. There are no env vars, binaries, or installs unrelated to auditing and rewriting prompts.
Instruction Scope
SKILL.md primarily instructs the operator/agent to read target prompt files and run scripts/audit.sh to produce a 16-point audit and suggested rewrites. This is within scope. Two caveats: (1) SKILL.md repeatedly says to 'identify model history violations' but doesn't define where or how to obtain model violation history (could imply reading logs or conversation history) — that is ambiguous and may require operator guidance to avoid overbroad data access; (2) SKILL.md explicitly states the skill is advisory and should not modify prompts automatically, which reduces risk if followed.
Install Mechanism
No install spec — instruction-only plus two small code files. Nothing downloaded from the network or installed on the host during skill activation.
Credentials
The skill requests no environment variables, credentials, or config paths. The actions described (reading prompt files and running a local audit script) are proportionate to the stated purpose.
Persistence & Privilege
The `always` flag is false, and there are no indications the skill modifies other skills or system-wide settings. The skill can be invoked autonomously by agents (the platform default), but it does not request elevated or persistent privileges.
Assessment
This skill appears to do what it says: static guidance and a small local audit script for hardening prompts. Before installing or running it:

  • Review the audit script locally: it contains several shell-logic bugs (quoting/expansion issues), so its results may be unreliable; run it in a safe sandbox or inspect and fix it first.
  • SKILL.md asks you to 'identify model history violations' but doesn't specify which logs or data to use; don't let the agent start reading unrelated logs or private data without explicit operator consent.
  • The skill is advisory and says it will not auto-modify prompts; insist on manual operator approval before applying any changes.
  • If you plan to use automated enforcement, pair prompt hardening with code-level/tool hooks (the skill itself recommends that) rather than relying solely on prompt edits.

If you want extra assurance, ask the author to clarify how 'model history' should be obtained and for a corrected audit.sh implementation.


latest: vk97bqae5cwwak1txjzhbb1ct1x84cbt0
83 downloads · 0 stars · 1 version
Updated 3w ago · v1.0.0 · MIT-0

Prompt Hardening

A systematic method for hardening agent prompts so that they follow instructions reliably.

Core principle: a prompt is not a policy document; it is an error-correction system. The most reliable constraint is not better wording but structural impossibility.

When to Use

  • The agent repeatedly violates the same rule
  • A new agent system prompt needs a quality audit before deployment
  • The agent "creatively" bypasses tool constraints, or its behavior drifts after long conversations
  • If unsure whether hardening is needed, run scripts/audit.sh first and decide based on the audit

When NOT to Use

  • Not for execution-style tasks such as code generation or code review (use code-review-enhanced or tdd-workflow)
  • Not for skill quality-improvement workflows (use improvement-orchestrator)
  • Don't attribute every prompt problem to "insufficient hardening": some require code-level enforcement (P13), not more wording
<example>
Correct usage: harden a repeatedly violated dispatch rule in SOUL.md
Input: "MUST dispatch via dispatch.sh" (a single sentence the model ignored 3 times)
Applied: P1+P2+P3+P5+P7+P16, six patterns stacked
Result: compliance rose from ~50% to ~90% (~99% when combined with the EXEC GUARD plugin, P13)
</example>
<anti-example>
Incorrect usage: using prompt-hardening as a substitute for code-level enforcement
Editing only SOUL.md without adding a plugin hook: the prompt layer is never 100% reliable
Critical MUST constraints need code-level enforcement (P13) as a backup
</anti-example>
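The anti-example above leans on code-level enforcement (P13) as a backup layer. A minimal sketch of what such a pre-exec guard hook could look like; the `dispatch.sh` requirement and the blocked-command list are taken from the example above, while the `guard` function name and hook shape are assumptions, not part of this skill:

```shell
# Hypothetical pre-exec guard (P13): returns 0 to allow a command, 1 to block it.
# An agent runtime would call this with the proposed shell command as "$1".
guard() {
  case "$1" in
    *dispatch.sh*)
      # Routed through the mandated dispatcher: allow.
      return 0 ;;
    "git push"*|"rm -rf"*)
      # Sensitive operations must go through dispatch.sh, never run directly.
      echo "BLOCKED: use dispatch.sh to run: $1" >&2
      return 1 ;;
    *)
      return 0 ;;
  esac
}
```

Unlike a prompt rule, this check cannot be rationalized away by the model, which is why the example pairs it with the prompt-level patterns rather than replacing them.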

Quick Reference

| Scenario | Recommended patterns | Strength |
| --- | --- | --- |
| Model repeatedly violates the same rule | P1 triple reinforcement + P13 code-level | Strongest |
| Model bypasses tool constraints | P2 tool enforcement + P3 exhaustive enumeration | |
| Model "rationalizes" violations | P5 counter-reasoning block | |
| Rules drift in long conversations | P9 drift protection + P11 Echo-Check | |
| First deployment of a new rule | P4 conditional trigger + P7 example pairs | |

16 Hardening Patterns

See references/patterns.md for detailed descriptions, examples, and sources.

| # | Pattern | One-line description | Source |
| --- | --- | --- | --- |
| P1 | Triple reinforcement | MUST/NEVER + good/bad example + I REPEAT | Claude Code, ChatGPT |
| P2 | Tool enforcement | Use X (NOT Y) + reason for failure | Claude Code, Warp |
| P3 | Exhaustive negation | ✅/❌ list of every allowed/forbidden behavior | Codex CLI |
| P4 | Conditional trigger | When X → MUST Y / NEVER Z | Gemini CLI |
| P5 | Counter-reasoning block | Anticipate and block the model's rationalizations | Claude.ai |
| P6 | Priority hierarchy | Explicitly declare which rule wins on conflict | Gemini, Jules |
| P7 | Behavioral anchoring | good/bad example + reasoning tags | Claude Code |
| P8 | Scope limiting | Do what is asked, nothing more | Claude Code, Warp |
| P9 | Drift protection | Inject reminders during long conversations | Claude.ai |
| P10 | Trust boundary | Distinguish overridable vs. non-overridable instruction sources | ChatGPT |
| P11 | Echo-Check | Restate constraints before executing | Reddit (40-60% ↑) |
| P12 | Constraints first | More constraint tokens than task-description tokens | inc-LLM (42.7%) |
| P13 | Structural impossibility | Code-level enforcement > prompt enforcement | Anthropic |
| P14 | State-machine gating | Boolean preconditions lock each phase | Factory DROID |
| P15 | Self-attribution correction | First-person "I just did this wrong" correction | CrewAI |
| P16 | First-and-last repetition | Key constraints at both the start and end of the prompt | Lost in the Middle |

Reliability Tiers

| Protection tier | Reliability | Combination | Combined reliability |
| --- | --- | --- | --- |
| Soft constraint | ~40% | P1 + P5 | ~90% |
| MUST/NEVER | ~70% | P1 + P5 + P13 | ~99% |
| MUST + examples | ~80% | P1 + P5 + P13 + retry | ~100% |

CLI

# Audit an existing prompt (16 checks)
~/.claude/skills/prompt-hardening/scripts/audit.sh ~/path/to/SOUL.md
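The shipped audit.sh is not reproduced here (and the security review above notes it has quoting bugs). As a rough sketch, a few of the 16 checks could be implemented as grep tests like the following; the check labels mirror the pattern table, but the `audit_prompt` function and its regexes are assumptions about how such a script might work:

```shell
# Hypothetical sketch of a prompt audit: a few of the 16 checks as grep tests.

check() {  # check <file> <label> <extended-regex>
  if grep -qE "$3" "$1"; then
    echo "PASS $2"
  else
    echo "FAIL $2"
  fi
}

audit_prompt() {
  check "$1" "P1 MUST/NEVER present"     'MUST|NEVER'
  check "$1" "P4 conditional trigger"    'When .*(MUST|NEVER)'
  check "$1" "P7 good/bad example pair"  '<example>|good example|bad example'
  # A real P16 check would compare the first and last sections, not just grep once.
  check "$1" "P16 head+tail repetition"  'MUST'
}
```

Running `audit_prompt ~/path/to/SOUL.md` prints one PASS/FAIL line per check, which is the shape of output the 16-point audit describes.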

Application Checklist

| # | Check |
| --- | --- |
| 1 | Do P0 rules use triple reinforcement (MUST + negative example + repetition)? |
| 2 | Do tool constraints use "Use X (NOT Y)" plus the reason for failure? |
| 3 | Are forbidden behaviors exhaustively listed? |
| 4 | Do key triggers use the "When X → MUST Y" format? |
| 5 | Is there a counter-reasoning block? |
| 6 | Is the priority hierarchy explicitly declared? |
| 7 | Are there good/bad example pairs? |
| 8 | Are scope boundaries explicit? |
| 9 | Is there drift protection for long conversations? |
| 10 | Are trust boundaries explicit? |
| 11 | Is there an echo-check before critical operations? |
| 12 | Do constraint tokens outnumber task-description tokens? |
| 13 | Do the most critical constraints have code-level enforcement (L5) as backup? |
| 14 | Do multi-step operations have state-machine gating? |
| 15 | Is there a self-attribution correction template for violations? |
| 16 | Do key constraints appear at both the start and end of the prompt? |

Usage

1. Read the target prompt
2. Identify rules the model has historically violated (highest hardening priority)
3. Run scripts/audit.sh to get the 16-point check results
4. Historically violated rules → P1 triple reinforcement + P13 code-level
5. Important rules → P2 + P4 + P5
6. Ordinary rules → P3 + P7
7. Verify that constraint tokens make up > 40% of the prompt
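The 40% threshold in the last step can be approximated with word counts. A minimal sketch, assuming that lines containing MUST or NEVER are the constraint lines; the `constraint_share` function is hypothetical, and word counts only approximate real model tokens:

```shell
# Rough approximation of the constraint-token share: compare the word count
# of MUST/NEVER lines against the whole prompt, as an integer percentage.
constraint_share() {
  total=$(wc -w < "$1")
  constraints=$(grep -E 'MUST|NEVER' "$1" | wc -w)
  echo $(( constraints * 100 / total ))
}
```

A prompt where `constraint_share` reports well under 40 is a candidate for trimming task description or strengthening constraints (P12).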

Output Artifacts

| Request | Deliverable |
| --- | --- |
| Harden a prompt | Rewritten prompt file |
| Audit a prompt | 16-point checklist + improvement suggestions |
| Analyze violations | Violation-pattern classification + hardening plan |

References

  • references/patterns.md — detailed descriptions and code examples for all 16 patterns
  • references/sources.md — 13 research sources

Operator Notes

  • Advisory/planning skill. Does not modify target prompts automatically.
  • When execution is needed, call out that the operator must apply changes manually or use improvement-executor.
