Prompts

v1.0.0

Deep prompt engineering workflow—task spec, constraints, examples, evaluation sets, iteration protocol, regression testing, and safety alignment. Use when im...

0· 150· 1 versions· 0 current· 0 all-time· Updated 1mo ago· MIT-0

Prompt Engineering (Deep Workflow)

Prompts behave like natural-language programs: they need specs, tests, and version control—especially in production.

When to Offer This Workflow

Trigger conditions:

  • Prompt or system message change; quality regressions
  • Structured outputs (JSON), tool use, or RAG grounding requirements
  • Safety or policy alignment needs

Initial offer:

Use six stages: (1) define task & success, (2) constraints & format, (3) few-shot & style, (4) build eval set, (5) iterate with discipline, (6) ship, monitor, regress). Confirm model family and latency budget.


Stage 1: Define Task & Success

Goal: Clear user-visible outcome and failure modes (hallucination, omission, tone).

Exit condition: Success rubric in plain language; out-of-scope cases listed.


Stage 2: Constraints & Format

Goal: Must/must-not rules; output schema (JSON Schema, bullet structure); length limits.

Practices

  • Separate system (policy, role) from user (task instance)
  • Ask model to cite sources when grounding matters

Stage 3: Few-Shot & Style

Goal: Use examples only when they reduce ambiguity—avoid huge prompt bloat.

Practices

  • Diverse examples; avoid overlong negative examples that confuse

Stage 4: Build Eval Set

Goal: Frozen inputs with expected properties (not always exact text match).

Practices

  • Adversarial and multilingual slices if relevant
  • Regression suite in CI for critical prompts

Stage 5: Iterate With Discipline

Goal: Change one major variable at a time when debugging quality.

Practices

  • Compare with same temperature settings when A/B testing wording
  • Log prompt version id with outputs in production

Stage 6: Ship, Monitor, Regress

Goal: Canary prompt changes; watch implicit signals (thumbs, edits, task completion).


Final Review Checklist

  • Task and rubric defined
  • Constraints and output format explicit
  • Eval set versioned; regression path exists
  • Iteration log disciplined; prompt versions tracked
  • Production monitoring and rollback plan

Tips for Effective Guidance

  • Clarity beats cleverness—short explicit instructions often win.
  • Chain-of-thought: use when reasoning helps; hide chain from end users if needed.
  • Align with llm-evaluation skill for larger harness design.

Handling Deviations

  • Chat vs batch: batch can use stricter structure and lower temperature.
  • Multimodal: specify how image details may be used or ignored.

Version tags

latestvk979fjsdrazvdts99bxp3qgxd983pp32