AI Friendly Architecture Design Skill

Prompts

Use when a system must handle AI uncertainty, when Agent types must be selected, when APIs will be consumed by AI, or when the architecture must support probabilistic outputs and dynamic planning.

Install

openclaw skills install ai-friendly-architecture-design

AI Friendly Architecture Design

Overview

AI Friendly architecture enables traditional systems to handle AI's inherent uncertainty through three paradigm shifts: deterministic→probabilistic, structured→semantic, and static→dynamic. This skill guides agents to apply these principles correctly and avoid common anti-patterns.

Core principle: Use appropriate architecture for the problem—don't over-engineer with AI when traditional solutions suffice.

When to Use

Use when:

Designing systems that incorporate LLM/AI capabilities
Evaluating whether to use AI Friendly architecture vs traditional architecture
Designing Agent-based systems (ReAct, Plan, Multi-Agent)
Creating APIs that will be consumed by AI Agents
Building context engineering pipelines for AI applications

Do NOT use when:

Building simple CRUD applications with no AI requirements
Creating AI Workflow applications that only call pre-built Agents as APIs
The system only needs deterministic, rule-based logic

The Three Paradigm Shifts

1. Deterministic → Probabilistic

Traditional: Output follows a fixed y = f(x) mapping, so each call either succeeds or fails.

AI Friendly: Output emerges from model + prompt + context + environment. Design goal: converge probabilistic output to an acceptable "safe interval" through RAG, prompt engineering, and evaluation mechanisms.

Design implication: Don't expect exact schema compliance from AI outputs. Build validation and fallback mechanisms.
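
A minimal sketch of the validation-and-fallback idea, written in Python. The schema (`label`, `confidence`), the function names, and the `call_model` callable are all assumptions made for illustration; any real model client would be wrapped behind `call_model`.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for a classification result: field name -> required type.
REQUIRED_FIELDS = {"label": str, "confidence": float}


def parse_model_output(raw: str) -> Optional[dict]:
    """Return the parsed object if it matches the schema, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data


def classify_with_fallback(call_model: Callable[[str], str], text: str,
                           max_attempts: int = 2) -> dict:
    """Retry on invalid output, then fall back to a safe default."""
    for _ in range(max_attempts):
        parsed = parse_model_output(call_model(text))
        if parsed is not None:
            return parsed
    # Fallback: a conservative answer the rest of the system can handle.
    return {"label": "unknown", "confidence": 0.0}
```

The caller always receives a well-formed result, so downstream code never has to handle malformed model output.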

2. Structured → Semantic

Traditional: Input must match predefined Schema exactly (JSON field types). System boundary is a rigid wall.

AI Friendly: System understands natural language and unstructured data. Responds based on intent, not format. System boundary becomes an elastic membrane.

Design implication: Design interfaces that accept flexible inputs and translate intent to actions.

3. Static → Dynamic

Traditional: Execution paths defined by hardcoded if-else logic or rules. Behavior is enumerable and verifiable.

AI Friendly: System makes decisions based on models, can reason about current state, decompose tasks, and respond to unknown changes without human intervention.

Design implication: Shift from "rules" to "planning"—grant systems autonomy for intelligent task orchestration.

Architecture Layers

┌─────────────────────────────────────────────────────────────┐
│                    Quality & Stability Layer                 │
│         (AI Observability, Evaluation, Security)            │
├─────────────────────────────────────────────────────────────┤
│                      Application Layer                       │
│    ┌──────────┐  ┌──────────┐  ┌──────────────────────┐    │
│    │  Agent   │  │  Intent  │  │      Session         │    │
│    │  Layer   │  │  Layer   │  │      Layer           │    │
│    └──────────┘  └──────────┘  └──────────────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                    Capability Layer                          │
│         (MCP, RAG, Function Calling)                        │
├─────────────────────────────────────────────────────────────┤
│                   Foundation Layer                           │
│    ┌──────────┐  ┌──────────┐  ┌──────────────────────┐    │
│    │  Model   │  │ Knowledge│  │   Tool Management    │    │
│    │Management│  │Management│  │                      │    │
│    └──────────┘  └──────────┘  └──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Foundation Layer

Model Management: Unified API (OpenAI protocol) for multiple LLM providers
Knowledge Management: Vector storage and retrieval for different knowledge sources
Tool Management: MCP protocol for tool integration, Computer Use skills

Model Management Details

Provider Selection:

Factor	Consideration
Latency requirements	Regional providers vs global (OpenAI, Anthropic)
Cost	Per-token pricing, batch discounts, small model for simple tasks
Data privacy	On-premise (Ollama, vLLM) vs cloud API
Capability	Task-specific: code (Codex), vision (GPT-4V), reasoning (Claude)

Failover Strategy:

Primary model → fallback model → rules-based fallback
Circuit breaker pattern: after N failures, skip model for cooldown period
Graceful degradation: reduce output quality rather than fail completely
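
The failover chain and circuit breaker could be sketched roughly as follows. `CircuitBreaker`, `answer`, and the default thresholds are hypothetical names and values, not part of any specific library; model clients are represented as plain callables.

```python
import time
from typing import Callable, List, Optional, Tuple


class CircuitBreaker:
    """Skips a model for a cooldown period after N consecutive failures."""

    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: reset and allow a new attempt.
            self.failures = 0
            self.opened_at = None
            return True
        return False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0


def answer(prompt: str,
           models: List[Tuple[Callable[[str], str], CircuitBreaker]],
           rules_fallback: Callable[[str], str]) -> str:
    """Try models in priority order, then fall back to rules."""
    for call, breaker in models:
        if not breaker.available():
            continue
        try:
            result = call(prompt)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return result
    return rules_fallback(prompt)
```

Passing the models as an ordered list keeps the "primary, then fallback, then rules" order explicit and easy to change.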

Cost Optimization:

Model tiering: small model for classification, large model for reasoning
Caching: cache deterministic or near-duplicate queries
Batching: group non-urgent requests for batch API pricing
Prompt optimization: shorter prompts = fewer tokens = lower cost
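
As an illustration of model tiering combined with caching, here is a small sketch. The model names `small-model` and `large-model`, the `CachedClient` class, and the whitespace/case normalization used to detect near-duplicate queries are assumptions chosen for the example.

```python
from typing import Callable, Dict, Tuple


def normalize(query: str) -> str:
    """Collapse case and whitespace so near-duplicate queries share a cache key."""
    return " ".join(query.lower().split())


def pick_model(task_type: str) -> str:
    """Tiering: small model for classification, large model for everything else."""
    return "small-model" if task_type == "classification" else "large-model"


class CachedClient:
    def __init__(self, call_model: Callable[[str, str], str]):
        self.call_model = call_model  # (model_name, query) -> response
        self.cache: Dict[Tuple[str, str], str] = {}
        self.calls = 0  # number of real model calls, for cost tracking

    def ask(self, task_type: str, query: str) -> str:
        key = (task_type, normalize(query))
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.call_model(pick_model(task_type), query)
        return self.cache[key]
```

Caching like this is only safe for queries whose answers do not depend on user-specific or time-sensitive context.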

Agent Layer

Reference: The ReAct pattern is from the paper ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022).

Three Agent types for different scenarios (this is one common taxonomy—other frameworks may use different classifications):

Agent Type	Capability	Use Case
BaseAgent	Fixed workflow, no dynamic planning	Simple chatbots, AI Workflows
ReActAgent	Thought→Action→Observation loop	Reasoning tasks with tool use
PlanAgent	Global planning + ReAct execution	Complex tasks requiring strategy

Note: "PlanAgent" is a common architectural pattern but not a standardized term—different frameworks may name it differently.

ReAct + Plan Combination:

Plan produces global strategy (use templates for quality)
ReAct executes domain-specific reasoning
Together they handle both strategic and tactical problems
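
The sketch below shows one way the Thought→Action→Observation loop and the Plan + ReAct combination might fit together. The `policy` and `planner` callables stand in for model calls, and the tuple protocol (`"act"` / `"finish"`) is a hypothetical convention invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# A policy returns ("act", tool_name, tool_input) or ("finish", answer, "").
StepPolicy = Callable[[str, List[str]], Tuple[str, str, str]]


def react(task: str, policy: StepPolicy, tools: Dict[str, Tool],
          max_steps: int = 5) -> str:
    """Run a bounded Thought -> Action -> Observation loop for one task."""
    observations: List[str] = []
    for _ in range(max_steps):
        kind, name, arg = policy(task, observations)  # Thought
        if kind == "finish":
            return name
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"  # Action
        observations.append(observation)  # Observation
    return "step limit reached"


def plan_and_execute(goal: str, planner: Callable[[str], List[str]],
                     policy: StepPolicy, tools: Dict[str, Tool]) -> List[str]:
    """Plan produces the global steps; ReAct executes each step."""
    return [react(step, policy, tools) for step in planner(goal)]
```

The `max_steps` bound is a deliberate choice: without it, a model that never emits `"finish"` would loop indefinitely.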

Intent Layer

Required only for multi-intent scenarios. Handles:

Parallel intents: Multiple independent intents in one query
Sequential intents: Intent B depends on Intent A result
Logical intents: Intents with logical relationships

Also performs query rewriting and expansion for intent optimization.
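
Parallel and sequential intents can be modeled as a dependency graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` to group intents into batches that may run in parallel; the intent names are made up for the example, and intent extraction itself (a model call in practice) is out of scope here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def schedule_intents(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group intents into batches; intents in one batch are independent.

    depends_on maps each intent to the intents whose results it needs.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Example: two independent lookups, then a refund that needs the order status.
intents = {
    "check_order_status": set(),
    "check_refund_policy": set(),
    "request_refund": {"check_order_status"},
}
print(schedule_intents(intents))
```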

Session Layer

Manages conversation state and user context. Feeds data into Context Engineering.

Responsibilities:

Session lifecycle: creation, timeout, cleanup
State persistence: save/load conversation state across sessions
User context binding: associate session with user profile, preferences
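
A minimal in-memory sketch of these responsibilities follows. `SessionStore` and its fields are hypothetical; a production system would persist state to a database rather than a dict. The clock is injectable so that timeouts can be tested without waiting.

```python
import time
import uuid
from typing import Callable, Dict, Optional


class SessionStore:
    def __init__(self, timeout_seconds: float = 1800.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self._sessions: Dict[str, dict] = {}

    def create(self, user_id: str) -> str:
        """Create a session bound to a user and return its id."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {
            "user_id": user_id,
            "messages": [],
            "last_seen": self.clock(),
        }
        return session_id

    def _expired(self, session: dict) -> bool:
        return self.clock() - session["last_seen"] > self.timeout_seconds

    def get(self, session_id: str) -> Optional[dict]:
        """Return the session and refresh its timeout, or None if missing/expired."""
        session = self._sessions.get(session_id)
        if session is None or self._expired(session):
            return None
        session["last_seen"] = self.clock()
        return session

    def cleanup(self) -> int:
        """Delete expired sessions and return how many were removed."""
        expired = [sid for sid, s in self._sessions.items() if self._expired(s)]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```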

What it does NOT do:

Memory management strategies → see Context Engineering
RAG retrieval → see Capability Layer
Model selection → see Foundation Layer

Session Layer provides the data; Context Engineering decides how to use it.

Multi-Agent Patterns

Three coordination patterns:

Pattern	Decision Point	Use Case
Centralized	Single coordinator agent	Clear task decomposition
Decentralized	Peer-to-peer negotiation	Collaborative problem-solving
Hybrid	Mixed coordination	Complex domains with sub-domains

MOE (Mixture of Experts) Pattern:

Each domain has specialized Agent (product, order, inventory, etc.)
Central Agent performs intent recognition and task distribution
Domain Agents execute with ReAct + Plan capabilities
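
The routing step of this pattern might look like the sketch below. The keyword-based `keyword_intent` function is only a stand-in for model-based intent recognition, and the domain names and agent callables are illustrative assumptions.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]


def keyword_intent(query: str) -> str:
    """Stand-in for model-based intent recognition."""
    q = query.lower()
    if "refund" in q or "order" in q:
        return "order"
    if "stock" in q:
        return "inventory"
    return "product"


def central_agent(query: str, recognize_intent: Callable[[str], str],
                  domain_agents: Dict[str, Agent],
                  default_domain: str = "product") -> str:
    """Recognize the intent, then hand the query to the matching domain Agent."""
    domain = recognize_intent(query)
    agent = domain_agents.get(domain, domain_agents[default_domain])
    return agent(query)
```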

AI Friendly API Design

Tool Atomicity

Split interfaces into atomic tools matching Agent reasoning patterns:

❌ getProductWithInventoryAndPricing(id)
✅ getProduct(id)
✅ getInventory(productId)
✅ getPricing(productId)
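
To make the split concrete, here is a small Python sketch of the three atomic tools behind a registry. The in-memory data and the `run_tool_calls` helper are hypothetical; the point is that the Agent requests only the facts it needs for the current reasoning step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory data standing in for real services.
PRODUCTS = {"123": {"name": "Desk lamp"}}
INVENTORY = {"123": 7}
PRICING = {"123": 19.99}


def get_product(product_id: str) -> dict:
    return PRODUCTS[product_id]


def get_inventory(product_id: str) -> int:
    return INVENTORY[product_id]


def get_pricing(product_id: str) -> float:
    return PRICING[product_id]


# Registry keyed by the tool names exposed to the Agent.
TOOLS: Dict[str, Callable] = {
    "getProduct": get_product,
    "getInventory": get_inventory,
    "getPricing": get_pricing,
}


def run_tool_calls(calls: List[Tuple[str, str]]) -> list:
    """Execute (tool_name, argument) pairs chosen by the Agent, in order."""
    return [TOOLS[name](arg) for name, arg in calls]
```

A question such as "is it in stock?" then needs a single `getInventory` call, with no pricing or product lookup involved.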

Parameter Design

Use human-readable names, a flat key-value structure, and core parameters only:

// ❌ Bad: Nested, complex
{"product": {"identifiers": {"sku_id": "123"}, "filters": {"status": "active"}}}

// ✅ Good: Flat, clear
{"sku_id": "123", "status": "active"}

Error Handling

Expected errors: Short descriptions for Agent reasoning
Unexpected errors: Stack traces for error diagnosis
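
One possible way to separate the two error classes in a tool wrapper is sketched below. `ExpectedToolError`, `call_tool`, and the result envelope (`ok`, `error`, `trace`) are assumptions made for this example, not an established convention.

```python
import traceback
from typing import Callable


class ExpectedToolError(Exception):
    """Raised by tools for anticipated failures (not found, invalid input)."""


def call_tool(tool: Callable, **params) -> dict:
    """Run a tool and wrap the outcome in a uniform envelope."""
    try:
        return {"ok": True, "result": tool(**params)}
    except ExpectedToolError as exc:
        # Short description the Agent can reason about and act on.
        return {"ok": False, "error": str(exc)}
    except Exception:
        # Unexpected failure: keep the stack trace for diagnosis.
        return {"ok": False, "error": "internal error",
                "trace": traceback.format_exc()}


def get_inventory(product_id: str) -> int:
    # Hypothetical tool with one anticipated failure mode.
    if product_id != "123":
        raise ExpectedToolError("product not found")
    return 7
```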

Context Engineering

Beyond Prompt Engineering

Context Engineering selects, organizes, and compresses information so that it fits the context window and serves the task. In production systems it often matters more than model selection.

Core Techniques

1. Historical Case Library (Illustrative: ~8% accuracy improvement)

*Based on production AI Review system results. See Real-World Impact section.

Store past successful cases with their decision reasoning. Retrieve similar cases via vector search to guide current decisions.

When to use: Tasks with recurring patterns (code review, customer support, troubleshooting)
Implementation: Embed past cases → vector store → similarity search → inject top-K into context
Key: Include both the case AND the reasoning, not just the result
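
The retrieval step can be sketched with plain cosine similarity over hand-written toy vectors. In practice the vectors would come from an embedding model and live in a vector store; the case fields (`summary`, `reasoning`, `vector`) and function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cases(query_vec: Sequence[float], cases: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k cases most similar to the query vector."""
    ranked = sorted(cases, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]


def build_context(task: str, query_vec: Sequence[float], cases: List[Dict], k: int = 2) -> str:
    """Inject each retrieved case together with its reasoning, then the task."""
    blocks = [f"Case: {c['summary']}\nReasoning: {c['reasoning']}"
              for c in top_k_cases(query_vec, cases, k)]
    return "\n\n".join(blocks) + f"\n\nCurrent task: {task}"


# Toy case library with two-dimensional vectors.
CASES = [
    {"summary": "null check missing", "reasoning": "input can be None", "vector": [1.0, 0.0]},
    {"summary": "slow query", "reasoning": "missing index", "vector": [0.0, 1.0]},
    {"summary": "unchecked optional", "reasoning": "same root cause as None input", "vector": [0.9, 0.1]},
]
```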

2. Hybrid Decision-Making (Illustrative: ~10%+ accuracy improvement)

*Based on production CogentAI system results. See Real-World Impact section.

Multiple models vote with confidence scores. Use when single-model accuracy is insufficient.

When to use: High-stakes decisions, compliance checks, medical/financial analysis
Implementation: Run 2-3 models in parallel → collect outputs → weighted voting based on domain confidence
Key: Assign domain-specific confidence weights, not uniform voting
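
A minimal sketch of weighted voting, assuming each model returns a label and a confidence score. The model names, the weight values, and the rule of multiplying weight by confidence are illustrative choices, not the source system's actual scheme.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(votes: List[Tuple[str, str, float]],
                  domain_weights: Dict[str, float]) -> str:
    """Pick the label with the highest sum of weight * confidence.

    votes: (model_name, label, confidence) triples.
    domain_weights: how much each model is trusted in this domain.
    """
    scores: Dict[str, float] = defaultdict(float)
    for model, label, confidence in votes:
        scores[label] += domain_weights.get(model, 1.0) * confidence
    return max(scores, key=scores.get)


votes = [
    ("model_a", "approve", 0.9),
    ("model_b", "reject", 0.6),
    ("model_c", "reject", 0.5),
]
# model_a is trusted most in this (hypothetical) domain.
weights = {"model_a": 1.0, "model_b": 0.5, "model_c": 0.5}
```

With these weights the result is `approve` (0.9 against 0.55); with uniform weights the two weaker models would outvote the domain specialist (1.1 against 0.9), which is the situation domain-specific weights are meant to avoid.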

3. Memory Management

Type	Scope	Implementation	Use Case
Long-term memory	Cross-session	Persistent store (DB/vector)	User preferences, history
Short-term memory	Current session	In-context window	Current task context
Working memory	Current step	Scratchpad pattern	Intermediate reasoning

Summarization: Periodically compress long-term memory to avoid context overflow
Relevance scoring: Only inject relevant memories, not everything
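
Relevance-based injection could be sketched as a greedy selection under a token budget. The memory fields (`text`, `score`, `tokens`) and the thresholds are hypothetical; the relevance scores are assumed to be computed elsewhere, for example by vector similarity.

```python
from typing import Dict, List


def select_memories(memories: List[Dict], min_score: float,
                    budget_tokens: int) -> List[str]:
    """Pick the most relevant memories that fit within the token budget."""
    chosen: List[str] = []
    used = 0
    for memory in sorted(memories, key=lambda m: m["score"], reverse=True):
        if memory["score"] < min_score:
            break  # everything after this is even less relevant
        if used + memory["tokens"] > budget_tokens:
            continue  # too large for the remaining budget, try smaller ones
        chosen.append(memory["text"])
        used += memory["tokens"]
    return chosen


MEMORIES = [
    {"text": "prefers email contact", "score": 0.9, "tokens": 5},
    {"text": "full purchase history", "score": 0.8, "tokens": 50},
    {"text": "likes blue products", "score": 0.7, "tokens": 5},
    {"text": "unrelated note", "score": 0.1, "tokens": 5},
]
```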

4. Advanced RAG

Technique	When to Use	Trade-off
Standard RAG	Simple Q&A over documents	Low complexity, moderate accuracy
GraphRAG	Entity relationships, knowledge graphs	Higher complexity, better associative retrieval
Dynamic context pruning	Long conversations, large knowledge bases	Reduces noise, may lose relevant context
Hybrid retrieval (dense + sparse)	Mixed structured/unstructured data	Best recall, more infrastructure
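
For hybrid retrieval, the dense and sparse result lists have to be merged somehow. The source does not name a merging method; the sketch below uses reciprocal rank fusion (RRF), a common choice, purely as an example. The document ids and the constant `k=60` are assumptions.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_results = ["d1", "d2", "d3"]   # e.g. from vector search
sparse_results = ["d3", "d1", "d4"]  # e.g. from keyword search
```

Documents that appear in both lists (`d1`, `d3`) rise above those found by only one retriever.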

Decision Guide

Is the task pattern-recurring?
├─ Yes → Build historical case library
└─ No → Is single-model accuracy sufficient?
    ├─ Yes → Standard RAG + memory management
    └─ No → Hybrid decision-making
        └─ Complex entity relationships?
            ├─ Yes → GraphRAG
            └─ No → Standard RAG + dynamic pruning
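
The tree above can be read as a small function. The function and parameter names are hypothetical; the returned technique names mirror the tree.

```python
from typing import List


def choose_context_techniques(pattern_recurring: bool,
                              single_model_sufficient: bool,
                              complex_entity_relationships: bool) -> List[str]:
    """Walk the decision guide and return the recommended techniques."""
    if pattern_recurring:
        return ["historical case library"]
    if single_model_sufficient:
        return ["standard RAG", "memory management"]
    techniques = ["hybrid decision-making"]
    if complex_entity_relationships:
        techniques.append("GraphRAG")
    else:
        techniques += ["standard RAG", "dynamic context pruning"]
    return techniques
```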

Quick Reference: Decision Framework

Decision Questions

#	Question	If Yes	If No
1	Is the task deterministic?	Traditional architecture (MVC/DDD)	→ Q2
2	Does it need language understanding?	→ Q3	Rules or traditional ML
3	Are there hard performance constraints (<100ms)?	Hybrid: AI offline + rules online, caching	→ Q4
4	Are there strict cost constraints?	Hybrid: AI + rules, small models, caching	→ Q5
5	Does it need dynamic tool selection?	→ Q6	Single LLM call or AI Workflow
6	Does it need multi-step reasoning?	PlanAgent + ReActAgent → Q7	ReActAgent → Q7
7	Multiple domains?	Multi-Agent (MOE pattern)	Single Agent
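
The questions can be encoded as a short function. One assumption to flag: the table does not state how Q7 combines with Q6, so this sketch treats Q7 as a follow-up that is asked after Q6 and reported alongside the Agent type. The `Requirements` dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    deterministic: bool                 # Q1
    needs_language_understanding: bool  # Q2
    hard_latency_limit: bool            # Q3 (<100ms)
    strict_cost_limit: bool             # Q4
    dynamic_tool_selection: bool        # Q5
    multi_step_reasoning: bool          # Q6
    multiple_domains: bool              # Q7


def recommend(r: Requirements) -> str:
    if r.deterministic:
        return "Traditional architecture (MVC/DDD)"
    if not r.needs_language_understanding:
        return "Rules or traditional ML"
    if r.hard_latency_limit:
        return "Hybrid: AI offline + rules online, caching"
    if r.strict_cost_limit:
        return "Hybrid: AI + rules, small models, caching"
    if not r.dynamic_tool_selection:
        return "Single LLM call or AI Workflow"
    # Assumption: Q6 picks the Agent type, Q7 picks the topology.
    agent = "PlanAgent + ReActAgent" if r.multi_step_reasoning else "ReActAgent"
    topology = "Multi-Agent (MOE pattern)" if r.multiple_domains else "Single Agent"
    return f"{agent}, {topology}"
```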

Decision Criteria

Q1: Is the task deterministic?

Yes: Input→output mapping is fixed, no natural language understanding needed
No: Task requires understanding intent, context, or unstructured data
Examples:
- Deterministic: Form validation, data transformation, CRUD operations
- Non-deterministic: Customer support, content generation, recommendation

Q2: Does it need language understanding?

Yes: Task involves natural language input/output, requires semantic understanding
No: Task can be solved with rules, traditional ML, or simple pattern matching
Examples:
- Needs understanding: Chatbots, content analysis, query interpretation
- Rules/ML: Fraud detection (feature-based models), image classification (CNN)

Q3: Are there hard performance constraints?

Yes: Response time <100ms, real-time requirements, SLA commitments
No: Batch processing, async operations, acceptable delays
Hybrid approach: AI for offline training/analysis, rules for online decisions

Q4: Are there strict cost constraints?

Yes: Limited budget, high volume, cost-sensitive application
No: Budget allows for AI infrastructure, low volume
Hybrid approach: AI for high-value decisions, rules for routine tasks

Q5: Does it need dynamic tool selection?

Yes: System must choose tools/APIs based on context at runtime
No: Fixed workflow, predetermined tool sequence
Examples:
- Dynamic: Research agent, adaptive troubleshooting
- Fixed: Data pipeline, report generation

Q6: Does it need multi-step reasoning?

Yes: Task requires planning, decomposition, backtracking
No: Single-step execution, direct action
Examples:
- Multi-step: Complex research, multi-domain problem solving
- Single-step: Simple Q&A, data lookup

Q7: Multiple domains?

Yes: System handles multiple specialized areas with different knowledge/tools
No: Single domain, focused expertise
Examples:
- Multi-domain: Enterprise support (product, billing, technical)
- Single-domain: Specialized medical diagnosis

Constraint Strategy Table:

Constraint	Strategy	Example	Priority
Latency <100ms	AI offline training + rules online, caching	Real-time recommendation	Hard
High volume (10M+/day)	Caching + small models + rules fallback	API gateway	Soft
Limited budget	Hybrid AI+rules, model tiering	Internal tools	Soft
Data privacy	On-premise models, data anonymization	Medical/financial	Hard
Performance + Cost	Small models + caching, reduce AI scope to high-ROI tasks	Budget real-time system	Hard > Soft
Performance + Privacy	On-premise small models + caching, accept lower accuracy	Hospital system	Hard + Hard
Cost + Privacy	On-premise small models, rules for low-value decisions	Internal medical tool	Soft < Hard

Rule: Hard constraints (latency, privacy) beat soft constraints (cost, accuracy). When constraints conflict, optimize for the hard constraint first.

Common Mistakes

Mistake	Why Wrong	Fix
Using Multi-Agent for simple FAQ	Over-engineering, adds latency/cost	Use single Agent with knowledge base
Complex nested API for Agent tools	Agents often mishandle deeply nested structures	Atomic tools with flat parameters
Skipping Intent Layer for multi-intent queries	Agent can't distinguish user goals	Add Intent Layer with query rewriting
Using ReAct for deterministic tasks	Unnecessary reasoning overhead	Use BaseAgent or workflow
Ignoring Context Engineering	Poor model performance	Build case libraries, hybrid decisions
Ignoring performance constraints	AI inference latency breaks SLA	Use hybrid architecture, see Constraint Strategy Table
Ignoring cost constraints	Unsustainable AI spend at scale	Model tiering, caching, rules fallback
"To use AI" as architecture goal	No business value	Define specific problems AI solves

Rationalization Table

Excuse	Reality
"Multi-Agent is always better"	Coordination overhead isn't justified for simple tasks
"ReAct can handle everything"	Deterministic tasks don't need dynamic reasoning
"AI Friendly API is too much work"	Atomic tools are easier to maintain and test
"Context Engineering is optional"	Memory and context are more important than model choice
"We need AI for everything"	Traditional architecture handles deterministic logic better
"Paradigm shifts are just theory"	They explain WHY the patterns work—skip them at your peril
"Context Engineering is just RAG"	Includes memory, hybrid decisions, case libraries beyond RAG
"Intent Layer is optional"	Required for multi-intent scenarios—Agent alone can't distinguish
"This only applies to LLM systems"	Principles apply to any AI system with uncertainty

Red Flags - STOP and Reconsider

Adding AI without clear problem definition
Using Agents where simple LLM calls suffice
Complex nested APIs for Agent consumption
Skipping evaluation and observability
Ignoring the "when NOT to use" guidelines

Real-World Impact

From production systems:

AI Review (illustrative results): 95.7% accuracy, 99.1% recall, 80%+ efficiency improvement
AI Q&A (CogentAI, illustrative results): 98%+ problem-solving accuracy, 80%+ efficiency improvement
Key success factors: Proper architecture selection, Context Engineering, continuous evaluation

Note: These metrics are illustrative examples from the source article, not independently verified measurements. Results will vary based on implementation quality, data, and domain.

References

Article Summary (English)
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
Skill Authoring Guide