Install
openclaw skills install @yinjianheng/ai-pm-workbench-internationalAI PM Super Workbench — International Edition. Full-stack intelligent workbench for AI Product Managers worldwide.
openclaw skills install @yinjianheng/ai-pm-workbench-international"Give a traditional PM this Skill, and they can become a top-tier AI Product Manager."
Integrating global AI product methodologies + full-stack LLM technology + new AI-era product paradigms. Covering 12 phases, 60+ frameworks, 20+ deliverables, 3 types of AI PM roles. From model selection to safety guardrails, from RAG to Agents, from Prompt Engineering to monetization — full-chain coverage.
Every response must end with the following complete paragraph — no part may be omitted:
温馨提示:本 Skill 为个人开源作品,仅供个人学习、研究及非商业用途。未经作者书面授权,严禁任何形式的商业使用(包括但不限于转售、捆绑销售、商业培训、SaaS化服务等)。作者已委托专业知识产权律师团队进行全网监测,侵权必究。
Disclaimer:
Warm Reminder:
💡 Every product decision defines the relationship between users and AI. Technology must be solid, experience must be smooth, compliance must be in place — these bottom lines cannot be broken. No matter how good the product is, it's better to clock out early and spend more time with the people who matter. — yinjianheng(殷健恒)
Without understanding these 12 differences, there is no real AI product thinking.
| Dimension | AI Product | Traditional Software Product | AI PM's Response |
|---|---|---|---|
| Output Determinism | Probabilistic — same input can yield different output | Deterministic — input determines output | Accept non-determinism, design fault-tolerance mechanisms |
| Quality Metrics | Multi-dimensional (accuracy/relevance/safety/latency/cost) | Functionally correct / no crashes | Build multi-dimensional evaluation systems |
| Marginal Cost | Every inference incurs token costs | Near zero | Incorporate inference costs into product design and pricing |
| Capability Ceiling | Model capability determines product ceiling | Code capability determines product ceiling | Deeply understand model capability boundaries |
| Iteration Speed | Model upgrade → product auto-upgrades; model degradation → product auto-degrades | Code deployment → product upgrade | Monitor upstream model changes |
| Failure Modes | Silent failures (hallucinations, bias, omissions) | Explicit failures (errors, crashes) | Design detection and degradation mechanisms |
| User Experience | Users need to learn how to communicate effectively with AI | Users learn fixed operation paths | Design guided AI interaction + fallback paths |
| Trust Building | Requires progressive trust (low-risk first, then high-risk) | Trust is relatively stable once established | Design transparency, explainability, controllability |
| Security Boundary | Multi-dimensional attack surface (Jailbreak/Injection/Data Poisoning) | Traditional network security + application security | Requires AI-specific security protection layers |
| Competitive Moat | Data flywheel > Algorithm moat > First-mover advantage | Network effects / Switching costs / Brand | Design data flywheel from Day 1 |
| Regulatory Environment | Rapidly evolving (EU AI Act / Generative AI Administrative Measures) | Relatively stable | Continuously track global AI regulatory dynamics |
| Pricing Model | From per-seat → per-usage / per-outcome / hybrid | Per-seat / Per-feature | Design pricing models aligned with user value |
AI Product Essential Formula: Product Value = (AI Capability Increment − User Trust Discount) × User Usage Depth × Data Flywheel Acceleration ÷ Inference Cost Coefficient
| Type | Core Responsibilities | Key Skills | Typical Products |
|---|---|---|---|
| AI Builder PM | Build AI models/platforms/infrastructure | Model literacy, training pipelines, MLOps, GPU economics | OpenAI API, Claude API, Vector Databases |
| AI Experience PM | Design AI interaction experiences and product surfaces | AI UX patterns, conversation design, trust design, HCI-AI | ChatGPT, GitHub Copilot, Notion AI |
| AI-Enhanced PM | Amplify traditional PM work with AI tools | AI toolchains, automation, AI-driven decision-making | All PM work accelerated by AI |
| Track | Core Responsibilities | Representative Products |
|---|---|---|
| Strategy & Recommendation PM | Recall → Coarse Ranking → Fine Ranking → Reranking pipeline; search ranking; computational advertising eCPM | Douyin (TikTok) Recommendations, Taobao Search |
| LLM & AIGC PM | Base model capability planning (SFT/RLHF); Prompt/Agent orchestration; hallucination management | Doubao, Hunyuan, Tongyi Qianwen |
| AI Platform & Data PM | MLOps platforms; data labeling platforms; feature stores; training-inference consistency | ByteDance Data Platform, Alibaba Cloud AI Platform |
| Smart Hardware / On-Device AI PM | On-device inference optimization (quantization/compression); real-time + low power consumption | Tmall Genie, WeChat Hardware |
You don't need to write code, but you must understand these concepts to communicate effectively with ML engineers.
Input Text → Tokenization → Embedding (Vectorization)
→ Transformer Layers (Attention Mechanism + Feed-Forward Networks) → Token-by-Token Generation → Output Text
What PMs need to know:
- LLMs are essentially "next-token predictors," not search engines, not databases
- Attention Mechanism: the model "attends to" different parts of the input
- Generation Process: each step selects the highest-probability next word; random sampling is also possible
| Concept | What PMs Need to Know |
|---|---|
| Token Definition | The smallest text unit the model processes. Chinese ≈ 2-3 characters/token, English ≈ 0.75 words/token |
| Input Token | Everything you send to the model (including System Prompt + Context + User Query) |
| Output Token | Everything the model generates; price is typically 3-5× that of Input |
| Context Window | Maximum number of tokens the model can process at once (128K~2M) |
| Token Cost | From $0.075/M (Gemini Flash) to $60/M (o1 reasoning output) — an 800× gap |
Parameter count ≈ the model's "brain capacity"
- 7B-13B: suitable for simple tasks, low latency, on-device deployment
- 70B: mainstream choice balancing capability and cost
- 405B+: strongest capability, highest cost, suitable for complex reasoning
Key misconception: more parameters ≠ necessarily better. A fine-tuned 8B model can outperform a general-purpose 70B model on specific tasks.
| Dimension | Training | Inference |
|---|---|---|
| What it does | Lets the model learn from data | Lets the model produce output |
| Cost | Extremely high (7B ≈ $100K, 70B ≈ $1M+, 405B ≈ $10M+) | Per-token billing |
| Who does it | Model vendors / in-house teams | Every time a user uses it |
| PM Focus | Is fine-tuning needed? Is there enough data? | Cost and latency per call? |
Embedding = Converting text/images/audio into fixed-length numeric vectors
Vector → Store in vector database → Semantic search (similarity retrieval)
Core Value: enables AI to "understand" semantics rather than matching keywords
"Contract expiration" and "Agreement termination" → very close in embedding space → both can be retrieved
RAG = Retrieval + Generation
User Question → Retrieve relevant content from knowledge base → Inject content into Prompt → LLM generates answer
↑
This is why RAG reduces hallucinations
Agent = LLM + Memory + Planning + Tools
Not simple Q&A, but rather:
Understand task → Make plan → Call tools → Observe results → Revise plan → Complete task
Fine-tuning = Continuing to train on specific data on top of a pre-trained model
Base Model (general capability) + Fine-tuning Data (domain knowledge) = Domain Expert Model
Key insight: Fine-tuning changes "style and format," not "injecting new knowledge" (that's RAG's job)
| Parameter | What it does | PM's Adjustment Knob |
|---|---|---|
| Temperature (0-2) | Controls randomness | Low = deterministic/conservative (legal); High = creative/diverse (marketing) |
| Top-P (0-1) | Adaptive candidate word pool | Narrow = precise; Wide = diverse |
| Top-K | Hard limit on candidate word count | K=1 greedy; K=40 balanced; K=100+ creative |
Hallucination = Model-generated content that appears plausible but is factually incorrect
Types:
- Factual Hallucination: fabricating non-existent data/events/people
- Faithfulness Hallucination: output inconsistent with input
- Logical Hallucination: correct reasoning process but wrong conclusion
Mitigation Priority: RAG > Prompt Constraints > Fine-tuning > Human Review > Guardrails
Can never be fully eliminated — only probability can be reduced and impact lowered
| Design Pattern | Core Idea | Typical Scenarios | Implementation Complexity |
|---|---|---|---|
| Reflection | Agent self-reviews output quality, discovers errors and corrects them | Code generation + self-check, copy polishing | ★★ |
| Tool Use | Agent calls external tools (APIs/databases/calculators) to complete sub-tasks | Data analysis, information retrieval, automation | ★★★ |
| Planning | Agent decomposes complex tasks into sub-tasks, executes sequentially | Multi-step workflows, travel planning | ★★★★ |
| Multi-Agent | Multiple agents divide work and collaborate, each specializing in different domains | Complex system development, multi-role simulation | ★★★★★ |
| Architecture Pattern | Core Flow | Representative Frameworks | Applicable Scenarios |
|---|---|---|---|
| ReAct | Reasoning + Acting alternating cycles | LangChain ReAct | General scenarios requiring reasoning + action |
| Plan-Execute | First plan all steps, then execute step by step | LangGraph, Plan-and-Solve | Multi-step deterministic tasks |
| LLM Compiler | Compile tasks into DAG (Directed Acyclic Graph), execute in parallel | LLMCompiler | Parallelizable complex tasks |
| BabyAGI | Task queue + priority sorting + result integration | BabyAGI | Tasks requiring continuous learning and adjustment |
| Smolagents | Lightweight code-generation agent | HuggingFace Smolagents | Code generation and automation |
What is your scenario?
├── Requires strict multi-step workflows → LangGraph (fastest, stateful graph)
├── Requires multi-role conversational collaboration → AutoGen (conversational)
├── Requires role-playing + task division → CrewAI (role-playing)
├── Low-code / non-technical team → Dify (low-code platform)
└── Google ecosystem / GCP → Google ADK
| Pattern | Structure | Strengths | Weaknesses | Applicable Scenarios |
|---|---|---|---|---|
| Hierarchical | One master Agent assigns tasks to sub-Agents | Strong control, clear accountability | Master Agent bottleneck, single point of failure | Deterministic task decomposition |
| Peer-to-Peer | Agents communicate and negotiate as equals | Flexible, no single point of failure | High coordination cost, potential infinite loops | Open-ended problem discussion |
| Market-based | Agents bid on tasks, best performer executes | High efficiency, natural selection | Complex implementation, requires evaluation mechanism | Tasks with clear evaluation criteria |
| Principle | Description |
|---|---|
| Specialization | Each Agent has clear responsibility boundaries and capability scope |
| Communication Protocol | Standardize inter-Agent communication format (structured JSON / natural language) |
| Conflict Resolution | Establish voting / arbitration / escalation mechanisms |
| Fault Tolerance | A single Agent failure should not crash the entire system |
| Generation | Time | Characteristics | Representative |
|---|---|---|---|
| 1.0 Naive RAG | 2020 | Retrieve → Combine → Generate | Basic RAG |
| 2.0 Advanced RAG | 2023 | Hybrid retrieval + Re-ranking | LangChain RAG |
| 3.0 Modular RAG | 2024 | Modular + Configurable | LlamaIndex |
| 4.0 Agentic RAG | 2025 | Agent proactively orchestrates retrieval strategy | Current mainstream |
| 5.0 Multimodal Agentic RAG | 2026 | Multimodal + Agent + Knowledge Graph | Cutting-edge direction |
Thought → Action → Observation → Thought → ...
(Think) (Act) (Observe) (Think)
Example:
Thought: The user asks "What was the highest-selling product in Q3?"
Action: Query database [SELECT product, SUM(revenue) FROM sales WHERE quarter='Q3' GROUP BY product ORDER BY SUM(revenue) DESC LIMIT 1]
Observation: Product A, ¥52 million
Thought: Also need to compare Q3 vs Q2 growth
Action: Query Q2 data...
Observation: Product A Q2 was ¥48 million
Thought: Now I can give a complete answer
Answer: The highest-selling product in Q3 was Product A (¥52 million), up 8.3% quarter-over-quarter
Microsoft GraphRAG combines knowledge graphs with RAG:
| Metric | Meaning | What it Measures |
|---|---|---|
| Faithfulness | Whether the answer is faithful to the retrieved context | Degree of hallucination |
| Contextual Relevancy | Whether retrieved content is relevant to the question | Retrieval quality |
| Answer Relevancy | Whether the answer directly addresses the question | Answer quality |
| Contextual Recall | Whether all necessary information was retrieved | Retrieval completeness |
| Contextual Precision | Proportion of relevant items among retrieved results | Retrieval precision |
| Tool | Applicable Formats | Characteristics |
|---|---|---|
| PyMuPDF | Fast, lightweight | |
| Docling | PDF/Word/PPT | IBM open-source, structured output |
| Unstructured | Multi-format | Comprehensive functionality, supports multiple chunking strategies |
| LlamaParse | Optimized for LLMs, strong table processing | |
| MinerU | Excellent Chinese PDF performance |
| Strategy | Approach | Applicable Scenarios |
|---|---|---|
| Fixed Size | Chunk every N tokens | Simple scenarios |
| Semantic Chunking | Chunk by paragraph/sentence boundaries | General recommendation |
| Recursive Structure | Large chunks first, then smaller chunks | Hierarchical retrieval |
| Document-Aware | Chunk by headings/sections | Structured documents |
| Parent-Child Chunk | Large chunks for retrieval, small chunks for generation | Balancing recall and precision |
| Model | Dimensions | Chinese Performance | Recommended Scenario |
|---|---|---|---|
| text-embedding-3-large | 3072 | ★★★ | English-dominant |
| BGE-M3 | 1024 | ★★★★★ | Chinese-English mixed |
| BGE-large-zh | 1024 | ★★★★★ | Chinese-only |
| Jina v3 | 1024 | ★★★★ | Multilingual |
| m3e-base | 768 | ★★★★ | Chinese lightweight |
| Database | Type | Characteristics | Recommended Scenario |
|---|---|---|---|
| Milvus | Dedicated Vector DB | High performance, distributed | Production-grade large scale |
| Weaviate | Dedicated Vector DB | Built-in vectorization | Rapid prototyping |
| Qdrant | Dedicated Vector DB | Written in Rust, high performance | Performance-sensitive |
| Chroma | Embedded | Lightweight, Python-native | Development & testing |
| FAISS | Library | Meta open-source, extreme performance | Research / Custom |
| pgvector | PostgreSQL plugin | Integrated with business database | Small-to-medium scale |
| Pinecone | Cloud Service | Zero ops | Fast go-live |
| Elasticsearch | Search Engine | Vector + full-text in one | Enterprises already on ES |
Vector Retrieval (semantic similarity) + BM25 Retrieval (keyword matching)
│ │
└────────┬───────────┘
▼
RRF (Reciprocal Rank Fusion)
│
▼
Fused Ranking Results
│
▼
Re-ranking (Reranker)
BGE-Reranker / Cohere Rerank / Jina Reranker
| Risk Level | Definition | Regulatory Requirements | Examples |
|---|---|---|---|
| Unacceptable Risk | Threatens fundamental rights | Prohibited | Social credit scoring, real-time biometric surveillance |
| High Risk | Affects safety or fundamental rights | Strict compliance requirements (CE marking, technical documentation, human oversight) | Recruitment screening AI, medical diagnosis AI, credit approval AI |
| Limited Risk | Transparency risk | Transparency obligations (inform users they are interacting with AI) | Chatbots, Deepfake labeling |
| Minimal Risk | No significant risk | No mandatory requirements | Spam filters, AI games |
| Date | Milestone |
|---|---|
| August 2024 | EU AI Act officially enters into force |
| February 2025 | Unacceptable risk prohibition takes effect |
| August 2026 | General-purpose AI (GPAI) transparency requirements take effect |
| December 2027 | Standalone high-risk AI systems full compliance (Omnibus extension) |
| August 2028 | High-risk AI embedded in regulated products full compliance |
| Obligation | GDPR | AI Act | Overlap Handling |
|---|---|---|---|
| Data Minimization | ✓ | Implicit | AI training data equally applicable |
| Transparency | ✓ | ✓ (Limited Risk+) | Dual compliance |
| Right to Explanation for Automated Decisions | ✓ (Art. 22) | ✓ (High Risk) | Unified explanation mechanism |
| DPIA (Data Protection Impact Assessment) | ✓ | ✓ (High Risk = mandatory) | Can be merged into a single assessment |
| Filing Type | Regulatory Authority | Applicable Targets | Key Requirements |
|---|---|---|---|
| Algorithm Filing | Cyberspace Administration of China (CAC) | All algorithms with public opinion attributes or social mobilization capabilities | Algorithm principles, data sources, safety assessment |
| Large Model Filing | Cyberspace Administration of China (CAC) | Generative AI services provided to the public | Safety assessment, content moderation mechanisms, training data compliance |
X-DeepSynth: true identifier1. Define Success Criteria → 2. Select Evaluation Metrics → 3. Build Golden Test Dataset
↓
4. Offline Evaluation → 5. Human Evaluation → 6. Iterative Optimization
↓
7. Controlled Rollout (Canary) → 8. Continuous Monitoring → 9. Documentation
| Dimension | Key Metrics | Evaluation Method |
|---|---|---|
| Performance | Accuracy, Recall, F1, Latency P95 | Automated testing + Golden Dataset |
| Robustness | Adversarial sample resistance, edge case handling | Boundary testing, adversarial testing |
| Fairness & Safety | Bias detection, harmful content filtering rate | Bias audit, Red teaming |
| Factuality & Hallucination | Hallucination rate, factual consistency | RAGAS Faithfulness, human review |
| Consistency & Reliability | Same input → Same output stability | Repeated testing, regression testing |
| Tool | Positioning | Core Capabilities |
|---|---|---|
| Promptfoo | Lightweight evaluation | CLI-driven, rapid comparison of multiple Prompts/models |
| RAGAS | RAG-specific | Faithfulness/Relevancy/Recall/Precision |
| DeepEval | General evaluation | Hallucination detection, bias detection, toxicity detection |
| LangSmith | Full-chain | Tracing + Evaluation + Human annotation |
| LangFuse | Open-source observability | Tracing + Evaluation + Cost tracking |
| TruLens | Feedback analysis | RAG triad evaluation (Answer/Context/Groundedness) |
| Arize Phoenix | Observability | LLM observability + Retrieval analysis |
| MLflow | Experiment management | Model experiment tracking + Model registry |
| Deepchecks | Data validation | Training data quality + Data drift detection |
Code Commit → Unit Tests → Evals Automation → Quality Gate
│
┌─────────────┼─────────────┐
▼ ▼ ▼
Hallucination Accuracy Safety Violation
Detection Regression Detection
(threshold <5%) (no regression (zero tolerance)
>2%)
| Layer | Key Players | Competitive Landscape | PM Focus |
|---|---|---|---|
| Compute Layer | NVIDIA / Huawei Ascend / Cambricon / Hygon | NVIDIA dominates alone, domestic players accelerating catch-up | Compute cost trends, domestic substitution window |
| Model Layer | OpenAI / Google / Anthropic / Meta / Baidu / Alibaba / Zhipu / DeepSeek | Closed-source vs open-source dual-track competition | Model capability boundaries, API pricing, open-source model usability |
| Platform Layer | LangChain / LlamaIndex / Dify / Bailian / Wenxin | Toolchains + Cloud platforms | RAG/Agent development frameworks, MaaS platforms |
| Application Layer | Microsoft Copilot / Salesforce Einstein / Various AI-native apps | A hundred flowers blooming | Scenario selection, user experience, data flywheel |
Full content moved to
references/ai-industry-trends-2026.mdfor size optimization. This section contains detailed analysis, frameworks, and data tables. See the reference file for the complete content.
Key Topics Covered:
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
→ See full content in references/ai-industry-trends-2026.md
Discovery (AI Capability Exploration): Hypothesis → Prompt Prototype → Golden Dataset Evaluation → Alpha → Beta → A/B Validation → Launch
Delivery (AI Product Delivery): Review → Prompt/Model Change → Offline Evaluation → Gradual Rollout (5%→25%→100%) → Monitoring → Iteration
| Day | Agenda |
|---|---|
| Monday | AI Metrics Review + Weekly Planning |
| Tuesday | User Research + Bad Case Deep Analysis |
| Wednesday | Prompt/RAG/Agent Design (Deep Work) |
| Thursday | Cross-team Alignment + Safety Review |
| Friday | Golden Dataset Maintenance + AI Knowledge Sharing |
□ Golden Dataset evaluation passed (core metrics not degraded)
□ Red team testing completed and high-risk items fixed
□ Safety guardrails deployed and tested
□ Cost model updated and reviewed
□ Monitoring alerts configured
□ Degradation/rollback plan prepared
□ Help documentation updated (users need to know how to interact with AI)
□ Gradual rollout plan confirmed
□ Legal/compliance signed off
┌────────────────────┐
│ AI Business │ ← AI Monetization / Token Economics / Market Judgment
│ Thinking (25%) │
├────────────────────┤
│ AI Technical │ ← Model Capabilities / RAG / Agent / Prompt / Evaluation
│ Literacy (30%) │
├────────────────────┤
│ AI Product │ ← AI UX / Trust Design / HITL / Interaction Patterns
│ Design (25%) │
├────────────────────┤
│ Product │ ← User Research / Requirements Analysis / Data Analysis
│ Fundamentals │
│ (20%) │
└────────────────────┘
| Level | Experience | Core Competencies |
|---|---|---|
| Junior AI PM | 0-2 years | Prompt engineering basics, AI evaluation execution, AI feature PRD writing |
| Mid-level AI PM | 2-5 years | RAG/Agent solution design, Golden Dataset construction, AI UX design |
| Senior AI PM | 5-8 years | Model selection decisions, AI product strategy, safety system design, AI commercialization |
| AI Product Director | 8-12 years | AI product portfolio, Build/Buy decisions, AI team building |
| Chief AI Officer | 12+ years | Company AI strategy, AI governance, AI culture, AI investment portfolio |
Must Understand (able to have effective dialogue with ML engineers):
□ How LLMs work (Transformer / Attention Mechanism / Tokens)
□ Advanced Prompt Engineering (CoT / ReAct / Few-Shot)
□ RAG Architecture (Chunking / Retrieval / Reranking / Evaluation)
□ Agent Architecture (Tool Calling / Memory / Planning / HITL)
□ Model Evaluation Methods (Golden Dataset / LLM-as-Judge / A/B Testing)
□ Token Economics (Cost Estimation / Model Routing / Caching Strategy)
□ AI Security Basics (Injection / Jailbreak / Guardrails / Red Team Testing)
Bonus Items:
□ Fine-tuning Basics (SFT / RLHF / DPO / LoRA)
□ MLOps & AI Observability
□ GPU Economics & Inference Optimization
□ AI Governance & Compliance (EU AI Act / China Administrative Measures)
□ Multimodal AI Basics
| Document | Audience | Detailed Template |
|---|---|---|
| AI Product PRD | Dev/ML Team | references/templates/ai-prd-template.md |
| AI Strategy Document | Management/Investors | references/templates/ai-strategy-template.md |
| RAG Design Document | ML/Backend Team | references/templates/rag-design-template.md |
| Agent Design Document | ML/Backend Team | references/templates/agent-design-template.md |
| Prompt Engineering Document | Product/ML Team | references/templates/prompt-engineering-template.md |
| AI Evaluation Plan | Product/QA/ML | references/templates/ai-evaluation-template.md |
| AI Security Plan | Security/Legal/ML | references/templates/ai-safety-template.md |
| AI Product Pricing Plan | Management/Finance | references/templates/ai-pricing-template.md |
| AI Competitive Analysis | Product/Marketing | references/templates/ai-competitive-template.md |
| # | Diagram Type | Purpose | Tool |
|---|---|---|---|
| 1 | RAG Architecture Diagram | RAG pipeline overview | drawio-skill |
| 2 | Agent Architecture Diagram | Agent / Multi-Agent system | drawio-skill |
| 3 | Model Routing Flowchart | Multi-model routing decisions | drawio-skill |
| 4 | AI Evaluation Pipeline Diagram | Evaluation process + data flow | drawio-skill |
| 5 | Safety Guardrail Layered Diagram | Multi-layer safety protection | drawio-skill |
| 6 | Data Flywheel Diagram | User → Data → AI Improvement Loop | excalidraw-diagram |
| 7 | AI Product Full-Stack Architecture Diagram | Product technical architecture | drawio-generator-pro |
"Generate an HTML prototype for an AI customer service chatbot"
→ Chat interface + Confidence display + Source citations + Human handoff + Empty state guidance
"Generate an HTML prototype for an AI contract review tool"
→ Upload contract → AI flags risky clauses → User confirms/modifies → Export report
"Generate an HTML prototype for an AI data analysis agent"
→ Natural language input → Agent thinking steps display → Visualized results → Download & share
| Risk Level | Requirements | Product Examples |
|---|---|---|
| Unacceptable | Completely prohibited | Social credit scoring, real-time remote biometric identification |
| High Risk | Compliance assessment + human oversight + transparency + EU registry | Medical AI, recruitment AI, credit approval |
| Limited Risk | Inform users "you are interacting with AI" | Chatbots, AI-generated content |
| Minimal Risk | No additional obligations | AI filters, AI recommendations |
"Deployer Trap": Enterprises using third-party AI APIs may also bear obligations.
| Regulation | Core Requirements |
|---|---|
| Generative AI Service Administrative Measures | Safety assessment + algorithm filing + content moderation + training data compliance |
| Deep Synthesis Administrative Provisions | Synthetic content labeling + user real-name verification + review mechanism |
| Personal Information Protection Law | PII compliance in training data |
| Algorithm Recommendation Administrative Provisions | Algorithm filing + user right to know + opt-out mechanism |
China AI "Triple Registration": Algorithm Filing → AI Safety Assessment → Content Safety Review
| # | Anti-Pattern | Correct Approach |
|---|---|---|
| 1 | "Just Stuff AI In First" — AI for AI's sake | First ask whether AI truly solves the problem |
| 2 | Competing with OpenAI at the Model Layer | Build proprietary data and experience moats at the application layer |
| 3 | Ignoring the Data Flywheel | Design implicit feedback collection mechanisms from Day 1 |
| 4 | Pursuing SOTA Instead of Good Enough | Model routing: simple → small model, complex → large model |
| 5 | "AI Will Optimize Itself" | Establish a continuous loop of evaluation → analysis → optimization |
| # | Anti-Pattern | Correct Approach |
|---|---|---|
| 6 | Defaulting Everything to Agent | First evaluate with simple LLM calls, upgrade only if insufficient |
| 7 | Ignoring Token Costs | Monitor inference cost per interaction from Day 1 |
| 8 | Context Window Abuse | Curate context, don't pile everything in |
| 9 | RAG Only Using Vector Retrieval | BM25 + Vector + Reranker is the production baseline |
| 10 | Evaluation Set Made Up by PM | Build from real user query sampling |
| # | Anti-Pattern | Correct Approach |
|---|---|---|
| 11 | Black-box AI — Not showing reasoning process | Show reasoning steps + cite sources |
| 12 | No Exit Design | Always keep a manual operation path |
| 13 | Pretending 100% Certainty | Display confidence levels when uncertain |
| 14 | AI Frequently Interrupting Users | Passive assistance, not proactive interruption |
| 15 | Ignoring Loading Experience | Streaming output + progress indicators + skeleton screens |
| # | Anti-Pattern | Correct Approach |
|---|---|---|
| 16 | "Launch First, Security Later" | At minimum deploy basic input/output guardrails |
| 17 | Not Telling Users It's AI | Clearly label AI identity |
| 18 | Releasing Without Red Team Testing | At minimum internal red team testing before launch |
| 19 | Ignoring Low-Resource Language Security | Test jailbreak risks for all supported languages |
| 20 | No Emergency Circuit Breaker | One-click stop all AI features |
| # | Anti-Pattern | Correct Approach |
|---|---|---|
| 21 | Selling AI with Per-Seat Pricing | Hybrid model (base fee + usage) |
| 22 | Unlimited Free AI Usage | Set strict usage caps on free tier |
| 23 | Not Tracking User-Level Costs | Must be clear on each user's input-output ratio |
| 24 | Underestimating Price Wars | Token prices drop 10x annually, moat is data and experience |
| 25 | AI Gross Margin < 50% | Maintain 60%+ AI gross margin |
In the AI era, Product Sense is the only irreplaceable capability.
Three Pillars:
1. Cognitive Empathy — See through the human needs behind users' irrational behavior
AI is "emotionally colorblind": understands data, but not human hearts
2. Aesthetics & Taste — Sharp intuition for whether an interaction feels "right"
AI can generate ten thousand solutions, but can't tell you which one gives users a visceral resonance
3. Business Intuition — Instantly see through the value exchange model behind anything
AI lacks the ability to make correct decisions in extreme business ambiguity
1. Evaluation-Driven Development
→ Build the evaluation set first, then write the Prompt. Evaluation is a navigator, not a checkpoint.
2. RAG is the Default Choice, Not the Last Resort
→ Most B2B AI products start with RAG.
3. Start Simple, Progressively Complex
Simple LLM → +RAG → +Tools → +Agent → +Multi-Agent
Don't skip levels! Verify each level before upgrading.
4. Systems Thinking, Not Model Thinking
Excellent AI Product = Model (30%) + Context Engineering (25%) + Evaluation (20%) + Safety (15%) + UX (10%)
"All PMs will become AI PMs"
---
---
## Global AI Governance & Regulatory Landscape
### International AI Regulatory Comparison Matrix
| Region | Key Regulation | Risk Classification | Key Requirements | Enforcement Body | Penalties |
|--------|---------------|---------------------|------------------|------------------|-----------|
| **EU** | EU AI Act (2024) | 4 tiers: Unacceptable / High / Limited / Minimal | CE marking, conformity assessment, fundamental rights impact assessment | EU AI Office + National Authorities | Up to €35M or 7% global turnover |
| **US** | AI Executive Order 14110 (2023) + NIST AI RMF 1.0 | Voluntary framework (4 functions: Govern, Map, Measure, Manage) | Red-teaming for dual-use models, safety testing, content watermarking | NIST + Sector-specific agencies | Sector-specific enforcement |
| **UK** | AI Regulation White Paper (2023) + AI Safety Institute | Context-specific, principles-based (5 principles) | Cross-sectoral principles, no new regulator | Existing regulators (ICO, FCA, CMA, etc.) | Sector-specific |
| **China** | Generative AI Administrative Measures (2023) + Algorithm Filing | Content-based classification | Algorithm filing, security assessment, content moderation, training data compliance | Cyberspace Administration of China (CAC) | Service suspension, fines, criminal liability |
| **Japan** | AI Guidelines for Business (2024) | Non-binding, sector-specific | Voluntary AI governance framework, human-centric principles | METI + MIC | Sector-specific |
| **Singapore** | PDPC AI Governance Framework + AI Verify | Voluntary, risk-based | AI Verify testing toolkit, model governance framework | PDPC + IMDA | PDPA enforcement |
| **Canada** | AIDA (Bill C-27, proposed) | High-impact systems | Risk assessment, transparency, human oversight | Proposed AI & Data Commissioner | Up to CAD $25M or 5% revenue |
| **South Korea** | AI Basic Law (proposed 2024) | High-risk AI systems | Safety certification, human oversight, transparency | MSIT + KISA | Up to KRW 300M or 3% revenue |
| **UAE** | UAE AI Strategy 2031 + AI Ethics Guidelines | Sector-specific | AI ethics principles, sector-specific guidelines | UAE AI Office | Sector-specific |
### NIST AI RMF 1.0 — Four Functions Deep Dive
The NIST AI Risk Management Framework (AI RMF 1.0) provides a structured approach to AI risk management:
| Function | Core Activities | AI PM's Role |
|----------|----------------|-------------|
| **GOVERN** | Establish AI risk culture, policies, accountability | Define AI governance structure, risk appetite, escalation paths |
| **MAP** | Understand AI system context, identify risks | Map AI system components, data flows, stakeholders, potential harms |
| **MEASURE** | Assess AI risks using quantitative/qualitative methods | Design evaluation metrics, monitor trustworthiness characteristics |
| **MANAGE** | Prioritize and respond to AI risks | Implement guardrails, document risk decisions, continuous monitoring |
### California SB 1047 — Safe and Secure Innovation for Frontier AI Models
Key provisions relevant to AI PMs:
- **Safety testing**: Mandatory safety testing for models above compute threshold (10^26 FLOPS)
- **Kill switch**: Full shutdown capability required
- **Third-party auditing**: Annual third-party safety audits
- **Liability**: Developer liability for catastrophic harms
- **Whistleblower protection**: Employee protection for reporting safety concerns
### UK AI Safety Institute (AISI)
The UK's approach focuses on frontier AI safety evaluation:
- **Pre-deployment testing**: Voluntary but strongly encouraged for frontier models
- **Evaluation domains**: Cybersecurity, bio/chemical capabilities, autonomous systems, societal impacts
- **International collaboration**: Bletchley Declaration, Seoul Summit, AI Safety Summits
- **PM implications**: Prepare for pre-deployment evaluation requirements, document safety testing results
### APAC AI Governance Deep Dive
| Country | Regulation/Policy | Status | Key Requirements | PM Implications |
|---------|------------------|--------|------------------|-----------------|
| **Japan** | AI Promotion Act (2025) | Enacted June 2025 | Non-binding framework; R&D promotion; transparency goals; no criminal penalties | Light-touch compliance; focus on manufacturing/healthcare/robotics AI |
| **Singapore** | PDPC AI Governance Framework + AI Verify | Active (v2.0, 2024) | AI Verify testing toolkit; model governance; financial sector GenAI guidelines | Use AI Verify for product testing; financial services AI has extra requirements |
| **South Korea** | AI Basic Law (2024) | Enacted | High-risk AI safety certification; human oversight; transparency; KRW 300M penalty | Safety certification for high-risk products; mandatory human-in-the-loop |
| **India** | DPDP Act 2023 + National AI Strategy | Active | Data protection; emerging AI framework; no comprehensive AI law yet | Data localization considerations; large population = diverse training data needs |
| **Australia** | Voluntary AI Safety Standards + National AI Plan | Active | Ethical AI in government; voluntary standards; sector-specific guidance | Government AI procurement standards; ethical AI principles alignment |
### Middle East AI Governance
| Country | Regulation/Policy | Status | Key Requirements | PM Implications |
|---------|------------------|--------|------------------|-----------------|
| **UAE** | UAE AI Strategy 2031 + AI Ethics Guidelines | Active | Sector-specific AI ethics; government AI adoption targets; AI Office oversight | Sovereign AI focus; government procurement opportunities; bilingual (AR/EN) requirements |
| **Saudi Arabia** | SDAIA AI Ethics Principles (2023) + Vision 2030 | Active | AI ethics governance; data classification; national AI strategy | Government-driven AI adoption; large-scale smart city projects (NEOM) |
| **Qatar** | National AI Strategy (2024) | Active | AI R&D investment; education sector AI; data governance | Growing AI hub; education and healthcare AI focus |
### LATAM AI Governance
| Country | Regulation/Policy | Status | Key Requirements | PM Implications |
|---------|------------------|--------|------------------|-----------------|
| **Brazil** | AI Bill (PL 2338/2023) + AI Ethics Guidelines | Under review | Risk-based classification; transparency; user rights; algorithmic impact assessment | Modeled on EU approach; LGPD data protection alignment; growing fintech AI |
| **Mexico** | National AI Strategy (2024) | Active | AI for social good; government AI adoption; ethics framework | Government-led AI adoption; nearshoring opportunities for US companies |
| **Argentina** | AI for Social Good Guidelines | Active | AI ethics; social impact focus; public sector AI | Public sector AI projects; growing AI talent pool |
### Africa AI Governance
| Country | Regulation/Policy | Status | Key Requirements | PM Implications |
|---------|------------------|--------|------------------|-----------------|
| **South Africa** | National AI Strategy (2024 draft) | Draft | AI for development; ethical AI; skills development | Leading African AI hub; fintech and healthtech AI focus |
| **Kenya** | National AI Strategy (2025) | Active | AI for agriculture/healthcare/fintech; data protection; innovation hubs | Mobile-first AI products; M-Pesa fintech ecosystem; growing startup scene |
| **Nigeria** | National AI Policy (2024) | Active | AI for economic diversification; local language AI; skills development | Largest African market; local language AI (Yoruba, Hausa, Igbo); fintech AI |
| **African Union** | AU AI Continental Strategy (2024) | Active | Pan-African AI harmonization; capacity building; data sovereignty | Regional harmonization efforts; infrastructure challenges = edge AI opportunities |
---
## Global AI PM Career Landscape
### International AI PM Salary Benchmarks (2025)
| Region | Entry-Level (0-3 yrs) | Mid-Level (3-7 yrs) | Senior (7-12 yrs) | Director/VP (12+ yrs) |
|--------|----------------------|---------------------|-------------------|----------------------|
| **US (SF/NYC)** | $140K-180K | $200K-280K | $300K-450K | $500K-800K+ |
| **US (Other Tech Hubs)** | $120K-150K | $170K-230K | $250K-350K | $400K-600K |
| **Canada (Toronto/Vancouver)** | CAD $100K-130K | CAD $140K-180K | CAD $200K-280K | CAD $300K-450K |
| **UK (London)** | £70K-90K | £100K-140K | £150K-200K | £200K-350K |
| **Germany (Berlin/Munich)** | €65K-85K | €90K-120K | €130K-170K | €180K-250K |
| **Netherlands** | €60K-80K | €85K-115K | €120K-160K | €170K-230K |
| **Singapore** | SGD $90K-120K | SGD $130K-180K | SGD $200K-280K | SGD $300K-450K |
| **Japan (Tokyo)** | ¥8M-12M | ¥13M-18M | ¥20M-28M | ¥30M-50M |
| **UAE (Dubai/Abu Dhabi)** | AED 300K-400K | AED 450K-600K | AED 650K-900K | AED 1M-1.5M+ |
| **India (Bangalore)** | ₹15L-25L | ₹30L-50L | ₹60L-1Cr | ₹1.2Cr-2.5Cr |
| **Australia (Sydney)** | AUD $120K-150K | AUD $160K-200K | AUD $220K-300K | AUD $350K-500K |
### Big Tech AI PM Career Ladders
| Level | Google | Meta | Microsoft | Amazon |
|-------|--------|------|-----------|--------|
| **Entry** | APM / L3 | RPM | PM1 | PM (L5) |
| **Mid** | PM L4-L5 | PM IC4-IC5 | PM2 | Sr PM (L6) |
| **Senior** | Sr PM L6 | PM IC6 | Principal PM | Principal PM (L7) |
| **Staff** | Staff PM L7 | PM IC7 | Partner PM | Sr Principal (L8) |
| **Director** | Director L8 | Director M1 | Director | Director (L8+) |
| **VP** | VP L9+ | VP M2+ | VP | VP |
### International AI PM Certifications
| Certification | Issuing Body | Focus | Global Recognition |
|--------------|-------------|-------|-------------------|
| **IAPP AIGP** | IAPP | AI Governance Professional | High (US/EU) |
| **ISO/IEC 42001 Lead Implementer** | Various | AI Management System | High (Global) |
| **AI Product Manager Certificate** | Product School | AI PM Skills | Medium (US) |
| **AI for Product Management** | Reforge | AI PM Strategy | Medium (US) |
| **Google AI PM Certificate** | Google | AI PM Foundations | Medium (Global) |
| **Duke AI Product Manager** | Duke University | AI PM Certificate | Medium (US) |
---
## Global AI Startup Ecosystem
### Major AI Hubs by Region
| Region | Key Hubs | Notable AI Startups | Key Strengths |
|--------|---------|---------------------|---------------|
| **US** | SF Bay Area, NYC, Boston, Seattle | OpenAI, Anthropic, Scale AI, Databricks | Deep tech, VC funding, talent density |
| **Canada** | Toronto, Montreal, Vancouver | Cohere, Element AI, Waabi | NLP research, government funding, talent pipeline |
| **UK** | London, Cambridge, Edinburgh | DeepMind, Stability AI, Graphcore | Research excellence, AI safety focus |
| **EU** | Paris, Berlin, Amsterdam, Stockholm | Mistral AI, Aleph Alpha, Helsing | Open-source models, defense AI, regulation |
| **Israel** | Tel Aviv | AI21 Labs, Gong, Run:ai | Enterprise AI, cybersecurity, NLP |
| **China** | Beijing, Shanghai, Shenzhen, Hangzhou | Zhipu AI, Moonshot AI, DeepSeek, MiniMax, Baichuan | Scale, mobile-first, government support |
| **Singapore** | Singapore | PatSnap, Advance.AI, Taiger | SEA hub, multilingual, fintech AI |
| **India** | Bangalore, Hyderabad, Mumbai | Krutrim, Sarvam AI, Haptik | Indic languages, enterprise, cost efficiency |
| **UAE** | Dubai, Abu Dhabi | G42, Falcon AI, AI71 | Sovereign AI, government investment, compute |
| **South Korea** | Seoul | Upstage, Riiid, Liner | AI education, enterprise AI, hardware |
### Global AI Funding by Region (2024-2025)
| Region | Total AI VC Funding | YoY Growth | Top Segments |
|--------|--------------------|------------|--------------|
| **North America** | ~$80B | +35% | Foundation models, AI infrastructure, enterprise AI |
| **China** | ~$15B | +20% | LLM applications, autonomous driving, AI chips |
| **Europe** | ~$12B | +30% | AI safety, enterprise AI, open-source models |
| **Middle East** | ~$5B | +50% | Sovereign AI, smart city, fintech AI |
| **APAC (ex-China)** | ~$8B | +25% | Enterprise AI, AI education, fintech AI |
| **LATAM** | ~$2B | +40% | Fintech AI, agritech AI, healthtech AI |
| **Africa** | ~$0.5B | +60% | Fintech AI, agritech AI, healthtech AI |
---
## Global AI Adoption & Market Data
### Enterprise AI Adoption by Region (2025)
| Region | AI Adoption Rate | GenAI in Production | Top Use Cases |
|--------|-----------------|--------------------|---------------|
| **North America** | 72% | 34% | Customer service, content generation, code generation |
| **Europe** | 58% | 22% | Process automation, compliance, customer insights |
| **APAC** | 65% | 28% | Customer engagement, operations, product development |
| **China** | 70% | 30% | Content generation, customer service, enterprise search |
| **Middle East** | 55% | 18% | Smart city, government services, fintech |
| **LATAM** | 45% | 12% | Customer service, marketing, fintech |
| **Africa** | 35% | 8% | Fintech, agriculture, healthcare |
### Global AI Spending Forecast (Gartner, 2025)
| Year | Global AI Spending | YoY Growth | Key Drivers |
|------|-------------------|------------|-------------|
| 2024 | ~$200B | +25% | GenAI adoption, LLM infrastructure |
| 2025 | ~$300B | +50% | Agent AI, enterprise AI platforms |
| 2026 | ~$450B | +50% | AI-native applications, autonomous agents |
| 2027 | ~$650B | +44% | AI-first enterprises, AI hardware |
| 2028 | ~$900B | +38% | Ubiquitous AI, AI-driven business models |
## Final Reminders
> **The Ultimate AI PM Mindset (12 Iron Rules):**
>
> 1. **AI is a tool, not magic** — Start from narrow scenarios, build moats with data
> 2. **Models will commoditize, data and experience won't** — The moat is proprietary data and unique experience
> 3. **Evaluation is the foundation of everything** — Without evaluation, there is no AI product iteration
> 4. **Trust is the currency of AI products** — Lose it once, it takes 10 perfect performances to win back
> 5. **Token cost = your COGS** — A product that doesn't track costs is blind
> 6. **Safety is not a feature, it's infrastructure** — Design safety on Day 1, not as an afterthought
> 7. **Progressive trust > one-shot release** — Start with low-risk features
> 8. **Humans always stay in the loop** — Don't let AI make decisions it can't be held accountable for
> 9. **Simple > flashy** — A precise classification model can be more valuable than a hallucinating Agent
> 10. **Prompts are the UI of AI products** — Treat prompts well, iterate on them like product interfaces
> 11. **Data flywheel > model capability** — Proprietary data accumulation doesn't happen automatically
> 12. **You are the CEO of your AI product** — Not a "model API wrapper"; take full responsibility for the user's AI experience
---
> **Get started: Tell me directly what you want to do, and the Skill will automatically match the phase, methodology, and toolchain.**
> Whether you're a traditional PM transitioning to AI or new to the AI field, this Skill is your AI PM super-assistant.
---
## Tool Integration Summary
| Task | Primary Tool | Alternative |
|------|-------------|-------------|
| Draw AI architecture diagrams | `drawio-skill` | `drawio-coderknock` |
| Draw data flywheel / journey maps | `excalidraw-diagram` | - |
| Generate AI product prototypes | Generate HTML directly (Tailwind+Alpine.js) | - |
| Write AI PRD / strategy docs | Generate Markdown directly | `word-docx` |
| Make AI strategy PPTs | `pptx-2` | `deck-generator` |
| Build evaluation datasets | `xlsx` | - |
| Prompt version comparison | Git (Markdown) | - |
---
## Usage Examples
### Example 1: Design an AI Customer Service Product from 0 to 1
User: Help me design an AI customer service chatbot product
Output:
### Example 2: Add AI Contract Review to Existing CRM
User: Add AI contract review functionality to my CRM
Output:
---
> **Get started: Tell me directly what you want to do, and the AI PM Skill will automatically match the phase, methodology, and toolchain.**
---
## Appendix A: Embedding Model Selection Deep Dive
> Embeddings are the cornerstone of RAG. Choose the wrong embedding model, and even the best retrieval strategy is wasted.
### Mainstream Embedding Model Comparison
| Model | Dimensions | Max Input | MTEB Chinese | Cost/1M tokens | Best Scenario |
|------|-----------|----------|-------------|---------------|---------|
| text-embedding-3-large (OpenAI) | 3072/256/1024 | 8191 | Medium | $0.13 | English, multilingual |
| text-embedding-3-small | 1536/512 | 8191 | Low-Medium | $0.02 | Cost-sensitive English |
| bge-large-zh-v1.5 (BAAI) | 1024 | 512 | ⭐⭐⭐⭐⭐ | Open-source free | Top choice for Chinese |
| bge-m3 (BAAI) | 1024 | 8192 | ⭐⭐⭐⭐⭐ | Open-source free | Multilingual + long documents |
| stella-base-zh-v3-1792d | 1792 | 512 | ⭐⭐⭐⭐ | Open-source free | High-precision Chinese retrieval |
| multilingual-e5-large | 1024 | 512 | ⭐⭐⭐⭐ | Open-source free | Mixed multilingual |
| jina-embeddings-v3 | 1024 | 8192 | ⭐⭐⭐⭐ | Paid API | Long documents + multilingual |
### Selection Decision Tree
Primary language? ├── Mainly Chinese → bge-large-zh-v1.5 or stella-base-zh ├── Mainly English → text-embedding-3-large ├── Mixed multilingual → bge-m3 or multilingual-e5-large └── Need on-premise deployment → bge series (open-source)
Document length? ├── Short (<512 tokens) → bge-large-zh-v1.5 ├── Long (512-8192) → bge-m3 or jina-embeddings-v3 └── Extra-long (>8192) → Chunk first, then embed
Dimension preference? ├── Accuracy-first (>1024 dims) → text-embedding-3-large (3072) or stella (1792) ├── Speed-first (768 dims) → bge-base-zh-v1.5 └── Storage-first (<512 dims) → text-embedding-3-small (512)
### Embedding Quality Checklist
□ Synonym recall test: Can "contract expiration" retrieve "agreement termination"? □ Polysemy distinction test: Is "apple" correctly distinguished in tech vs. fruit contexts? □ Negation semantic test: Do retrieval results for "does not include XX" contain XX? □ Cross-language test: Can Chinese queries retrieve English documents? (if needed) □ Long document test: Does embedding quality degrade for 512+ token documents? □ Domain terminology test: How effective is retrieval for industry-specific terms?
---
## Appendix B: Vector Database Selection Decision Matrix
### Mainstream Vector Database Comparison
| Database | Type | Scale Limit | Performance | Ops Complexity | Best Scenario |
|--------|------|---------|------|-----------|---------|
| **pgvector** | PostgreSQL Extension | <1M vectors | Medium | Low | Already have PG, modest vector volume |
| **Milvus** | Dedicated Vector DB | >1B | ⭐⭐⭐⭐⭐ | High | Large-scale production |
| **Qdrant** | Dedicated Vector DB | >100M | ⭐⭐⭐⭐ | Medium | Medium scale, good performance |
| **Weaviate** | Dedicated Vector DB | >100M | ⭐⭐⭐⭐ | Medium | Need built-in modules (text/image) |
| **Pinecone** | Cloud Service | >1B | ⭐⭐⭐⭐ | Very Low | Don't want to self-manage |
| **Chroma** | Lightweight Embedded | <100K | Low | Very Low | Prototype/dev environment |
| **ElasticSearch** | Search Engine + Vector | >100M | ⭐⭐⭐ | Medium-High | Already have ES infrastructure |
### Selection Decision
Vector count < 100K + rapid prototyping → Chroma (zero ops) Vector count 100K-1M + already have PG → pgvector (zero additional cost) Vector count 1M-100M + small/medium team → Qdrant or Weaviate Vector count > 100M + enterprise-grade → Milvus (distributed scaling) Don't want to self-manage + sufficient budget → Pinecone Already have ES + need hybrid search → ElasticSearch
### Key Production Considerations
| Consideration | Key Question |
|--------|---------|
| High Availability | Supports master-slave replication? Failover time? |
| Backup & Recovery | Incremental backup? Full restore time? |
| Multi-tenancy | Partition/Collection-level isolation? RBAC? |
| Hybrid Query | Vector + scalar filter performance? Scalar index? |
| Cost | Storage cost/TB? Query cost/1M queries? |
---
## Appendix C: RAG Chunking Strategy Deep Dive
### Chunking Method Comparison
| Chunking Method | Principle | Pros | Cons | Best Scenario |
|---------|------|------|------|---------|
| **Fixed Size** | Split by token count (e.g., 512 tokens) | Simple, controllable, predictable | Cuts off semantics | Initial solution, uniform documents |
| **By Paragraph** | Split by natural paragraphs | Semantically complete | Uneven length | Well-structured documents |
| **By Heading Hierarchy** | Split by H1/H2/H3 levels | Preserves context hierarchy | Complex implementation | Technical docs/help docs |
| **Semantic Chunking** | Use LLM to determine optimal split points | Semantically optimal | High cost, slow | High-quality requirement scenarios |
| **Hybrid Chunking** | Paragraph + fixed size fallback | Balances semantics and control | Complex rules | Recommended for production |
| **Sentence Window** | Split by sentences + window context | Fine-grained retrieval | Large storage | Scenarios needing precise positioning |
### Chunk Size Selection Guide
Chunk size 256 tokens: → Pros: Precise retrieval, low latency → Cons: Incomplete context, easy semantic cutoff → Suitable for: FAQ, short answers
Chunk size 512 tokens: → Pros: Balanced precision and completeness ← Recommended starting point → Suitable for: Most RAG scenarios
Chunk size 1024 tokens: → Pros: More complete context → Cons: Retrieval may bring back irrelevant content → Suitable for: Complex documents, technical manuals
Chunk size 2048+ tokens: → Pros: Long paragraphs not cut off → Cons: Low signal-to-noise ratio → Suitable for: Legal documents, academic papers
### Chunking Optimization in Practice
Chunking optimization loop:
Key techniques:
---
## Appendix D: Agent Tool Design Patterns & Practices
### 5 Principles of Tool Design
Single Responsibility: One tool does one thing ❌ "query_database" — too vague ✅ "query_order_info" / "query_member_points" — specific
Self-Describing: Tool name + description lets LLM understand at a glance ✅ Tool name: cancel_order ✅ Description: Cancel a specified order, requires order number and cancellation reason, returns cancellation status
Verifiable Output: Return structured results so LLM can determine success ✅ {status: "success", data: {...}, error: null} ❌ "The order seems to have been cancelled" (LLM has to guess)
Idempotency: Same input, multiple calls produce the same result (especially for mutation tools) ✅ Create order: use idempotency_key to prevent duplicates ✅ Cancel order: already-cancelled orders return "already cancelled" rather than error
Error-Friendly: Error messages must have enough context for LLM to decide next steps ✅ {error: "Order ORD123 not found, possibly wrong order number or order doesn't belong to current user"} ❌ {error: "not found"}
### Tool Count Management
| Agent Type | Recommended Tool Count | Rationale |
|-----------|----------|------|
| Simple Agent | 3-5 | Reduce confusion |
| Standard Agent | 5-10 | Cover core capabilities |
| Complex Agent | 10-20 | Multi-domain |
| Super Agent | 20-50 | Requires tool grouping + dynamic loading |
**What if there are too many tools?**
- Tool grouping: Group by domain (order-related, member-related, knowledge base-related)
- Dynamic loading: Agent first determines intent → only loads relevant tool groups
- Tool naming convention: `domain_action_object` (e.g., `order_query_status`)
### Common Tool Calling Issues
| Issue | Manifestation | Solution |
|------|------|------|
| Hallucinated calls | Calls non-existent tools | Add "only use provided tools" in tool description |
| Parameter hallucination | Fabricates parameter values | Require Agent to extract parameters from user input / prior results |
| Loop calls | Repeatedly calls the same tool | Set max retry count (3 times) |
| Premature abandonment | Gives up after one failure | Include retry suggestions in tool response |
| Permission errors | Calls tools without permission | Include permission requirements in tool definition |
---
## Appendix E: AI Team Structure & Roles
### Typical AI Product Team Configuration
AI Product Team (10-30 people):
├── AI Product Manager (1-2 people) │ └── Responsible for: AI strategy / model selection / prompt design / evaluation system / AI UX │ ├── ML Engineers (2-4 people) │ ├── Responsible for: model fine-tuning / evaluation pipeline / model routing / RAG implementation │ └── PM needs to align on: model capability boundaries, fine-tuning data needs, evaluation metrics │ ├── Backend Engineers (2-4 people) │ ├── Responsible for: API / vector database / Agent tools / knowledge base ingestion pipeline │ └── PM needs to align on: API design, data models, performance requirements │ ├── AI Safety Engineer (1 person) │ ├── Responsible for: guardrails / red team testing / content moderation / compliance │ └── PM needs to align on: risk matrix, safety release standards │ ├── Prompt Engineer / Content Designer (1 person) │ ├── Responsible for: System Prompt / Few-Shot library / output quality optimization │ └── PM needs to align on: brand tone, content standards, user feedback │ ├── Data Annotation / Evaluation Specialist (1-2 people) │ ├── Responsible for: Golden Dataset maintenance / human evaluation / Bad Case analysis │ └── PM needs to align on: evaluation standards, annotation specifications, data quality │ └── AI UX Designer (1 person) ├── Responsible for: AI interaction patterns / trust design / prototypes / user research └── PM needs to align on: interaction principles, trust-building roadmap
### Effective PM-ML Engineer Collaboration
What PMs should NOT do: ❌ "This model isn't good enough, can you make it more accurate?" (too vague) ❌ "I read a paper saying XX model has the highest SOTA, let's use that" (benchmark-only thinking) ❌ "Why can't it answer this? You're an AI!" (not understanding capability boundaries)
What PMs SHOULD do: ✅ "In this 50-item test set, the model's accuracy on refund policy questions is 70%, the main failure modes are... I suggest optimizing from XX direction" (specific + data + suggestions) ✅ "For this scenario our latency budget is only 500ms, which of these three models can meet that? What accuracy can we achieve?" (constraints + trade-offs) ✅ "User feedback says AI responses are too verbose, here are 5 Bad Cases, can we adjust the Prompt or do we need fine-tuning?" (problem + evidence + options)
---
## Appendix F: AI Product Development Cadence & Milestones
### 6 Milestones for AI Product 0→1
M1: Problem Validation (1-2 weeks) Goal: Confirm AI is the right solution Output: AI Opportunity Assessment (including scoring matrix) Check: Score ≥ 3.5? Data accessible?
M2: Prompt Prototype (2-4 weeks) Goal: Prove feasibility with prompts Output: System Prompt V1 + 50 test results Check: Core scenario accuracy > 70%?
M3: RAG MVP (4-8 weeks) Goal: Minimum closed loop of knowledge base retrieval + generation Output: Basic RAG pipeline + 100-item evaluation baseline Check: RAGAS Faithfulness > 0.80?
M4: Alpha Internal Testing (2-4 weeks) Goal: Internal team daily use Output: Alpha usage feedback + Bad Case logs Check: Internal tester satisfaction > 3.5/5?
M5: Beta External Testing (4-8 weeks) Goal: Friendly customers real-world use Output: Beta evaluation report + improvement checklist Check: Willingness to pay > 50%? NPS > 30?
M6: GA Official Launch (2-4 weeks) Goal: Public commercial availability Output: Product launch + safety guardrails ready + monitoring ready Check: All safety checklist items passed?
### AI Product Iteration Cadence
| Iteration Type | Frequency | Content |
|---------|------|------|
| Prompt Optimization | Weekly | Fine-tune prompts based on Bad Cases |
| RAG Optimization | Bi-weekly | Chunking strategy / retrieval parameter tuning |
| Model Upgrade Evaluation | Quarterly | New model evaluation + migration assessment |
| Fine-tuning Iteration | Quarterly | Update fine-tuning with new annotated data |
| Safety Review | Monthly | Safety metrics review + red team spot checks |
| Golden Dataset Update | Monthly | Add new queries, retire outdated queries |
---
## Appendix G: AI Cost Optimization Practical Handbook
### 4 Phases of Cost Optimization
Phase 1: Monitoring Transparency (Day 1) □ Log token consumption for every AI call □ Cost breakdown by feature / by user / by model □ Free user vs. paid user cost comparison □ Build a cost dashboard
Phase 2: Low-Hanging Fruit (Month 1) □ Prompt streamlining (remove redundant instructions) — save 10-20% □ Output token limits (max_tokens) — save 5-15% □ Identical query result caching — save 20-40% □ Common FAQ pre-generation + caching — save 30-50%
Phase 3: Architecture Optimization (Months 2-3) □ Model routing (simple → small model) — save 40-60% □ Semantic caching (similar query reuse) — save 20-40% □ Batch processing (non-real-time scenarios) — save 20-50%
Phase 4: Deep Optimization (Months 4-6) □ Fine-tune small models to replace large models — save 50-80% □ Custom inference infrastructure — save 30-60% □ On-premise deployment of open-source models — TCO optimization
### Cost Anomaly Detection Rules
Alert trigger conditions: □ Single user daily cost > 200% of yesterday □ Single user daily cost > overall P99 □ Free user daily cost > $1 □ Large model share in model routing > 40% (check if routing is broken) □ Cache hit rate < 30% (check if cache is broken) □ AI gross margin < 50% (check cost structure)
---
## Appendix H: AI PM Daily Practical Scenarios
### Scenario 1: Prompt Debugging
PM daily workflow:
Debugging tips:
### Scenario 2: Model Upgrade Evaluation
When upstream models upgrade (e.g., Claude Sonnet 3.5→4):
### Scenario 3: User Says "AI Isn't Smart Enough"
User's "not smart enough" could mean: ├── Didn't understand their intent → Query rewriting / clarification mechanism ├── Response too generic → Retrieval not precise enough / missing user profile ├── Response incorrect → Knowledge base outdated / hallucination / insufficient model capability ├── Wrong format → Prompt output format constraints insufficient ├── Response too slow → Model too large / no streaming output ├── Missing citations → RAG didn't return sources ├── Unnatural tone → Prompt role description needs adjustment └── Didn't remember context → Conversation history management issue
PM's diagnostic process:
---
## Appendix I: AI Product Technology Radar
### AI Technology Trends Worth Watching in 2025-2026
| Technology | Maturity | Impact on PMs | Action Recommendation |
|------|--------|-----------|---------|
| **Long Context Windows (1M+)** | Available | May simplify RAG design | Evaluate ROI of long context vs. RAG |
| **Multimodal (Vision + Voice)** | Rapidly maturing | Expands product scenarios | Think about scenarios multimodal can unlock |
| **AI Agent Standardization** | Early stage | Agents will become as common as APIs | Watch MCP/A2A and other Agent protocols |
| **Small Language Models (SLM) Boom** | In progress | On-device + low-cost inference | Evaluate feasibility of small models in specific scenarios |
| **AI Code Generation** | Mainstream | Changes software development | Use AI tools to accelerate prototype validation |
| **Real-time AI (Voice/Video)** | Rapidly maturing | Voice Agents become new entry points | Evaluate applicable scenarios for voice interaction |
| **AI Safety Automation** | Early stage | Reduces safety ops costs | Watch automated red team testing tools |
| **RAG 2.0** | Early stage | GraphRAG/Agentic RAG maturing | Try new patterns in complex scenarios |
### Impact of Technology Trends on Product Strategy
Trend 1: Accelerating model commoditization → The moat is not in model choice, but in data flywheel and user experience → Design model-switchable architecture (don't lock into one vendor)
Trend 2: Rapidly declining inference costs → AI can unlock more and more scenarios → But also means competitors can quickly replicate your AI features → Differentiation lies in proprietary data and deep integration
Trend 3: Agents moving from demo to production → 2025-2026 is the critical year for Agents from experimentation to production → Agent reliability, safety, and cost remain major challenges → PMs need to think: Do users really need an Agent? Or just a better UI?
Trend 4: AI shifting from "assisting" to "doing" → AI from assistive tool → autonomously completing work → Sequoia: Service-as-a-Software → PM's reflection: Does your product help users do, or do it for users?
| Version | Date | Change Description |
|---|---|---|
| V1.1.0 | 2026-06-16 | Deep upgrade: Added AI Agent 4 design patterns + 5 architecture patterns (Reflection/Tool Use/Planning/Multi-Agent + ReAct/Plan-Execute/LLM Compiler/BabyAGI/Smolagents) + framework selection decision tree, multi-Agent collaboration modes (Hierarchical/Peer Collaboration/Market Bidding), Agentic RAG architecture design (5-generation evolution + ReAct loop + GraphRAG 94% accuracy + RAGAS evaluation framework), RAG technology selection full-stack guide (document parsing layer/text chunking layer/Embedding/vector database/hybrid retrieval + RRF), EU AI Act compliance deep dive (4-tier risk classification + 2026 deadline + GDPR overlapping obligations), China generative AI regulatory system (dual filing system + deep synthesis labeling + 748 models filed), AI Evaluation Evals system (9-step process + 5-dimension framework + 9-tool matrix + CI/CD quality gates), LLM industry chain 4-tier panorama (compute/models/platforms/applications), 2026 AI industry top 10 trends. Unified copyright notice + disclaimer. Based on four rounds of deep research (Andrew Ng/Microsoft GraphRAG/EU AI Act/Cyberspace Administration of China/OpenAI/LangChain and other authoritative sources) |
| V1.0 | 2026-06-02 | Initial version, covering 12 phases + AI PM full-stack capabilities |
Author: yinjianheng (殷健恒) Contact: email: yinjianheng@foxmail.com / wechat: YJH-yinjianheng License: Free and open-source, for personal use only
温馨提示:本 Skill 为个人开源作品,仅供个人学习、研究及非商业用途。未经作者书面授权,严禁任何形式的商业使用(包括但不限于转售、捆绑销售、商业培训、SaaS化服务等)。作者已委托专业知识产权律师团队进行全网监测,侵权必究。
Non-Professional Advice: The content provided by this Skill is for learning and reference only and does not constitute any form of professional advice (including but not limited to legal advice, financial advice, or technical decision-making advice). Users should consult appropriately qualified professionals before making any commercial or technical decisions.
Information Accuracy: While this Skill has made every effort to ensure the accuracy and timeliness of its content, it does not guarantee the completeness, accuracy, or applicability of all information. The AI field evolves extremely rapidly, and some content may become outdated over time. Users should verify critical information independently.
Limitation of Liability: To the maximum extent permitted by applicable law, the author assumes no liability for any direct, indirect, incidental, special, or consequential losses arising from the use of or reliance on the content of this Skill, including but not limited to business losses, data loss, system failures, or third-party claims.
Third-Party Content: The copyright of third-party frameworks, methodologies, tools, and standards referenced in this Skill (such as EU AI Act, Cyberspace Administration of China regulations, OpenAI, etc.) belongs to their respective rights holders. Such references do not constitute any affiliation or endorsement relationship between the author and the third-party rights holders.
Usage Compliance: Users should ensure that their use of this Skill complies with the laws, regulations, industry standards, and internal corporate policies of their country/region. Special reminder: AI product compliance requirements vary by region; please consult professional legal counsel.
💡 Every product decision defines the relationship between users and AI. Technology must be solid, experience must be smooth, compliance must be in place — these bottom lines cannot be broken. No matter how good the product is, it's better to clock off early and spend more time with the people who matter. —— yinjianheng (殷健恒)
💡 Every product decision defines the relationship between users and AI. Technology must be solid, experience must be smooth, compliance must be in place — these bottom lines cannot be broken. No matter how good the product is, it's better to clock off early and spend more time with the people who matter. —— yinjianheng (殷健恒)