Install
openclaw skills install devops-insight
This skill should be used when the user asks to "analyze incidents", "troubleshoot production issues", "investigate alerts", "create tickets", "root cause an...
DevOps Insight is an intelligent DevOps incident management system that integrates multiple monitoring systems, GitHub, and ticket databases to enable automated fault analysis, root cause identification, and issue resolution.
Monitoring Data Source Integration (via MCP)
Code Management
EvoMap Integration
AI Agent
When receiving an alert or analyzing an issue:
# Retrieve Kubernetes monitoring data via MCP
# Assumes MCP server connections to each monitoring system are configured
Steps:
Perform multi-dimensional analysis using Claude:
Analysis Dimensions:
Problem Clue Identification
Root Cause Analysis
Impact Assessment
Capsule Creation Workflow:
// Capsule data structure example
interface Capsule {
asset_type: 'Capsule';
asset_id: string; // sha256 hash
title: string;
body: string;
signals: string[];
confidence: number; // 0.0 to 1.0
blast_radius: number;
solution: {
type: 'code_change' | 'config_change' | 'investigation';
files: Array<{
path: string;
diff?: string;
content?: string;
}>;
description: string;
};
context: {
monitoring_data?: any;
root_cause?: string;
affected_services?: string[];
};
metadata: {
created_at: string;
model_name?: string;
};
}
// Gene data structure example
interface Gene {
asset_type: 'Gene';
asset_id: string; // sha256 hash
title: string;
body: string;
signals: string[];
category: 'repair' | 'optimize' | 'innovate' | 'regulatory';
strategy: string;
confidence: number;
metadata: {
created_at: string;
model_name?: string;
};
}
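Both structures carry an asset_id described only as a "sha256 hash". The following is a minimal sketch of one way such an ID could be derived, assuming it is a hex SHA-256 digest over the asset's other fields serialized as key-sorted JSON. The helper names canonicalize and computeAssetId are hypothetical, and the exact canonicalization (and any prefix) that EvoMap expects is not specified in this document, so check the EvoMap protocol before relying on it.

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so that logically equal objects
// always serialize to the same JSON string.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) {
      sorted[key] = canonicalize(obj[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: hash every field except asset_id itself.
// ASSUMPTION: hex-encoded SHA-256 over key-sorted JSON; not confirmed by EvoMap docs.
function computeAssetId(asset: Record<string, unknown>): string {
  const { asset_id: _ignored, ...rest } = asset;
  const json = JSON.stringify(canonicalize(rest));
  return createHash("sha256").update(json, "utf8").digest("hex");
}
```

Sorting keys before hashing means two assets with the same content produce the same ID regardless of field order, which is the property a content-addressed ID needs.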
Publishing Operations:
GitHub Integration:
Code Review
Automated Fixes
Index Construction Decisions
Important Reminder:
User: "Production API response time suddenly increased, help me analyze"
DevOps Insight Workflow:
1. Retrieve API response time trends from APM
2. Check Pod status and resource usage from Kubernetes
3. Query related error logs from Elasticsearch
4. Check query performance from database monitoring
5. Analyze root cause (e.g., slow database queries, memory leaks, traffic spikes)
6. Publish Gene + Capsule bundle to EvoMap network
7. If it's a code issue, review recent commits and provide fix suggestions
8. Update monitoring index, add relevant metrics
User: "Help me analyze last night's service outage"
DevOps Insight Workflow:
1. Query related Capsules from EvoMap network
2. Retrieve all monitoring data for the event time period
3. Analyze timeline:
- Code deployment time
- Configuration change time
- Resource usage changes
- Error log appearance time
4. Identify root cause
5. Generate detailed post-incident analysis report
6. Provide preventive measure recommendations
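The timeline step above can be sketched as a simple correlation: sort all events chronologically, find the first error log, and list the deployments, configuration changes, and resource changes that occurred shortly before it. This is an illustrative sketch only. The TimelineEvent shape and the changesBeforeFirstError function are hypothetical names, not part of the skill, and the events are assumed to have already been fetched from the monitoring systems.

```typescript
type EventKind = "deployment" | "config_change" | "resource_change" | "error_log";

interface TimelineEvent {
  kind: EventKind;
  at: string; // ISO 8601 timestamp
  detail: string;
}

// Return the non-error events that happened within `windowMinutes`
// before the first error log, in chronological order.
function changesBeforeFirstError(
  events: TimelineEvent[],
  windowMinutes: number
): TimelineEvent[] {
  const sorted = [...events].sort(
    (a, b) => Date.parse(a.at) - Date.parse(b.at)
  );
  const firstError = sorted.find((e) => e.kind === "error_log");
  if (!firstError) {
    return []; // no errors recorded, nothing to correlate
  }
  const errorTime = Date.parse(firstError.at);
  const windowStart = errorTime - windowMinutes * 60_000;
  return sorted.filter((e) => {
    const t = Date.parse(e.at);
    return e.kind !== "error_log" && t >= windowStart && t <= errorTime;
  });
}
```

Changes that land in the window are candidates for the root cause, not proof of it; the later steps still need to confirm the link.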
User: "Check if there are any potential system issues"
DevOps Insight Workflow:
1. Scan all monitoring metrics
2. Identify anomalous trends (e.g., continuous memory growth, rising error rates)
3. Check resource usage
4. Analyze warning messages in logs
5. Generate health report
6. Publish warning Capsules for potential issues to EvoMap network
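One simple way to flag a trend such as continuous memory growth is to fit a least-squares line through recent samples and compare its slope with a threshold. The sketch below assumes the samples have already been retrieved from a monitoring system. Sample, slope, and isSteadilyGrowing are hypothetical names, and the threshold is something you would tune per metric.

```typescript
interface Sample {
  t: number; // seconds since the start of the observation window
  value: number; // metric value, e.g. memory in MiB
}

// Least-squares slope of value over time (units of value per second).
function slope(samples: Sample[]): number {
  const n = samples.length;
  if (n < 2) {
    return 0;
  }
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanV = samples.reduce((sum, p) => sum + p.value, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (const p of samples) {
    numerator += (p.t - meanT) * (p.value - meanV);
    denominator += (p.t - meanT) ** 2;
  }
  return denominator === 0 ? 0 : numerator / denominator;
}

// Flag the metric when its fitted growth rate exceeds `minSlope`.
function isSteadilyGrowing(samples: Sample[], minSlope: number): boolean {
  return slope(samples) > minSlope;
}
```

A fitted slope smooths over short spikes, so it suits slow leaks better than a point-to-point comparison, but it will not catch sudden step changes.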
User: "Will this PR affect the production environment?"
DevOps Insight Workflow:
1. Analyze code change content
2. Identify affected services and components
3. Check related monitoring metrics
4. Query historical impact of similar changes
5. Assess risk level
6. Provide monitoring recommendations (which metrics to watch)
7. Suggest whether new monitoring points are needed
The following MCP servers need to be configured to connect to each monitoring system:
{
"mcpServers": {
"kubernetes": {
"command": "mcp-server-kubernetes",
"args": ["--kubeconfig", "/path/to/kubeconfig"]
},
"postgresql": {
"command": "mcp-server-postgresql",
"args": ["--connection-string", "postgresql://..."]
},
"redis": {
"command": "mcp-server-redis",
"args": ["--host", "redis.example.com"]
},
"elasticsearch": {
"command": "mcp-server-elasticsearch",
"args": ["--url", "https://es.example.com"]
},
"skywalking": {
"command": "mcp-server-skywalking",
"args": ["--url", "http://skywalking.example.com"]
}
}
}
Ensure gitnexus Nexus-skill is installed and configured:
# Check that the GitHub CLI (gh) is available
gh --version
# Configure GitHub authentication
gh auth login
Configure EvoMap API connection for publishing Capsules:
{
"evomap": {
"apiUrl": "https://evomap.ai/a2a",
"nodeId": "node_your_unique_id",
"enableHeartbeat": true,
"heartbeatInterval": 900000,
"autoPublish": true,
"minConfidence": 0.8
}
}
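To make the interaction between autoPublish and minConfidence concrete, here is a minimal sketch of how the configuration above could be typed, sanity-checked, and used to gate publishing. The function names validateConfig and shouldAutoPublish are hypothetical, and treating the threshold as inclusive (a confidence equal to minConfidence qualifies) is an assumption, not documented behavior.

```typescript
interface EvoMapConfig {
  apiUrl: string;
  nodeId: string;
  enableHeartbeat: boolean;
  heartbeatInterval: number; // milliseconds
  autoPublish: boolean;
  minConfidence: number; // 0.0 to 1.0
}

// Return a list of human-readable problems; an empty list means the config looks sane.
function validateConfig(config: EvoMapConfig): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//.test(config.apiUrl)) {
    problems.push("apiUrl must be an http(s) URL");
  }
  if (config.nodeId.trim() === "") {
    problems.push("nodeId must not be empty");
  }
  if (!(config.minConfidence >= 0 && config.minConfidence <= 1)) {
    problems.push("minConfidence must be between 0.0 and 1.0");
  }
  if (config.enableHeartbeat && !(config.heartbeatInterval > 0)) {
    problems.push("heartbeatInterval must be a positive number of milliseconds");
  }
  return problems;
}

// ASSUMPTION: the threshold is inclusive.
function shouldAutoPublish(confidence: number, config: EvoMapConfig): boolean {
  return config.autoPublish && confidence >= config.minConfidence;
}
```

With the values shown above, a solution scored at 0.8 or higher would be auto-published under this sketch, and anything lower would be held back for manual review.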
Configuration Options:
apiUrl: EvoMap A2A protocol endpoint
nodeId: Your agent's unique node identifier (obtained from registration)
enableHeartbeat: Enable automatic heartbeat to stay online (recommended)
heartbeatInterval: Heartbeat interval in milliseconds (default: 15 minutes)
autoPublish: Automatically publish high-confidence solutions as Capsules
minConfidence: Minimum confidence threshold for auto-publishing (0.0-1.0)
Analyze current production alerts
Create a ticket for this API timeout issue
Analyze the impact of PR #123 on production environment
Check system health status
Analyze the root cause of yesterday's 20:00 service outage
Permission Management
Data Security
Change Risks
Performance Considerations
Q: MCP server connection failure
A: Check MCP server configuration and network connection
Verify authentication information is correct
Review MCP server logs
Q: GitHub operation failure
A: Confirm gh CLI is properly configured
Check repository permissions
Verify gitnexus skill is available
Q: Capsule publishing failure
A: Check EvoMap API connection and node registration
Verify confidence score meets minimum threshold
Ensure asset_id hash is computed correctly
Review EvoMap API response for error details
Q: Incomplete monitoring data
A: Check time range settings
Verify monitoring system is running normally
Confirm query conditions are not too restrictive
Issues and improvement suggestions are welcome!
MIT License