NeuBird AI Production Ops Agent

Other

Give your assistant production ops superpowers via NeuBird. The neubird CLI lets this skill answer questions about infrastructure health, cloud costs, incidents, latency, error rates, deployment risk, silent failures, blast radius, or anything else happening in production right now. It covers all 9 NeuBird capabilities: health check, cost analysis, investigation, performance, risk prediction, deep dive, silent failures, change risk, and blast radius. Requires the neubird CLI to be installed and authenticated.

Install

openclaw skills install production-operations-agent

NeuBird Ops Agent

Production ops superpowers, powered by NeuBird — the AI SRE that lives in your terminal.

When to Use

USE this skill when the user asks about anything in production:

| They say | Use this capability |
|---|---|
| "Any issues right now?" / "Is prod healthy?" | health |
| "Are we wasting money?" / "What's our cloud spend?" | cost |
| "Why is X broken?" / "Any 403s?" / "What's causing errors?" | investigate |
| "Why is the API slow?" / "Find latency outliers" | performance |
| "What could blow up tonight?" / "Any risk on the horizon?" | predict |
| "Give me the full picture" / "Deep health sweep" | deep-dive |
| "What's quietly degrading?" / "Any silent failures?" | silent-failures |
| "Did that deploy break anything?" / "Is this PR risky?" | change-risk |
| "If payments goes down what else dies?" / "Map dependencies" | blast-radius |
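The table above is guidance for the agent, not a parser. As a rough sketch of the same routing, the shell function below matches a few keywords taken from the example phrases and falls back to investigate, the documented default for ambiguous requests. nb_pick_capability is a hypothetical name, not part of the neubird CLI, and keyword matching is an illustrative assumption; a real agent should judge intent rather than grep for words.

```shell
# nb_pick_capability (hypothetical helper): map a user's question to one of the
# nine NeuBird CLI capability names using rough keyword matching (an assumption).
# Anything unmatched falls back to "investigate", the default for ambiguous asks.
nb_pick_capability() {
  local q
  # Lower-case the question so matching is case-insensitive.
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Order matters: more specific phrases (e.g. "deep health") come first.
  case "$q" in
    *"blast radius"*|*"goes down"*|*dependenc*) echo blast-radius ;;
    *deploy*|*"this pr"*|*" pr "*) echo change-risk ;;
    *silent*|*quietly*) echo silent-failures ;;
    *"full picture"*|*"deep health"*|*"deep dive"*) echo deep-dive ;;
    *"blow up"*|*horizon*|*predict*) echo predict ;;
    *slow*|*latency*) echo performance ;;
    *cost*|*spend*|*money*) echo cost ;;
    *healthy*|*"issues right now"*) echo health ;;
    *) echo investigate ;;
  esac
}
```

In use, the printed name slots into the documented command, e.g. neubird run "$(nb_pick_capability "Is prod healthy?")" --project prod_cop --session /tmp/. When it prints investigate, prefer the free-form neubird investigate "<prompt>" form so the user's question itself is sent.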

DON'T use this skill when:

  • The neubird CLI is not installed: direct the user to neubird.ai
  • The question is about code review, writing code, or pre-deploy checks
  • The user wants a dashboard — open the observability platform directly
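A pre-flight check for the first case above could look like the sketch below. nb_preflight is a hypothetical helper name; it relies only on the POSIX command -v builtin to see whether a neubird command is on PATH. It does not verify authentication; a missing login surfaces later as exit code 2.

```shell
# nb_preflight (hypothetical helper): succeed if a `neubird` command is available,
# otherwise point the user at neubird.ai and fail so the skill is not used.
nb_preflight() {
  if command -v neubird >/dev/null 2>&1; then
    return 0
  fi
  echo "neubird CLI not found on PATH; install it from https://neubird.ai" >&2
  return 1
}
```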

CLI Interface

# List available projects
neubird projects

# Run a named capability
neubird run <capability> --project <project-name> --session /tmp/

# Free-form investigation (alias for 'run investigate')
neubird investigate "<prompt>" --project <project-name> --session /tmp/

# Follow-up question (project inherited from session)
neubird run <capability> --session /tmp/nb-<timestamp>.json

# Clean up session when done
neubird run --cleanup --session /tmp/nb-<timestamp>.json

All 9 Capabilities

| Capability | CLI name | What it does |
|---|---|---|
| 🏥 Health Check | health | Full infrastructure health sweep |
| 💰 Cost Analysis | cost | Cloud cost baseline + 24h spend projection |
| 🔍 Investigate | investigate | Free-form investigation prompt |
| ⚡ Performance | performance | Find latency outliers and slow queries |
| 🔮 Predict Risk | predict | What could go wrong in the next 24h? |
| 📊 Deep Dive | deep-dive | Full health sweep with 24h lookback |
| 🔬 Silent Failures | silent-failures | Find quietly degrading services |
| 🧬 Change Risk | change-risk | Assess risk from recent deployments and PRs |
| 💥 Blast Radius | blast-radius | Map dependency chains and cascade failure risk |

Session Behavior

  • --session /tmp/ → auto-generates /tmp/nb-<timestamp>.json, prints path to stderr
  • --session /tmp/nb-123.json → creates on first call, resumes on follow-ups
  • --project required on first call; inherited from session on follow-ups
  • Use --cleanup when done to remove the session file
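When --session is a directory, the generated file name is only reported on stderr, interleaved with the spinner. The sketch below shows one way to recover it from a captured stderr file. nb_session_path is a hypothetical helper, and it assumes the line has the form Session: /tmp/nb-<timestamp>.json described in the Agent Workflow section.

```shell
# nb_session_path (hypothetical helper): print the session file path found in a
# captured stderr file ($1). Assumes a "Session: /path/nb-<timestamp>.json" line;
# spinner frames on the same stream do not match and are skipped.
nb_session_path() {
  # Keep only the path after "Session: "; if several appear, the last one wins.
  sed -n 's|^.*Session: \(/[^ ]*\.json\).*$|\1|p' "$1" | tail -n 1
}
```

For example, run neubird run health --project prod_cop --session /tmp/ >findings.md 2>stderr.log, then session=$(nb_session_path stderr.log) for follow-ups and cleanup.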

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Complete; findings on stdout |
| 1 | Failed or timed out |
| 2 | Not authenticated; run neubird login |
| 3 | No connected environment, or project not found |
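Because the exit code is the only machine-readable status, it helps to branch on it before narrating anything. The hypothetical nb_explain_exit helper below maps codes 0 to 3 from the table above to the message the agent should relay; any other value is reported as unexpected rather than interpreted.

```shell
# nb_explain_exit (hypothetical helper): translate a neubird exit code ($1) into
# the message the agent should relay. Codes follow the exit-code table.
nb_explain_exit() {
  case "$1" in
    0) echo "complete: findings are on stdout" ;;
    1) echo "failed or timed out: report the error, do not invent findings" ;;
    2) echo "not authenticated: run 'neubird login' and retry" ;;
    3) echo "no connected environment or project not found: run 'neubird projects'" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}
```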

Output Format

Investigations stream output over 60–180 seconds. Output has two layers:

  • Spinner on stderr (⠋ thinking, ⠙ exploring, etc.): ignore it
  • Findings on stdout: narrative markdown that ends with a Completed in XmYs line

Use --verbose to see tool calls and MCP server logs during debugging.
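Since findings stream for one to three minutes, a cheap sanity check before narrating is to confirm the captured stdout ends with the completion line. nb_findings_complete is a hypothetical helper; its pattern assumes a trailer like Completed in 2m14s and, as a guess not confirmed by the CLI, also tolerates a seconds-only form such as Completed in 45s.

```shell
# nb_findings_complete (hypothetical helper): exit 0 if the last non-blank line of
# the captured stdout file ($1) contains a "Completed in XmYs" trailer.
nb_findings_complete() {
  local last
  # Drop blank lines, then keep only the final line of output.
  last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
  # Minutes are optional here as an assumption; the documented form is XmYs.
  printf '%s\n' "$last" | grep -Eq 'Completed in ([0-9]+m)?[0-9]+s'
}
```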

Agent Workflow

  1. Understand the request — identify which capability fits (see table above); for ambiguous requests default to investigate

  2. Determine the project: infer it from context ("prod" → prod_cop, "staging" → staging_auto); if that is ambiguous, run neubird projects and ask the user

  3. Set expectations — tell the user: "Running [capability] against [project] — this takes 1–3 minutes..."

  4. Start the run. The session path is printed to stderr as Session: /tmp/nb-<timestamp>.json; note it for follow-ups and cleanup.

    For a named capability:

    neubird run <capability> --project <project-name> --session /tmp/
    

    For a free-form investigation:

    neubird investigate "<user prompt>" --project <project-name> --session /tmp/
    
  5. Narrate findings — lead with the bottom line, don't dump raw output:

    • State the headline conclusion first
    • Summarize key findings with supporting evidence
    • Give a concrete recommended action when warranted
    • Offer to drill deeper or follow up
  6. Follow-up if needed — reference the session path, no --project required:

    neubird investigate "<follow-up>" --session /tmp/nb-<timestamp>.json
    
  7. Clean up when done:

    neubird run --cleanup --session /tmp/nb-<timestamp>.json
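Steps 3 to 7 can be chained into one shell function for scripted use. This is a sketch that relies only on the behavior stated in this document (findings on stdout, a Session: <path> line on stderr, exit codes 0 to 3). nb_ask and its optional follow-up argument are hypothetical; in a live conversation the agent would usually keep the session open between turns instead of cleaning up immediately.

```shell
# nb_ask (hypothetical helper): run one capability, optionally ask one follow-up
# in the same session, then clean the session file up.
#   $1 capability   $2 project slug   $3 optional follow-up prompt
nb_ask() {
  local cap=$1 proj=$2 followup=${3-} out err session status=0
  out=$(mktemp) err=$(mktemp)
  # Step 3: set expectations (on stderr so stdout carries only findings).
  echo "Running $cap against $proj. This takes 1-3 minutes..." >&2
  # Step 4: start the run with an auto-generated session file under /tmp/.
  neubird run "$cap" --project "$proj" --session /tmp/ >"$out" 2>"$err" || status=$?
  # The generated session path is reported on stderr as "Session: <path>".
  session=$(sed -n 's|^.*Session: \(/[^ ]*\.json\).*$|\1|p' "$err" | tail -n 1)
  if [ "$status" -eq 0 ]; then
    # Step 5: hand findings to the agent, which narrates rather than pastes them.
    cat "$out"
    # Step 6: follow-up; the project is inherited from the session file.
    if [ -n "$followup" ] && [ -n "$session" ]; then
      neubird investigate "$followup" --session "$session" 2>/dev/null || status=$?
    fi
  else
    echo "neubird exited with code $status; reporting the error, not findings" >&2
  fi
  # Step 7: clean up the session file whenever one was created.
  if [ -n "$session" ]; then
    neubird run --cleanup --session "$session" >/dev/null 2>&1
  fi
  rm -f "$out" "$err"
  return "$status"
}
```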
    

Project Names

Common project slugs: prod_cop, staging_auto, dev_cop, prod_cop_sev2. Run neubird projects to list all available projects with their IDs.
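The Agent Workflow names only two context mappings ("prod" → prod_cop, "staging" → staging_auto). The hypothetical nb_project_for helper below encodes just those two and deliberately prints nothing when neither or both environments are mentioned, which is the cue to run neubird projects and ask. Substring matching is a simplification: a word like "product" would also match prod.

```shell
# nb_project_for (hypothetical helper): map an environment mentioned in the
# request ($1) to a project slug, or print nothing when it is ambiguous.
nb_project_for() {
  local q matches=""
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in *prod*) matches="$matches prod_cop" ;; esac
  case "$q" in *staging*) matches="$matches staging_auto" ;; esac
  # Intentional word splitting: one slug per positional parameter.
  set -- $matches
  # Exactly one match is unambiguous; zero or several means "ask the user".
  if [ "$#" -eq 1 ]; then
    echo "$1"
  fi
}
```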

References

Load these when relevant to the findings:

| Topic | File | Load when |
|---|---|---|
| Kubernetes signals | references/kubernetes.md | Pod crashes, node issues, resource exhaustion |
| Cloud infrastructure | references/cloud.md | AWS/GCP/Azure cost, networking, managed services |
| Application & APM | references/application.md | Latency, error rates, traces, deployments |
| Database & storage | references/database.md | Connection pools, slow queries, replication lag |
| Escalation & comms | references/escalation.md | Severity, stakeholder comms, post-incident docs |
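Choosing which reference to load is a judgment call, but a keyword pre-filter over the findings text can narrow it down. The hypothetical nb_refs_for helper below prints the matching reference files; its keyword lists are assumptions loosely derived from the Load When column, not an exhaustive or official mapping.

```shell
# nb_refs_for (hypothetical helper): print the reference files whose topics
# appear in the findings text ($1). Keyword lists are illustrative assumptions.
nb_refs_for() {
  local f
  f=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Each topic is checked independently, so several files may be printed.
  case "$f" in *pod*|*node*|*crash*|*"resource exhaustion"*) echo references/kubernetes.md ;; esac
  case "$f" in *aws*|*gcp*|*azure*|*cost*|*networking*) echo references/cloud.md ;; esac
  case "$f" in *latency*|*"error rate"*|*trace*|*deploy*) echo references/application.md ;; esac
  case "$f" in *"connection pool"*|*"slow quer"*|*replication*) echo references/database.md ;; esac
  case "$f" in *severity*|*stakeholder*|*incident*) echo references/escalation.md ;; esac
}
```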

Constraints

MUST DO

  • Lead every response with the headline conclusion
  • State blast radius / scope before recommending action
  • Give a concrete next step, not just analysis
  • Offer to drill deeper after every finding
  • Clean up session files when done

MUST NOT DO

  • Dump raw neubird output without narration
  • Fabricate findings if the command fails — report the error clearly
  • Skip scope/blast radius — "unknown" is valid but must be stated
  • Recommend rollback without checking if a recent deploy is in scope