NeuBird AI Production Ops Agent
v1.0.1
Give your assistant production ops superpowers via NeuBird. Use when asked about infrastructure health, cloud costs, incidents, latency, error rates, deploym...
NeuBird Ops Agent
Production ops superpowers, powered by NeuBird — the AI SRE that lives in your terminal.
When to Use
✅ USE this skill when the user asks about anything in production:
| They say... | Use this capability |
|---|---|
| "Any issues right now?" / "Is prod healthy?" | health |
| "Are we wasting money?" / "What's our cloud spend?" | cost |
| "Why is X broken?" / "Any 403s?" / "What's causing errors?" | investigate |
| "Why is the API slow?" / "Find latency outliers" | performance |
| "What could blow up tonight?" / "Any risk on the horizon?" | predict |
| "Give me the full picture" / "Deep health sweep" | deep-dive |
| "What's quietly degrading?" / "Any silent failures?" | silent-failures |
| "Did that deploy break anything?" / "Is this PR risky?" | change-risk |
| "If payments goes down what else dies?" / "Map dependencies" | blast-radius |
❌ DON'T use this skill when:
- The `neubird` desktop binary is not installed: direct the user to neubird.ai
- The question is about code review, writing code, or pre-deploy checks
- The user wants a dashboard — open the observability platform directly
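The routing table above can be approximated with a small keyword matcher. This is only a sketch: the function name `pick_capability` and the keyword lists are hypothetical and not part of the NeuBird CLI, and a real agent would likely rely on its own judgment rather than string matching. Anything unmatched falls back to `investigate`.

```bash
# Hypothetical keyword router: maps a user request to a capability CLI name.
# The keyword lists are illustrative assumptions drawn from the example phrases.
pick_capability() {
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *"blast radius"*|*dependenc*|*"what else dies"*) echo "blast-radius" ;;
    *deploy*|*" pr "*)                               echo "change-risk" ;;
    *"full picture"*|*deep*)                         echo "deep-dive" ;;
    *quiet*|*silent*|*degrad*)                       echo "silent-failures" ;;
    *cost*|*spend*|*money*)                          echo "cost" ;;
    *slow*|*latency*)                                echo "performance" ;;
    *"blow up"*|*risk*)                              echo "predict" ;;
    *health*|*"any issues"*)                         echo "health" ;;
    *)                                               echo "investigate" ;;
  esac
}
```

The order of the branches matters: `deep-dive` is checked before `health` so that "Deep health sweep" is not routed to the plain health check, and `change-risk` is checked before `predict` so that "Is this PR risky?" is not treated as a forecast.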
CLI Interface
```bash
# List available projects
neubird projects

# Run a named capability
neubird run <capability> --project <project-name> --session /tmp/

# Free-form investigation (alias for 'run investigate')
neubird investigate "<prompt>" --project <project-name> --session /tmp/

# Follow-up question (project inherited from session)
neubird run <capability> --session /tmp/nb-<timestamp>.json

# Clean up session when done
neubird run --cleanup --session /tmp/nb-<timestamp>.json
```
All 9 Capabilities
| Capability | CLI name | What it does |
|---|---|---|
| 🏥 Health Check | health | Full infrastructure health sweep |
| 💰 Cost Analysis | cost | Cloud cost baseline + 24h spend projection |
| 🔍 Investigate | investigate | Free-form investigation prompt |
| ⚡ Performance | performance | Find latency outliers and slow queries |
| 🔮 Predict Risk | predict | What could go wrong in the next 24h? |
| 📊 Deep Dive | deep-dive | Full health sweep with 24h lookback |
| 🔬 Silent Failures | silent-failures | Find quietly degrading services |
| 🧬 Change Risk | change-risk | Assess risk from recent deployments and PRs |
| 💥 Blast Radius | blast-radius | Map dependency chains and cascade failure risk |
Session Behavior
- `--session /tmp/` → auto-generates `/tmp/nb-<timestamp>.json`, prints path to stderr
- `--session /tmp/nb-123.json` → creates on first call, resumes on follow-ups
- `--project` required on first call; inherited from session on follow-ups
- Use `--cleanup` when done to remove the session file
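Because the auto-generated path is only printed to stderr, a wrapper needs to capture it for follow-ups. The sketch below assumes stderr was redirected to a file and that the path appears on its own line as `Session: <path>`, as described in the workflow section. The function name `session_path_from_stderr` is hypothetical.

```bash
# Extract the session file path from a file holding captured stderr.
# Assumes a line of the form "Session: <path>" starting at column one.
session_path_from_stderr() {
  # Print the path from the first matching line; print nothing if absent.
  sed -n 's/^Session: *\(.*\)$/\1/p' "$1" | head -n 1
}
```

One possible use is to run `neubird run health --project prod_cop --session /tmp/ 2>stderr.log >findings.md` and then call `session_path_from_stderr stderr.log` to get the path for the follow-up and cleanup commands. If the spinner shares a line with the session message, the pattern would need loosening.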
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Complete, findings on stdout |
| 1 | Failed or timed out |
| 2 | Not authenticated — run neubird login |
| 3 | No connected environment / project not found |
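A wrapper script can translate these exit codes into a next step for the user. The following is a minimal sketch: the function name `explain_exit_code` and the message wording are hypothetical, and only the code meanings come from the table above.

```bash
# Map a neubird exit code to a short, user-facing explanation.
# Message text is illustrative; the code meanings follow the table above.
explain_exit_code() {
  case "$1" in
    0) echo "complete: findings are on stdout" ;;
    1) echo "failed or timed out: report the error, do not fabricate findings" ;;
    2) echo "not authenticated: run 'neubird login'" ;;
    3) echo "no connected environment or project not found: check 'neubird projects'" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}
```

Calling `explain_exit_code "$?"` immediately after a `neubird` command keeps the failure path explicit instead of silently presenting empty output as findings.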
Output Format
Investigations stream over 60–180s. Output has two layers:
- Spinner on stderr (`⠋ thinking`, `⠙ exploring`, etc.): ignore
- Findings on stdout: narrative markdown, ends with `Completed in XmYs`
Use --verbose to see tool calls and MCP server logs during debugging.
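Since findings end with a `Completed in XmYs` trailer, a script can use it as a cheap check that the stream was not cut off. This sketch assumes stdout was saved to a file and that the trailer is a plain line beginning with `Completed in`. The function name `findings_complete` is hypothetical.

```bash
# Succeed if the last non-empty line of the findings file starts with
# "Completed in"; fail for truncated, empty, or missing output.
findings_complete() {
  last_line=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
  case "$last_line" in
    "Completed in "*) return 0 ;;
    *)                return 1 ;;
  esac
}
```

This complements the exit code rather than replacing it: a run that exits 0 but lacks the trailer is worth flagging to the user as possibly incomplete.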
Agent Workflow
1. **Understand the request.** Identify which capability fits (see table above); for ambiguous requests default to `investigate`.
2. **Determine the project.** Infer from context ("prod" → `prod_cop`, "staging" → `staging_auto`); if ambiguous, run `neubird projects` and ask.
3. **Set expectations.** Tell the user: "Running [capability] against [project]. This takes 1–3 minutes..."
4. **Start the run.** The session path is printed to stderr as `Session: /tmp/nb-<timestamp>.json`.

   For a named capability:

   ```bash
   neubird run <capability> --project <project-name> --session /tmp/
   ```

   For a free-form investigation:

   ```bash
   neubird investigate "<user prompt>" --project <project-name> --session /tmp/
   ```

5. **Narrate findings.** Lead with the bottom line, don't dump raw output:
   - State the headline conclusion first
   - Summarize key findings with supporting evidence
   - Give a concrete recommended action when warranted
   - Offer to drill deeper or follow up
6. **Follow up if needed.** Reference the session path, no `--project` required:

   ```bash
   neubird investigate "<follow-up>" --session /tmp/nb-<timestamp>.json
   ```

7. **Clean up when done:**

   ```bash
   neubird run --cleanup --session /tmp/nb-<timestamp>.json
   ```
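The mechanical parts of this workflow (start the run, keep stdout and stderr apart, recover the session path, clean up) can be sketched as a single shell function. The function name `run_capability` and its variables are hypothetical, and the `Session: <path>` line format is assumed from the description above. Only the `neubird` subcommands and flags come from this document. The sketch cleans up immediately, so it fits one-shot runs where no follow-up is planned.

```bash
# One-shot run of a named capability: capture streams, print findings, clean up.
# Hypothetical helper; only the neubird subcommands and flags are from the document.
run_capability() {
  capability=$1
  project=$2
  out_file=$(mktemp)   # findings (stdout)
  err_file=$(mktemp)   # spinner and session path (stderr)

  neubird run "$capability" --project "$project" --session /tmp/ \
    >"$out_file" 2>"$err_file"
  status=$?

  # Assumed format: the session path appears on stderr as "Session: <path>".
  session=$(sed -n 's/^Session: *\(.*\)$/\1/p' "$err_file" | head -n 1)

  if [ "$status" -eq 0 ]; then
    cat "$out_file"
  else
    # Report the failure instead of presenting partial output as findings.
    echo "neubird exited with code $status; no findings to report" >&2
  fi

  # Remove the session file, since this sketch plans no follow-up.
  if [ -n "$session" ]; then
    neubird run --cleanup --session "$session" >/dev/null 2>&1
  fi

  rm -f "$out_file" "$err_file"
  return "$status"
}
```

For a conversation with follow-ups, the cleanup step would move out of the function and the recovered session path would be kept until the user is done. Narrating the findings remains the agent's job; the function only returns the raw text.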
Project Names
Common project slugs: prod_cop, staging_auto, dev_cop, prod_cop_sev2.
Run neubird projects to list all available projects with their IDs.
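The inference rule from the workflow ("prod" → `prod_cop`, "staging" → `staging_auto`, otherwise ask) could be sketched as follows. The function name `infer_project` is hypothetical and the substring matching is a deliberate simplification. It would, for example, treat "product" as "prod".

```bash
# Guess a project slug from the user's wording.
# Prints the slug and succeeds, or prints nothing and fails when ambiguous,
# in which case the caller should run 'neubird projects' and ask the user.
infer_project() {
  context=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$context" in
    *prod*staging*|*staging*prod*) return 1 ;;   # both mentioned: ambiguous
    *staging*) echo "staging_auto" ;;
    *prod*)    echo "prod_cop" ;;
    *)         return 1 ;;                       # no hint: ambiguous
  esac
}
```

A caller might write `project=$(infer_project "$request") || neubird projects` so that an ambiguous request leads to listing the available projects rather than guessing.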
References
Load these when relevant to the findings:
| Topic | File | Load When |
|---|---|---|
| Kubernetes signals | references/kubernetes.md | Pod crashes, node issues, resource exhaustion |
| Cloud infrastructure | references/cloud.md | AWS/GCP/Azure cost, networking, managed services |
| Application & APM | references/application.md | Latency, error rates, traces, deployments |
| Database & storage | references/database.md | Connection pools, slow queries, replication lag |
| Escalation & comms | references/escalation.md | Severity, stakeholder comms, post-incident docs |
Constraints
MUST DO
- Lead every response with the headline conclusion
- State blast radius / scope before recommending action
- Give a concrete next step, not just analysis
- Offer to drill deeper after every finding
- Clean up session files when done
MUST NOT DO
- Dump raw neubird output without narration
- Fabricate findings if the command fails — report the error clearly
- Skip scope/blast radius — "unknown" is valid but must be stated
- Recommend rollback without checking if a recent deploy is in scope
