AMG PostgreSQL Flexible Server Health Check

v1.0.0

Run only when the user explicitly asks for a fleet-wide PostgreSQL Flexible Server health check — scans CPU, memory, storage, IOPS, disk bandwidth, and conne...

⭐ 0· 56·0 current·0 all-time

by@1w2w3y

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for 1w2w3y/amg-check-pg-flex.

Previewing Install & Setup.

Prompt PreviewInstall & Setup

Install the skill "AMG PostgreSQL Flexible Server Health Check" (1w2w3y/amg-check-pg-flex) from ClawHub.
Skill page: https://clawhub.ai/1w2w3y/amg-check-pg-flex
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install amg-check-pg-flex

ClawHub CLI

Package manager switcher

npx clawhub@latest install amg-check-pg-flex

Security Scan

VirusTotal

Pending

View report →

OpenClaw

Benign

high confidence

✓

Purpose & Capability

The name/description describe fleet-wide PG Flexible Server health checks and the SKILL.md instructs exactly those queries via mcp__amg__* tools, Azure Monitor KQL, and resource logs. The requirement to register an MCP server named 'amg' (pointing at an Azure Managed Grafana endpoint) is consistent with the stated purpose.

ℹ

Instruction Scope

Instructions stay within the stated scope (discover resources, run Azure Monitor queries, deep-dive abnormal servers, cross-reference a persistent report). The skill reads/writes its own memory files (memory/amg-check-pg-flex/*) and may save large responses to /tmp for local parsing, and it asks to run local interpreters (node/python/jq/pwsh) if available. Those actions are reasonable for heavy telemetry processing but are worth noting because they will cause the agent to write temporary files and execute shell commands when approved by the user.

✓

Install Mechanism

No install spec / no code files — instruction-only. This is the lowest-risk install mechanism. The only setup step is registering an MCP server via the openclaw CLI (user-provided Grafana URL and token); the skill does not download or execute remote archives.

ℹ

Credentials

The skill does not declare environment variables or a primary credential, but it requires the user to register a Grafana service-account token into OpenClaw's MCP configuration (the openclaw mcp set step). That token is necessary and proportional to the task, but it is not surfaced as a declared required credential in the registry metadata — users should be aware they'll need to supply a Grafana token and ensure it has minimal/read-only scope.

✓

Persistence & Privilege

always is false and disable-model-invocation is true (the model cannot autonomously invoke this skill), which reduces risk. The skill persists its own config and reports under memory/amg-check-pg-flex/, which is expected. It does not attempt to modify other skills or global agent settings.

Assessment

This skill appears coherent with its stated purpose, but review these before installing/running: 1) You will need to register an MCP server named exactly 'amg' using openclaw mcp set and provide a Grafana service-account token (glsa_...). Ensure the token is minimal (read-only / Azure Monitor datasource read) and created for this purpose. 2) The skill will write state to memory/amg-check-pg-flex/* and may write large responses to /tmp and invoke local interpreters (node/python/jq/pwsh) when you approve shell access — review those temp files and clear them if they contain sensitive data. 3) Confirm the referenced GitHub repo or publisher independently (the SKILL.md claims an 'Official skill source' but registry source/homepage are missing). 4) Because disable-model-invocation is enabled, the skill will not run autonomously — you must invoke it explicitly. 5) After use, remove the temporary MCP registration or revoke the service account token if you no longer need automated queries. If you want, provide the skill author or registry with an explicit homepage/source before widespread use.

Like a lobster shell, security has layers — review code before you run it.

latestvk9751vyqbp0gs8cs77ty22hgwn851w0b

56downloads

0stars

1versions

Updated 1w ago

v1.0.0

MIT-0

OpenClaw Setup (one-time)

This skill calls MCP tools prefixed with mcp__amg__*, so OpenClaw must have an MCP server registered under the exact name amg. Run this once per workspace before invoking the skill:

openclaw mcp set amg '{"url":"https://<your-grafana-instance>/api/azure-mcp","transport":"streamable-http","headers":{"Authorization":"Bearer <your-token>"}}'

Replace <your-grafana-instance> with your Azure Managed Grafana endpoint and <your-token> with a valid Grafana service-account token (starts with glsa_). The server name must be amg — the skill's allowed-tools reference mcp__amg__* and will not find tools under any other name.

Verify the server is registered:

openclaw mcp list

Official skill source: https://github.com/Azure/amg-skills

Runtime Context

Current UTC time: !date -u +%Y-%m-%dT%H:%M:%SZ
Config: !cat memory/amg-check-pg-flex/config.md 2>/dev/null || echo "NOT_CONFIGURED"
Prior report: ![ -f memory/amg-check-pg-flex/report.md ] && echo "exists ($(grep -c '^### BUG-' memory/amg-check-pg-flex/report.md) bugs documented)" || echo "not found"
Arguments: time-range=$0, subscription-override=$1

Known Issues: Before presenting findings, cross-reference results against memory/amg-check-pg-flex/report.md.

AMG PostgreSQL Flexible Server Health Check

Critical Constraints

No subagents for MCP. The Agent tool cannot access MCP tools — all MCP calls must be made from the main context.
Scan every resource. No sampling or early stopping.
Time format: ISO 8601 UTC with explicit from/to — NEVER use timespan (it causes errors).
Parallelism cap: 30 concurrent MCP calls per batch. Reduce to 4-5 if rate-limited.
Result too large: Save to temp file and parse outside the context window. Prefer node -e "..." if installed; otherwise fall back to python -c "...", jq, or pwsh -Command "...". Bash permission for the chosen interpreter will be prompted on first use.

Progress Tracking

Update checkboxes as you complete each phase:

Phase 1a: Datasource validated
Phase 1b: Servers discovered (N=?)
Phase 1c: Non-ready servers investigated (if any)
Phase 2: Pulse check completed (N scanned, N findings)
Phase 3: Deep metrics for abnormal servers
Phase 4: Resource logs for abnormal servers
Report presented
Known issues updated in memory/amg-check-pg-flex/report.md

Configuration

If Config shows NOT_CONFIGURED: Run First-Run Setup at the bottom of this file, then return here.

If Config is populated: Extract the datasource UID and subscription ID from the pre-loaded Runtime Context above and use them for all queries. Use $1 as the subscription override if provided.

Datasource UID: from ## Azure Monitor Datasource > UID
Subscription ID: from ## Subscription (or $1 if provided)
Resource Type: microsoft.dbforpostgresql/flexibleservers (lowercase)
ARM ID template: /subscriptions/{SUB}/resourceGroups/{RG}/providers/Microsoft.DBforPostgreSQL/flexibleServers/{name}

Time Range

Default: 7 days for metrics, 24 hours for logs. Override with $0 (e.g., 3d). Keep log queries to 1-2 days to avoid timeouts.

Workflow

Phase 1a: Validate Datasource

Call amgmcp_datasource_list (no parameters). Find entry with type == "grafana-azure-monitor-datasource".

Matches configured UID → proceed.
Different UID → update memory/amg-check-pg-flex/config.md, warn user, use new UID.
Not found → abort with error.

Phase 1b: Discover All PostgreSQL Flexible Servers

azureMonitorDatasourceUid: {DATASOURCE_UID}
query: |
  resources
  | where type == 'microsoft.dbforpostgresql/flexibleservers'
  | where subscriptionId == '{SUBSCRIPTION_ID}'
  | project name, resourceGroup, location, properties.state, sku.name, sku.tier
  | order by location asc, name asc

If multiple subscriptions are configured, query each separately and merge results. Derive region summary by counting servers per location. Flag servers not in "Ready" state. Stop if zero servers found.

Phase 1c: Activity Log for Non-Ready Servers

If any servers are not in "Ready" state, query the activity log for up to 3 of them:

azureMonitorDatasourceUid: {DATASOURCE_UID}
scope: /subscriptions/{SUB}/resourceGroups/{RG}/providers/Microsoft.DBforPostgreSQL/flexibleServers/{name}
startTime: now-3d
endTime: now
select: eventTimestamp,operationName,status,caller,subStatus

If the response exceeds 500 KB, retry with startTime: now-6h. Summarize: operations performed, caller type (service principal vs human), success/in-progress status, likely cause.

Phase 2: Tier 1 — Fleet-Wide Pulse Check

azureMonitorDatasourceUid: {DATASOURCE_UID}
pastDays: 7
scenarios: pg_flex

Scans all servers across 5 scenarios: pg_flex_cpu, pg_flex_memory, pg_flex_storage, pg_flex_disk_iops, pg_flex_disk_bandwidth.

Before moving to Phase 3, verify:

scanSummary.totalResourcesScanned matches Phase 1 server count.
All 5 scenarios show status: "completed" in scenarioResults.
If errors non-empty, retry affected scenarios individually (e.g., scenarios: pg_flex_cpu).
If >10% servers missing, fall back to batched amgmcp_query_resource_metric for unscanned servers.

Severity thresholds (findings array):

Severity	CPU	Memory	Storage	Disk IOPS	Disk BW
Critical	>90%	>90%	>85%	>90%	>90%
Warning	>80%	>80%	>75%	>80%	>80%

Phase 3: Tier 2 — Deep Metrics for Abnormal Servers

Read reference/phase3-deep-dive.md before starting Phase 3. It contains:

Core metrics table (always query) and secondary metrics table (query on anomaly)
Batch strategy: up to 50 comma-separated resource IDs per call, up to 5 parallel metric calls
Correlation analysis patterns (use ultrathink)

Phase 4: Resource Logs for Abnormal Servers

Read reference/phase4-resource-logs.md before starting Phase 4. It contains:

5 KQL query templates: error distribution, top errors, warning messages, log volume, session duration
Time filter rules (always required to avoid timeouts)

Output

Present the report using the structure in reference/output-format.md.

Classification:

Severity	Criteria
CRITICAL	`is_db_alive` sustained 0, OR CPU max >90%, OR Memory max >90%, OR Storage >85%
WARNING	CPU avg >80%, Memory avg >85%, sustained >60% for 6h+, spike >30pp in 1h, Storage >70%, `connections_failed` >0, `deadlocks` >0, disk IOPS/BW >80%, transactionIDs >1B, longest query >300s, replication delay >30s
HEALTHY	All metrics within normal ranges

Update Known Issues

After presenting findings, update memory/amg-check-pg-flex/report.md:

Read the current file.
Rebuild the Resource Inventory table at the end: every server, full ARM ID, region, SKU, state. Group by region, sorted alphabetically.
Update existing bug status from today's telemetry (resolved / improving / worsening / still active).
Add new bugs with: severity, server name, region, metric evidence, log evidence, root cause, recommended action.
Update the "Updated" date header.

Only add genuine issues: sustained high utilization, crash patterns, connection storms, persistent errors. Skip transient single-hour spikes or expected maintenance windows.

Error Handling

See reference/error-handling.md for the full recovery table.

Analysis Guidance

Known patterns, signals, root causes: reference/analysis-patterns.md
Optional deep-dive KQL queries: reference/deep-dive-queries.md

First-Run Setup

Run only when Config shows NOT_CONFIGURED. After completing, return to the Workflow above.

1. Discover Datasource UID: Call amgmcp_datasource_list. Filter type == "grafana-azure-monitor-datasource". Prefer uid == "azure-monitor-oob" if multiple match. Abort if zero match.

2. Discover Subscription ID: Run this Resource Graph query to list all subscriptions with PostgreSQL Flexible Servers, then present the results as a table and ask the user which subscription(s) to use:

resources
| where type == 'microsoft.dbforpostgresql/flexibleservers'
| join kind=inner (
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions'
    | project subscriptionId, subscriptionName=name
) on subscriptionId
| summarize ServerCount=count() by subscriptionId, subscriptionName
| order by ServerCount desc

Present the results as a table with columns: Subscription Name, Subscription ID, Server Count. Then ask the user: "Which subscription ID(s) should I configure for this health check?"

3. Write config: Write memory/amg-check-pg-flex/config.md:

# amg-check-pg-flex Configuration

User-specific values for the PostgreSQL Flexible Server health check skill.
This file is auto-generated on first run and can be edited manually.

## Azure Monitor Datasource
- **UID**: {discovered_uid}
- **Name**: {discovered_name}

## Subscription
- {subscription_id}

4. Confirm: Show the resolved config and ask for confirmation before proceeding.

Comments

Loading comments...