Install

```
openclaw skills install olares-dashboard
```

olares-cli dashboard command tree — AI-agent-oriented mirror of the dashboard SPA's Overview2 + Applications2 routes. Covers: the strict dual-shape JSON envelope…

CRITICAL — before doing ANYTHING in this subtree, MUST Read ../olares-shared/SKILL.md for profile selection, login, factory-injected *http.Client, and HTTP 401/403 recovery rules. Every dashboard verb depends on that foundation.
This file is the source of truth for the dashboard subtree. Any code generation, refactor or fix touching cli/cmd/ctl/dashboard/**, cli/pkg/dashboard/**, cli/pkg/dashboard/format/**, cli/pkg/olares/id.go::DashboardURL, or cli/pkg/credential/{types,default_provider}.go::DashboardURL MUST first Read this file end-to-end and respect the Iteration red-lines section. Do not "modernize", "simplify" or "consolidate" anything listed under Frozen modules without an explicit user-approved plan that supersedes this skill.
The dashboard CLI mirrors only the SPA routes that are still wired in packages/app/src/apps/dashboard/router/routes.ts:
- Overview2/IndexPage.vue (overview tree) + the per-detail-page subtrees: Overview2/{CPU, Memory, Disk, Pods, Network, Fan, GPU}.
- Applications2/IndexPage.vue (applications tree).

Everything else in the SPA (legacy Overview/, audit, settings, …) is out of scope. Do not add commands for deprecated routes; reject any request that asks for them with a pointer to the route file.
The dashboard CLI is a two-layer split with mirrored subpackage trees on BOTH sides:
- cli/cmd/ctl/dashboard/ — one directory per command-tree parent, one .go file per leaf. Each leaf's RunE is a thin validate flags → prepareClient → pkg<area>.RunXxx(...) body. Per-area common.go is ≤ 70 lines and carries only var common, prepareClient, unknownSubcommandRunE.
- cli/pkg/dashboard/, split in two:
  - The root (cli/pkg/dashboard/*.go) owns infrastructure and shared fetchers — Client, Runner, CommonFlags, Envelope/Item/Meta, Fetch* helpers, BuildRankingEnvelope, capability gates, the format pkg, the schema bundle. No command-specific envelope builders or table writers live here.
  - The subpackages (cli/pkg/dashboard/applications/, cli/pkg/dashboard/overview/, cli/pkg/dashboard/overview/{disk,fan,gpu}/) mirror the cmd parent dirs. Each carries one <verb>.go file per leaf, exposing a single RunXxx(ctx, c, cf, ...) entry plus its BuildXxxEnvelope / WriteXxxTable triplet. Tests sit alongside as <verb>_test.go (one shared helpers_test.go per area for httptest fixtures).

Every fetch / merge / aggregate / format call goes through pkg — either the root infra or, for envelope assembly, the matching pkg subpackage. The cmd shell never touches HTTP, never builds an envelope, never writes a table, never runs Runner directly.
```mermaid
flowchart LR
    User[olares-cli dashboard <verb>] --> Factory[cmdutil.Factory<br/>refreshingTransport]
    Factory --> CmdShell["cli/cmd/ctl/dashboard/<area>/<leaf>.go<br/>(thin cobra shell:<br/>validate → prepareClient →<br/>pkg<area>.RunXxx)"]
    CmdShell --> PkgArea["cli/pkg/dashboard/<area>/<leaf>.go<br/>(RunXxx + BuildXxxEnvelope +<br/>WriteXxxTable + watch loop)"]
    PkgArea --> PkgRoot["cli/pkg/dashboard/<br/>(infra: Envelope, Client, Runner,<br/>fetchers, aggregators, gates, schema)"]
    PkgRoot -->|/capi/* + /kapis/* + /hami/*| BFF["dashboard.<terminus><br/>(ks-apiserver-proxy)"]
    BFF -->|wildcard proxy| KsAPI[ks-apiserver]
    PkgRoot --> Format["cli/pkg/dashboard/format<br/>(1:1 JS port)"]
    PkgArea --> Output["WriteJSON / WriteTable<br/>(pkgdashboard.*)"]
    Output --> User
```
cmd area ↔ pkg subpackage mapping — every cmd leaf is a thin
prepareClient + pkg<area>.RunXxx(...) shell. The heavy logic
(envelope builders, table writers, fan-out, watch loops) lives in a
mirror subpackage under cli/pkg/dashboard/<area>/. The pkg root
(cli/pkg/dashboard/) only owns infrastructure and shared fetchers.
| cmd area | pkg subpackage entry points (Run*) | shared infra it pulls from cli/pkg/dashboard/ |
|---|---|---|
| cmd/ctl/dashboard/root.go + options.go | n/a — pure cobra wiring + bindPersistentFlags | flags.go (CommonFlags), runner.go, emit.go, output.go |
| cmd/ctl/dashboard/applications/root.go | pkg/dashboard/applications/list.go → RunList | ranking.go, workloads.go, apps.go, format/... |
| cmd/ctl/dashboard/overview/{physical,user,ranking,root}.go | pkg/dashboard/overview/{physical,user,ranking,default}.go → RunPhysical/RunUser/RunRanking/RunDefault | monitoring.go, workloads.go, ranking.go, users.go, numbers.go |
| cmd/ctl/dashboard/overview/{cpu,memory,pods,network}.go | pkg/dashboard/overview/{cpu,memory,pods,network}.go → RunCPU/RunMemory/RunPods/RunNetwork (built on nodes.go's RunPerNodeMetric scaffold) | monitoring.go, system.go (network → capi /system/ifs), numbers.go |
| cmd/ctl/dashboard/overview/disk/{root,main,partitions}.go | pkg/dashboard/overview/disk/{default,main,partitions}.go → RunDefault/RunMain/RunPartitions | monitoring.go (node_disk_*), lsblk.go, numbers.go, format/... |
| cmd/ctl/dashboard/overview/fan/{root,live,curve}.go | pkg/dashboard/overview/fan/{default,live,curve}.go → RunDefault/RunLive/RunCurve | system.go (FetchSystemFan), gates.go (GateOlaresOne), gpu.go (graphics list co-render), fan_curve.go |
| cmd/ctl/dashboard/overview/gpu/{root,list,tasks,get,task,detail,task_detail}.go | pkg/dashboard/overview/gpu/{list,tasks,get,task,detail,task_detail}.go → RunList/RunTasks/RunGet/RunTask/RunDetail/RunTaskDetail (+ specs.go for the gauge/trend catalogue and concurrent fan-out) | gpu.go, gpu_format.go, gpu_query.go, gates.go, client.go |
| cmd/ctl/dashboard/schema/root.go | n/a — schema loader stays in pkg root | schema.go (loader + go:embed bundle) |
- ResolvedProfile.DashboardURL = https://dashboard.<localPrefix><terminusName> (derived in cli/pkg/olares/id.go DashboardURL).
- refreshingTransport — header injection + 401 retry happen for free; dashboard code MUST NOT touch X-Authorization or instantiate its own http.Client.
- *pkgdashboard.CommonFlags is bound once in cmd-root (bindPersistentFlags(&common, cmd)), then passed by pointer to every area factory (overview.NewOverviewCommand(f, &common) etc.). Each area subpackage stores it in a package-level var common *pkgdashboard.CommonFlags so leaf RunE bodies keep common.Output / common.Validate() / common.Timezone selectors.

Every dashboard command emits exactly one of two shapes; the choice is fixed per command and MUST NOT be changed.

Shape A — items envelope. Used by every command except parent commands with a sections envelope (see Shape B).
```jsonc
{
  "kind": "dashboard.<area>.<verb>",
  "meta": { "fetched_at": "...", "iteration": 0, "recommended_poll_seconds": 60, "empty": false, "empty_reason": "", "error": "", "http_status": 200 },
  "items": [ { "raw": { /* upstream-shape, units in source unit */ }, "display": { /* table-friendly strings */ } } ]
}
```
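For an agent consuming Shape A from Go, a minimal decode sketch — the struct fields below are inferred from the JSON example above, not copied from the CLI's internal output.go types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Meta struct {
	FetchedAt              string `json:"fetched_at"`
	Iteration              int    `json:"iteration"`
	RecommendedPollSeconds int    `json:"recommended_poll_seconds"`
	Empty                  bool   `json:"empty"`
	EmptyReason            string `json:"empty_reason"`
	Error                  string `json:"error"`
	HTTPStatus             int    `json:"http_status"`
}

type Item struct {
	Raw     map[string]any `json:"raw"`     // canonical machine shape — pin on this
	Display map[string]any `json:"display"` // human-only; never pin on it
}

type Envelope struct {
	Kind  string `json:"kind"`
	Meta  Meta   `json:"meta"`
	Items []Item `json:"items"`
}

func main() {
	line := []byte(`{"kind":"dashboard.overview.cpu","meta":{"empty":false},"items":[]}`)
	var env Envelope
	if err := json.Unmarshal(line, &env); err != nil {
		panic(err)
	}
	fmt.Println(env.Kind, env.Meta.Empty)
}
```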
- raw is the canonical machine-friendly shape: numbers as numbers, timestamps as Unix seconds, temperatures as raw Celsius (NOT converted by --temp-unit). Agents pin on raw.
- display is human-presentation only; values are formatted via the format pkg with the current --temp-unit / --timezone. Agents MUST NOT pin on display — it can change with locale / format-pkg fixes.
- meta.recommended_poll_seconds — the page-level polling cadence the SPA uses; agents driving --watch SHOULD respect it.
- meta.iteration — present in every --watch payload, 1-based.
- meta.empty / meta.empty_reason — three-state empty data; see Three-state empty data.
- meta.error — only set on a failed --watch iteration or per-section failure inside Shape B.

Shape B — sections envelope. Used by parent commands that aggregate multiple sub-views:
| Parent command | Sections | Section kinds |
|---|---|---|
| dashboard overview | physical / user / ranking | dashboard.overview.physical / .user / .ranking |
| dashboard overview disk | main / partitions | dashboard.overview.disk.main / .partitions (the latter is itself a sections envelope, keyed by device) |
| dashboard overview fan | live / curve | dashboard.overview.fan.live / .curve |
| dashboard schema | n/a — emits Shape A index | dashboard.schema.index |
```jsonc
{
  "kind": "dashboard.overview",
  "meta": { ... },
  "sections": {
    "physical": { "kind": "dashboard.overview.physical", "meta": {...}, "items": [...] },
    "user":     { "kind": "dashboard.overview.user",     "meta": {...}, "items": [...] },
    "ranking":  { "kind": "dashboard.overview.ranking",  "meta": {...}, "items": [...] }
  }
}
```
A failed section sets meta.error="..." on that section; the others still return.

Kind registry — declared in cli/pkg/dashboard/schema.go. AllKinds() MUST stay 1:1 with the actual command surface. Adding a command means:

1. Add a KindXxx constant in pkg/dashboard/schema.go.
2. Append it to AllKinds() (same file).
3. Drop the JSON schema under cli/pkg/dashboard/schemas/<kind-without-namespace>.json.
4. Register it in LoadSchemaIndex() (same file).
5. Add the leaf under its cmd/ctl/dashboard/<area>/ subdirectory and AddCommand it from the area's root.go.

Do NOT rename existing Kind values — agents rely on string equality.
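A hypothetical sketch of steps 1–2 — the constant name and its string value are invented for illustration; the real constants live in pkg/dashboard/schema.go and MUST NOT be renamed:

```go
package schemasketch

// KindOverviewExample is a made-up kind for illustration only.
const KindOverviewExample = "dashboard.overview.example"

func AllKinds() []string {
	return []string{
		// ...existing Kind* constants, order and values preserved...
		KindOverviewExample,
	}
}
```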
The authoritative listing of every command, its envelope shape, and
its default-action semantics is the cmd/pkg directory tree itself
(see File layout) and olares-cli dashboard --help. Run olares-cli dashboard schema -o json for the live dashboard.<area>.<verb> Kind index.
The following are load-bearing. Touching them requires a separate user-confirmed plan; otherwise reject the request with this list.
| Frozen module | Why it's frozen | Where |
|---|---|---|
| Two-layer split with mirrored cmd/<area> ↔ pkg/dashboard/<area> subpackages: cmd is a thin cobra shell that calls pkg<area>.RunXxx(...); pkg owns every envelope builder, table writer, watch loop and fan-out. The pkg root holds only infrastructure + shared fetchers. | Architectural decision (P1–P7); each leaf's heavy logic is testable directly via RunXxx without dragging cobra in, and cmd-side line budgets stay flat as the surface grows | cli/cmd/ctl/dashboard/ + cli/pkg/dashboard/ |
| Directory tree mirrors command tree on BOTH sides | One file per leaf in cmd (<verb>.go) AND in the matching pkg subpackage (<verb>.go exposing one RunXxx entry); root.go+common.go per cmd parent, default.go per pkg parent — prevents drift between dashboard --help, the cmd ls, and the pkg ls | every cmd subdirectory under cli/cmd/ctl/dashboard/ and every pkg subpackage under cli/pkg/dashboard/<area>/ |
| Kind constants + AllKinds() | Public agent contract | pkg/dashboard/schema.go |
| Envelope shapes A / B | Public agent contract | pkg/dashboard/output.go |
| format pkg signatures + JS-parity behaviour | Backed by golden.json oracle; even rounding quirks are intentional | pkg/dashboard/format/format.go |
| FetchClusterMetrics / FetchNodeMetrics / FetchUserMetric shared fetchers | Single upstream call powers multiple commands; must not fork | pkg/dashboard/monitoring.go |
| FetchWorkloadsMetrics dual-fetch + merge + BuildRankingEnvelope | Single pkg function powers overview ranking + applications first-row parity; keep the dual-fetch shape | pkg/dashboard/workloads.go, pkg/dashboard/ranking.go |
| FetchSystemIFS / FetchSystemFan / graphics+task helpers | Wire shape pinned in tests; capi vs hami URL prefixes are part of the contract | pkg/dashboard/system.go, pkg/dashboard/gpu.go |
| FanCurveTable + FanSpeedMaxCPU + FanSpeedMaxGPU constants | 1:1 with SPA Fan/config.ts.tableData; CLI ships them as Go constants because the SPA does too | pkg/dashboard/fan_curve.go |
| Parent-command default = sections envelope (overview, overview disk, overview fan) | Plan decision; consumers branch on kind == "dashboard.overview.*" to know whether to expect sections or items | overview/root.go, overview/disk/root.go, overview/fan/root.go |
| Runner + --watch semantics (interval / iterations / timeout / 3-consecutive-failure exit / NDJSON per-iteration / SIGINT graceful / ErrTokenInvalidated short-circuit) | Plan decision; agents script around it | pkg/dashboard/runner.go |
| --since vs --start --end mutual-exclusion + sliding-window rules | Plan decision (since_slides_abs_repeats) | pkg/dashboard/flags.go Validate / ResolveWindow |
| Three-state empty-data semantics | Distinguishes "feature not installed" from "no hardware" without throwing | overview gpu {list,tasks} / overview fan live (cmd) ↔ pkg/dashboard/gates.go |
| Client.EnsureUser sync.Once cache + RequireAdmin / ResolveTargetUser guards | Auth correctness for --user / overview user <name> | pkg/dashboard/client.go |
| ResolvedProfile.DashboardURL derivation chain | URL is derived, not configured | cli/pkg/olares/id.go, cli/pkg/credential/types.go, cli/pkg/credential/default_provider.go |
| Cobra SilenceUsage = true + SilenceErrors = true on every dashboard cmd | Clean machine-readable error text; pinned by TestAllLeafCommandsSilenced in cmd-root | every cobra.Command literal under cmd/ctl/dashboard/ |
| Default output = table | Plan decision (table_default) | pkg/dashboard/output.go ParseOutputFormat |
| Dependency direction (cmd side) — cli/cmd/ctl/dashboard/<area> may import cli/pkg/dashboard, cli/pkg/dashboard/<area>/..., and its own cobra children only; horizontal cmd↔cmd imports are FORBIDDEN. The cmd shell is forbidden from owning envelope shapes / table writers / fan-out — those belong in the matching pkg/dashboard/<area> subpackage. | Architectural integrity; prevents the circular-import class of bugs that killed the 1.x layout, and keeps the cmd surface trivially line-budgeted | every import block under cli/cmd/ctl/dashboard/ |
| Dependency direction (pkg side) — cli/pkg/dashboard/<area>/... may import cli/pkg/dashboard (the root infra: Client, Runner, fetchers, format, numbers, gpu_format, gpu_query, gates, schema, output, emit, lsblk, fan_curve); the pkg root MUST NOT import any subpackage (no upward import). Sibling subpackages (e.g. overview/disk ↔ overview/fan) MUST NOT import each other — if two areas need the same heavy helper, hoist it to cli/pkg/dashboard/ (the BuildRankingEnvelope precedent). | Keeps the pkg root independently testable + reusable, and keeps each <area> subpackage a self-contained mirror of its cmd sibling | every import block under cli/pkg/dashboard/ |
Optional hardware (GPU / fan) and optional integrations have three legitimate "empty" states; the CLI distinguishes them in meta.empty_reason so agents can branch without parsing prose.
| Upstream response | meta.empty | meta.empty_reason | Example |
|---|---|---|---|
| HTTP 404 | true | no_<feature>_integration | HAMI vGPU not installed → dashboard overview gpu list |
| HTTP 200, empty body / empty metric | true | no_<feature>_detected | Server has no fans → dashboard overview fan live |
| HTTP 200, non-empty | false | "" | Normal case |
| Any other 4xx/5xx | n/a (envelope meta.error) | n/a | Real failure → propagate via watch error path |
Forbidden: turning a 404 into an error, or merging the two reasons into a single "empty=true" — agents currently key on the reason.
The fan / gpu subtrees mirror the SPA's gates — but with two different strengths because the SPA itself treats them differently:
| Subtree | Gate strength | Source check |
|---|---|---|
| overview fan * (default / live / curve) | Hard — empty envelope before any fetch | Fan.ts:67 (isOlaresOneDevice); fan data is meaningless on non-Olares One hardware |
| overview gpu * (list / tasks / get / task) | Soft — advisory meta.note + stderr hint, then proceed with the HAMI fetch | ClusterResource.vue:232+278 hides the sidebar entry for non-admins / no-CUDA clusters; the GPU detail pages themselves do not hard-block |
```json
{
  "kind": "dashboard.overview.fan.live",
  "meta": {
    "empty": true,
    "empty_reason": "not_olares_one",
    "note": "Fan / cooling integration is only available on Olares One devices",
    "device_name": "DIY-PC",
    "fetched_at": "..."
  },
  "items": []
}
```
The CLI ALWAYS calls HAMI for GPU verbs. The SPA-equivalent "would-have-been-hidden" reason is recorded as meta.note and (in table mode) as a one-liner on stderr. The actual meta.empty / empty_reason are determined by HAMI's response:
```jsonc
// non-admin caller, HAMI returned an empty list:
{
  "kind": "dashboard.overview.gpu.list",
  "meta": {
    "empty": true,
    "empty_reason": "no_gpu_detected",
    "note": "GPU sidebar entry is hidden for non-admin profiles in the SPA; HAMI was queried directly",
    ...
  },
  "items": []
}

// any caller, HAMI returned 5xx:
{
  "kind": "dashboard.overview.gpu.list",
  "meta": {
    "empty": true,
    "empty_reason": "vgpu_unavailable",
    "http_status": 500,
    "error": "unknown request error",
    "note": "HAMI vGPU controller responded with Internal Server Error; the integration is installed but unhealthy",
    ...
  },
  "items": []
}
```
empty_reason taxonomy:
| Reason | Where | Meaning |
|---|---|---|
| not_olares_one | fan default / live / curve (hard gate) | Active device's device_name is not Olares One |
| no_fan_integration | fan live (HTTP 404 fallback) | /user-service/api/mdns/olares-one/cpu-gpu returned 404 |
| no_vgpu_integration | gpu list / tasks / get / task (HTTP 404) | HAMI vGPU controller absent |
| vgpu_unavailable | gpu list / tasks / get / task (HTTP 5xx) | HAMI installed but unhealthy. meta.error carries the upstream message; meta.http_status carries the original status |
| no_gpu_detected | gpu list / tasks / get / task (HTTP 200, empty body) | HAMI healthy but device list / detail is empty |
Soft-gate advisories never appear as empty_reason; they live in meta.note:
| Note | When |
|---|---|
| GPU sidebar entry is hidden for non-admin profiles in the SPA; HAMI was queried directly | meta.user.globalrole != platform-admin (and not admin) |
| no node carries gpu.bytetrade.io/cuda-supported=true; SPA hides the GPU card. HAMI was queried directly | Admin caller, but the cluster's node list has no CUDA-supported label |
When BOTH a HAMI failure AND a soft advisory apply, meta.note is the concatenation "<advisory> | <hami-explanation>".
HAMI's WebUI returns the payload at the top level for every endpoint we call — there is no data envelope around it. Adding one in the decoder yields a silent "0 GPUs" / "0 tasks" failure even on machines where the SPA shows devices.
| Endpoint | Real wire shape | Decoder target |
|---|---|---|
| POST /hami/api/vgpu/v1/gpus | {"list":[ {graphics...} ]} | struct{ List []map[string]any }, return raw.List |
| POST /hami/api/vgpu/v1/containers | {"items":[ {task...} ]} | struct{ Items []map[string]any }, return raw.Items |
| GET /hami/api/vgpu/v1/gpu?uuid=… | { ...graphics fields... } | map[string]any, return verbatim |
| GET /hami/api/vgpu/v1/container?name=…&podUid=… | { ...task fields... } | map[string]any, return verbatim |
The fetchers in pkg/dashboard/gpu.go are pinned to this shape; TestFetchGraphicsList_ParsesTopLevelList / TestFetchTaskList_ParsesTopLevelItems / TestFetchGraphicsDetail_ReturnsBodyAsIs / TestFetchTaskDetail_ReturnsBodyAsIs enforce the contract (in cli/pkg/dashboard/dashboard_test.go).
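A minimal sketch of that contract, assuming only the wire shapes in the table (the real fetchers in pkg/dashboard/gpu.go add auth, error classification, etc.):

```go
package hamisketch

import "encoding/json"

// decodeGraphicsList pins the documented contract: the "list" key sits at
// the TOP level — wrapping the decode target in a {"data": ...} struct
// would silently yield zero GPUs.
func decodeGraphicsList(body []byte) ([]map[string]any, error) {
	var raw struct {
		List []map[string]any `json:"list"`
	}
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, err
	}
	return raw.List, nil
}
```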
GPU/task field names that round-trip from HAMI (used in Raw envelopes — agents can pull anything not surfaced in the table):
| gpu list raw fields | gpu tasks raw fields |
|---|---|
| uuid, type, shareMode, nodeName, nodeUid, health (bool), coreUtilizedPercent (already a percent — NOT a 0..1 ratio), coreUsed, coreTotal, memoryUsed / memoryUtilized / memoryTotal (all MiB), memoryUtilizedPercent, vgpuUsed, vgpuTotal, power, powerLimit (W), temperature (°C), node, mode | name, status, podUid, nodeName, nodeUid, type, deviceShareModes[], devicesCoreUtilizedPercent[], devicesMemUtilized[], allocatedDevices, allocatedCores, allocatedMem, createTime, startTime, endTime, namespace, deviceIds[], resourcePool, flavor, priority, appName |
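An illustrative read of those raw fields (a sketch without presence/error handling; JSON numbers arrive as float64):

```go
package gpusketch

// readGPUItem pulls the documented fields out of one item's raw map.
// memoryUsed / memoryTotal are MiB on the wire — convert to bytes before
// any byte-denominated formatter (the text calls out format.GetDiskSize).
func readGPUItem(raw map[string]any) (usedBytes, totalBytes, corePct float64) {
	const mib = 1024 * 1024
	usedBytes = raw["memoryUsed"].(float64) * mib
	totalBytes = raw["memoryTotal"].(float64) * mib
	corePct = raw["coreUtilizedPercent"].(float64) // already 0..100, not a ratio
	return
}
```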
Common renames the SPA does NOT do (we don't either — earlier revisions guessed at e.g. modelName / hostname / usedMem / totalMem / coreUtilization / sharemode; none of those field names appear on the wire):
- type (yes, the field is literally called type)
- nodeName
- coreUtilizedPercent (already a percent)
- memoryTotal / memoryUsed (both MiB; multiply by 1024² before passing to format.GetDiskSize)
- shareMode (string "0"/"1"/"2"/"3"; SPA's VRAMModeLabel enum only knows 0/1/2 — the CLI surfaces unknown values as mode=<raw>)

The wire-shape contract above ONLY covers the list/detail endpoints. The two monitor query endpoints behave the opposite way — they DO wrap the result in a single-level data envelope, matching the SPA's InstantVectorResponse / RangeVectorResponse types in packages/app/src/apps/dashboard/types/gpu.ts.
| Endpoint | Wire shape | Decoder |
|---|---|---|
| POST /hami/api/vgpu/v1/monitor/query/instant-vector | {"data":[ {metric, value, timestamp} ]} | struct{ Data []instantVectorSample }, return raw.Data |
| POST /hami/api/vgpu/v1/monitor/query/range-vector | {"data":[ {metric, values:[{value, timestamp}]} ]} | struct{ Data []rangeVectorSeries }, return raw.Data |
Both fetchers (fetchInstantVector / fetchRangeVector) are pinned by TestFetchInstantVector_ParsesDataEnvelope / TestFetchRangeVector_ParsesDataEnvelope. Do NOT "harmonise" with the list endpoints — different microservices behind HAMI, different wire shapes, period.
Range body shape: {"query":"<promql>","range":{"start":"YYYY-MM-DD HH:mm:ss","end":"…","step":"30m"}}. start/end are SPA's timeParse(date) (local clock, no offset suffix); the step is computed by gpuTrendStep(start, end) — a 1:1 port of timeRangeFormate(diff_s, 16) which preserves the SPA's preset table (10m→1m, 60m→10m, 480m→30m, 1440m→60m, etc.) and falls back to floor(minutes/16)m capped at [1m..60m].
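A hedged re-derivation of that step mapping — the real implementation is GPUTrendStep in pkg/dashboard/gpu_format.go, and only the presets named above are reproduced here:

```go
package trendsketch

import (
	"fmt"
	"time"
)

// trendStep sketches the documented preset table plus the
// floor(minutes/16) fallback capped to [1m..60m]. The SPA's preset table
// has more entries than the four shown.
func trendStep(start, end time.Time) string {
	minutes := int(end.Sub(start).Minutes())
	switch minutes {
	case 10:
		return "1m"
	case 60:
		return "10m"
	case 480:
		return "30m"
	case 1440:
		return "60m"
	}
	step := minutes / 16
	if step < 1 {
		step = 1
	} else if step > 60 {
		step = 60
	}
	return fmt.Sprintf("%dm", step)
}
```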
dashboard overview gpu detail <uuid> / task-detail <name> <pod-uid> — full detail pages

Two new commands mirror the SPA's Overview2/GPU/GPUsDetails.vue and Overview2/GPU/TasksDetails.vue pages: basic info (HAMI /v1/gpu or /v1/container) + top gauges (instant-vector) + trend charts (range-vector), all assembled into a single sections envelope.
| Command | Kind | Default window | Sections | PromQL count |
|---|---|---|---|---|
| overview gpu detail <uuid> | dashboard.overview.gpu.detail.full | 8h | detail / gauges / trends | 4 instant + 6 range |
| overview gpu task-detail <name> <pod-uid> [--sharemode] | dashboard.overview.gpu.task.detail.full | 1h | detail / gauges / trends | 2 instant + 2 range |
Section item ordering is fixed and stable across releases — agents can index by position OR by raw.key:
- GPU detail.full gauges keys (in order): alloc_core / alloc_mem / util_core / util_mem / power / temperature.
- GPU detail.full trends keys (in order): alloc_trend (lines: core+memory) / usage_trend (core+memory) / power_trend / temp_trend.
- Task task-detail.full gauges keys (in order): compute_usage / vram_usage.
- Task task-detail.full trends keys (in order): compute_trend / vram_trend (each one line, label usage).
Time window flags reuse the existing --since / --start / --end. With neither set the SPA defaults apply (8h for GPU detail, 1h for task detail). meta.window carries {since, start, end, step} so an agent can replay the exact same query without recomputing.
--sharemode on task-detail mirrors the SPA's displayAllocation toggle: when set to "2" (Time slicing) the CLI skips the allocation gauges, matching the SPA's hidden-rows behaviour. Other values ("0", "1") keep the full gauge set. Pinned by TestBuildGPUTaskDetailFullEnvelope_TimeSlicingSkipsAllocation.
Soft-failure semantics — a single instant/range query failing does NOT abort the envelope:
- the failed gauge/trend item carries raw.error (and display.error in the table);
- meta.warnings collects gauges "<key>": <err> / trends "<key>": <err> lines;
- agents check len(meta.warnings) > 0 to detect partial data;
- pinned by TestBuildGPUDetailFullEnvelope_PartialFailure.

Hard-failure semantics still apply on the detail fetch itself: HAMI 404 → empty_reason=no_vgpu_integration, 5xx → vgpu_unavailable (meta.error + meta.http_status), 200 with empty body → no_gpu_detected. Watch mode polls every 30s by default (Recommended = 30 * time.Second — the monitor endpoints are slower than plain Prom queries).
device_no / driver_version are surfaced on the detail item — the CLI mirrors the SPA's trick of harvesting them from the power_trend range query's metric labels (HAMI doesn't ship them on /v1/gpu).
Behavior contract:
- Exit code is 0 for every gated / advisory / empty path — these are predictable states, not failures. Real failures still propagate via meta.error (or non-zero exit for top-level fatal errors).
- stderr one-liners (verbatim): fan is only available on Olares One devices (current: <device_name>); (advisory) GPU sidebar entry is hidden for non-admin profiles in the SPA; current user (<u>) is <role>; (advisory) no node carries gpu.bytetrade.io/cuda-supported=true; SPA hides the GPU card. HAMI was queried directly; gpu data temporarily unavailable: HAMI returned HTTP 500 (unknown request error).
- Gate probes (/user-service/api/system/status, /kapis/.../nodes, /capi/app/detail) are cached per-Client (sync.Once for system status; per-Client map for CUDA presence; sync.Once for user detail) — repeated calls inside an aggregated dashboard overview cost zero extra HTTP.
- Hard gates re-evaluate inside Runner.Iter for --watch, so a stream against the wrong device keeps emitting consistent empty envelopes per tick. Soft advisories likewise re-evaluate per iteration.
inspect meta.empty + meta.empty_reason + meta.note →
- not_olares_one → skip the fan subtree on this device
- no_*_integration → upstream component absent (HAMI / capi system fan)
- vgpu_unavailable → transient: retry; check meta.http_status / meta.error
- no_*_detected → integration up but hardware empty
- (none) + meta.note → data is present but the SPA would have hidden the entry; surface the note to the user
- (none) + (no note) → items[] populated, proceed normally
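The same tree as a Go sketch an agent might embed — the reason strings are the documented empty_reason values, while the suffix matching and return messages are illustrative:

```go
package gatesketch

import "strings"

// nextAction maps one envelope's meta fields onto the decision tree above.
func nextAction(empty bool, reason, note string) string {
	switch {
	case empty && reason == "not_olares_one":
		return "skip the fan subtree on this device"
	case empty && strings.HasSuffix(reason, "_integration"): // no_fan_integration / no_vgpu_integration
		return "upstream component absent"
	case empty && reason == "vgpu_unavailable":
		return "transient: retry; check meta.http_status / meta.error"
	case empty && strings.HasSuffix(reason, "_detected"): // no_gpu_detected
		return "integration up but hardware empty"
	case note != "":
		return "data present but the SPA would hide it; surface the note"
	default:
		return "items populated, proceed normally"
	}
}
```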
Forbidden:
- signalling emptiness via meta.error alone — error is reserved for the upstream message; the canonical signal is empty_reason.
- treating vgpu_unavailable as a hard error / non-zero exit — agents must keep iterating in --watch.

--watch semantics (frozen)

Default behaviour: single execution. --watch opts into HTTP-polling that mirrors the SPA's setTimeout cadence.
```mermaid
flowchart TD
    A[invoke leaf cmd] --> B{--watch set?}
    B -- no --> C[RunOnce → emit one envelope → exit 0/1]
    B -- yes --> D[Runner.Run]
    D --> E[RunOnce iteration N]
    E --> F{success?}
    F -- yes --> G[emit envelope NDJSON line<br/>meta.iteration=N]
    F -- transient err --> H[emit envelope NDJSON<br/>meta.error set, items:[]]
    F -- ErrTokenInvalidated / ErrNotLoggedIn --> X[exit non-zero immediately]
    G --> I{--watch-iterations cap reached?}
    H --> J{N consecutive errors >= 3?}
    J -- yes --> X
    I -- yes --> Y[exit 0]
    J -- no --> K[wait --watch-interval]
    I -- no --> K
    K --> L{SIGINT?}
    L -- yes --> Y
    L -- no --> E
```
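An agent-side sketch of consuming the --watch NDJSON stream per the flowchart above — it mirrors, rather than reuses, Runner's failure counting:

```go
package watchsketch

import (
	"bufio"
	"encoding/json"
	"io"
)

// envelope is the minimal slice of Shape A this sketch needs.
type envelope struct {
	Kind string `json:"kind"`
	Meta struct {
		Error string `json:"error"`
	} `json:"meta"`
}

// watchStream reads one envelope per NDJSON line and tracks consecutive
// failures the same way the flowchart describes.
func watchStream(r io.Reader) error {
	sc := bufio.NewScanner(r)
	consecutive := 0
	for sc.Scan() {
		var env envelope
		if err := json.Unmarshal(sc.Bytes(), &env); err != nil {
			return err // malformed NDJSON line — outside the contract
		}
		if env.Meta.Error != "" {
			consecutive++ // transient iteration failure; the CLI itself exits after 3 in a row
			continue
		}
		consecutive = 0
		// ... act on the envelope ...
	}
	return sc.Err()
}
```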
Hard rules:
- Pretty -o json is for one-shot mode; --watch -o json is always NDJSON.
- A transient iteration failure emits items:[] + meta.error="<msg>"; never raise to process exit unless it's an auth-class error or the 3-consecutive cap trips.
- --since N slides forward each iteration (window = [now-N, now]); --start/--end stays fixed across all iterations. Mutual exclusion is enforced in CommonFlags.Validate; do NOT remove the check.
- --watch-iterations without --watch is rejected. Same for --watch-timeout / --watch-interval.
- Runner exposes Iter (the per-iteration callback). It used to be called Run and was renamed to dodge a field/method-name clash with Runner.Run() — leave it.

--user flag
- Client.EnsureUser(ctx) lazily fetches /capi/app/detail (once per process) and caches globalrole. Subsequent calls return the cached value.
- RequireAdmin(ctx) is the only admit gate for --user <olaresId>; admin-only verbs (cross-tenant queries) MUST call it before issuing any upstream request.
- resolveTargetUser(ctx, c, requested) is the helper for commands that accept a user via positional arg OR --user (today: overview user [<username>]). Empty / self → returns the active profile; explicit + admin → returns synthetic UserDetail{Name: requested}; explicit + non-admin → error mentioning "platform-admin".
- --user only changes upstream filter args; the auth header is still the active profile's token.

The format pkg is a byte-for-byte port of:
- @bytetrade/core/src/monitoring.ts → UnitTypes, GetValueByUnit, GetSuitableUnit, GetSuitableValue
- apps/packages/app/src/apps/dashboard/utils/ → WorthValue, FormatFrequency, ConvertTemperature, GetDiskSize, GetThroughput, FormatTime, GetMinuteValue, GetLastMonitoringData, GetResult

Rules:
- format/testdata/golden-gen.js runs the JS oracle directly and writes golden.json; TestFormat_GoldenOracle asserts byte equality. Any divergence is a Go bug, not a JS bug.
- Use github.com/shopspring/decimal for arithmetic that the JS uses BigNumber for (e.g. WorthValue's cascading thresholds, FormatFrequency rounding). Do not introduce a different decimal lib.
- JS toFixed(2) of 1.005 returns "1.00" because of float repr — the Go port replicates that. Test cases must use unambiguous values like 1.234567 → 1.23.
- WorthValue uses cascading if thresholds, not generic ladder logic — keep the structure.
- Never hand-roll fmt.Sprintf("%.2f", x) in overview.go / applications.go for a value that is shown to the user (the few %.2f left in overview.go are pinned to non-user-facing CPU core counts and are documented inline).
- raw carries Celsius (the upstream value); display is rendered with --temp-unit (default Celsius). Never strip Celsius from raw.
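Why unambiguous test values matter — the float64 nearest to 1.005 sits just below it, so both JS and Go round down:

```go
package main

import "fmt"

func main() {
	// Both JS Number(1.005).toFixed(2) and Go's %.2f print "1.00": the
	// float64 closest to 1.005 is 1.00499999999999989..., which rounds down.
	fmt.Printf("%.2f\n", 1.005)    // 1.00
	fmt.Printf("%.2f\n", 1.234567) // 1.23 — unambiguous, preferred in tests
}
```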
cli/pkg/dashboard/ (the pkg root) and are
imported as pkgdashboard "github.com/beclab/Olares/cli/pkg/dashboard"
by the pkg subpackages (pkg/dashboard/<area>/...) and — for the
plumbing types like CommonFlags / Client / ErrAlreadyReported —
also by the cmd shells. The cmd shells additionally import
pkg<area> "github.com/beclab/Olares/cli/pkg/dashboard/<area>" to
reach RunXxx.
pkg/dashboard/ (root) API surface — infrastructure + shared fetchers

| Domain file | Exposed surface | Used by |
|---|---|---|
| flags.go | CommonFlags struct (raw + parsed fields), Validate, ResolveWindow, OutputFormat, ParseOutputFormat | every cmd leaf RunE (validate), every pkg RunXxx (read fields), every fetcher; cobra binding lives in cmd-root options.go |
| output.go / emit.go / envelope.go | Envelope / Item / Meta / TableColumn, WriteJSON (NDJSON-aware), WriteTable, HeadItems, DisplayString, EmitDefault, NewMeta | every BuildXxxEnvelope / WriteXxxTable site under pkg/dashboard/<area>/... (hand-rolling json.Marshal / fmt.Println for envelope output is forbidden) |
| client.go / users.go | Client.DoJSON / DoEmpty / DoRaw, EnsureUser (sync.Once), RequireAdmin, ResolveTargetUser, UserDetail, EnsureSystemStatus, IsOlaresOne, HTTPError, ClassifyTransportErr, IsHTTPError, ErrAlreadyReported | every HTTP call (direct httpClient.Do forbidden); admin gating in pkg/dashboard/overview/user.go |
| runner.go | Runner{Iter, RunOnce}, ParseStep | every --watch-aware RunXxx (instantiated and driven from inside the pkg subpackage, never from cmd) |
| monitoring.go | FetchClusterMetrics(ctx,c,cf,metrics,window,now,instant), FetchNodeMetrics, FetchUserMetric, MonitoringQuery, MonitoringWindow, DefaultClusterWindow, DefaultDetailWindow | pkg/dashboard/overview/{physical,cpu,memory,pods,user}.go, pkg/dashboard/overview/disk/{main,partitions}.go |
| workloads.go | FetchWorkloadsMetrics(ctx,c,cf,req,window,now), MergeWorkloadMetrics, WorkloadAggregate / WorkloadApp / WorkloadRequest, AggregateByDeployment, AggregateByNamespace, PodCountByDeployment, SortWorkloadAggregates | only via BuildRankingEnvelope (do NOT call FetchWorkloadsMetrics directly from <area> subpackages) |
| apps.go | FetchAppsList, RawAppListItem (with empty-entrances[] filter mirroring SPA appsWithNamespace) | only via BuildRankingEnvelope |
| ranking.go | BuildRankingEnvelope(ctx,c,cf,target,sortBy,sortDir,now) — assembles a workload-grain ranking envelope | consumed by pkg/dashboard/overview/ranking.go and pkg/dashboard/applications/list.go so first-row parity holds (the ONE legitimate cross-area share, hoisted to the pkg root) |
| system.go | FetchSystemIFS(ctx,c,testConnectivity), FetchSystemFan, SystemIFSItem, SystemFanData, SystemStatus, EnsureSystemStatus | pkg/dashboard/overview/network.go (capi /system/ifs), pkg/dashboard/overview/fan/live.go (capi mdns/olares-one/cpu-gpu) |
| gates.go | GateOlaresOne (hard gate, fan), GPUAdvisory (soft gate, gpu), HasCUDANode, VgpuUnavailableFromError, ResetCUDANodeCache | pkg/dashboard/overview/fan/* (hard gate runs inside Runner.Iter), every pkg/dashboard/overview/gpu/* leaf (advisory note + soft proceed) |
| gpu.go | FetchGraphicsList, FetchTaskList, FetchGraphicsDetail, FetchTaskDetail, ExtractHAMIMessage, GraphicsListBody, TaskListBody | every file under pkg/dashboard/overview/gpu/ (list / tasks / get / task / detail / task_detail) |
| gpu_format.go | PercentString, PercentDirect, GPUModeLabel, GPUHealthLabel, GPUVRAMHuman, GPUTrendStep, RenderTemperature | every pkg/dashboard/overview/gpu/* table writer; pkg/dashboard/overview/disk/main.go (renderDiskTemperature wraps RenderTemperature) |
| gpu_query.go | FetchInstantVector, FetchRangeVector, InstantVectorSample, RangeVectorSeries / Range / Point | pkg/dashboard/overview/gpu/specs.go (gauge/trend runners), pkg/dashboard/overview/gpu/detail.go, pkg/dashboard/overview/gpu/task_detail.go |
| lsblk.go | HasPknameLabels, CollectSubtreeByPkname, ResolveParent, BuildLsblkTreePrefix, FlattenLsblkHierarchy, LsblkRow / LsblkFlatRow | pkg/dashboard/overview/disk/main.go, pkg/dashboard/overview/disk/partitions.go |
| numbers.go | FormatFloat, SafeRatio, FormatRateAny, ParseRFCTimestamp, SampleFloat, LastSampleFromRow, FirstAnyInArray, ToFloat, RenderTemperature | every display-side rendering site under pkg/dashboard/<area>/... that doesn't go through format directly (typically aliased lowercase in the area's helpers.go) |
| fan_curve.go | FanCurveTable (10 rows), FanSpeedMaxCPU, FanSpeedMaxGPU | pkg/dashboard/overview/fan/curve.go (do NOT fetch from BFF — UI constant; SPA also hardcodes) |
| schema.go | Kind* constants, AllKinds(), LoadSchemaIndex, //go:embed schemas/*.json | cmd/ctl/dashboard/schema/root.go (loader); every pkg/dashboard/<area>/<verb>.go references the matching Kind* constant when emitting envelopes |
pkg/dashboard/<area>/ subpackages — heavy command-specific logic

| Subpackage | Entry points | Main files |
|---|---|---|
| pkg/dashboard/applications/ | RunList(ctx,c,cf,sortBy,sortDir) | list.go (+ list_test.go) |
| pkg/dashboard/overview/ | RunDefault (sections envelope), RunPhysical, RunUser(target), RunRanking(sortDir), RunCPU, RunMemory(mode), RunPods, RunNetwork(testConn) + the per-node scaffold RunPerNodeMetric (PerNodeDisplayFn) | default.go, physical.go, user.go, ranking.go, cpu.go, memory.go, pods.go, network.go, nodes.go, helpers.go (+ tests) |
| pkg/dashboard/overview/disk/ | RunDefault, RunMain, RunPartitions(device) | default.go, main.go, partitions.go, helpers.go (+ tests) |
| pkg/dashboard/overview/fan/ | RunDefault, RunLive, RunCurve | default.go, live.go, curve.go, helpers.go (+ tests) |
| pkg/dashboard/overview/gpu/ | RunList, RunTasks, RunGet(uuid), RunTask(name,podUID,sharemode), RunDetail(uuid), RunTaskDetail(name,podUID,sharemode) + the specs.go query catalogue (gaugeSpec / trendSpec / runGauge / runTrend / fanoutGaugeAndTrend) | list.go, tasks.go, get.go, task.go, detail.go, task_detail.go, specs.go, helpers.go (+ tests) |
Each subpackage's helpers.go may carry package-private lowercase
aliases (e.g. formatFloat = pkgdashboard.FormatFloat,
gpuVRAMHuman = pkgdashboard.GPUVRAMHuman) for primitives the area
calls many times. These are readability-only — they are NOT new
helpers, do NOT carry behaviour, and are shadowed by helpers_test.go
in the same area for the matching test bodies.
common.go — slim, intentionally near-identical

Every cmd subpackage carries a small common.go whose contents are
intentionally near-identical between areas. This is expected, not
drift: each area declares its own factory + flag pointer so leaf code
inside that area can read common.X selector-style without importing
sibling areas. The current shape (≤ 70 lines per area, tighter for
sub-areas) is:
```go
// Package <area> hosts the cobra wiring for `olares-cli dashboard <area>`;
// business logic lives in cli/pkg/dashboard/<area>/.
package <area>

import (
	"context"
	"fmt"
	"strings"

	"github.com/spf13/cobra"

	"github.com/beclab/Olares/cli/pkg/cmdutil"
	pkgdashboard "github.com/beclab/Olares/cli/pkg/dashboard"
)

// common is wired by NewXxxCommand; cobra's persistent-flag inheritance
// mutates the pointed-at struct before any leaf RunE fires.
var common *pkgdashboard.CommonFlags

// prepareClient is the area-private *pkgdashboard.Client factory.
func prepareClient(ctx context.Context, f *cmdutil.Factory) (*pkgdashboard.Client, error) { /* ResolveProfile + HTTPClient + NewClient */ }

// unknownSubcommandRunE prints a typed typo hint + returns ErrAlreadyReported.
func unknownSubcommandRunE(c *cobra.Command, args []string) error { /* SuggestionsFor + ErrAlreadyReported */ }
```
That's the WHOLE file. No trampoline aliases, no pkgdashboard.X
re-exports, no envelope helpers. If a leaf needs a pkg helper, it
imports pkgdashboard (or the area's pkg subpackage) and calls it
directly. The earlier ~400-line common.go shape — full of lowercase
re-exports of every pkg name — has been retired; reintroducing it is a
forbidden regression (see Iteration red-lines).
Do NOT "consolidate" common.go files by importing one area's
from another — that violates the no-horizontal-import red-line. If two
areas need the same heavyweight helper, hoist it to
cli/pkg/dashboard/ (as we did with BuildRankingEnvelope).
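One plausible expansion of the prepareClient stub in the template above — every method name beyond the comment's ResolveProfile + HTTPClient + NewClient hints is an assumption, not the real cmdutil.Factory API:

```go
// Hypothetical prepareClient body; every call below is an assumed
// signature — consult cmdutil.Factory and pkg/dashboard/client.go for the
// real API before copying anything.
func prepareClient(ctx context.Context, f *cmdutil.Factory) (*pkgdashboard.Client, error) {
	profile, err := f.ResolveProfile(ctx) // assumed
	if err != nil {
		return nil, err
	}
	httpClient, err := f.HTTPClient(profile) // assumed: returns the refreshingTransport-backed client
	if err != nil {
		return nil, err
	}
	return pkgdashboard.NewClient(profile, httpClient), nil // assumed constructor
}
```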
Layout conventions:
- Cmd subpackages hold one file per leaf (<verb>.go per leaf); pkg subpackages mirror the same tree (one <verb>.go per leaf, exposing one RunXxx entry); pkg root is by domain (one file per fetcher / aggregator / infrastructure family). Extract a NEW pkg-root domain file ONLY when its function set is independently testable and isn't a shim over an existing domain. Cmd subpackages NEVER import each other horizontally; pkg subpackages NEVER import each other horizontally and the pkg root NEVER imports a subpackage.
- Each cmd parent carries root.go (cobra assembly + AddCommand for children, RunE = pkg<area>.RunDefault or RunList per the parent-default rule) + common.go (≤ 70 lines: var common, area-local prepareClient, unknownSubcommandRunE — and nothing else). Each leaf is a single file named after the verb (e.g. cpu.go, disk/main.go, gpu/list.go); RunE = validate flags → prepareClient → pkg<area>.RunXxx(...). Stub-style trampoline aliases are forbidden (see Iteration red-lines).
- The pkg root is organised by domain file (monitoring.go, workloads.go, gpu.go, gpu_format.go, gpu_query.go, lsblk.go, gates.go, fan_curve.go, numbers.go, …). Pkg subpackages (<area>/) carry <verb>.go files that own the RunXxx + BuildEnvelope + WriteTable triplet for each leaf, plus an optional helpers.go (package-private lowercase aliases of pkg-root primitives, readability only).
- Every area factory is NewXxxCommand(f *cmdutil.Factory, cf *pkgdashboard.CommonFlags) *cobra.Command. The factory + flags pointer MUST be passed in explicitly; the cmd area subpackage stores cf in var common *pkgdashboard.CommonFlags for leaf RunE bodies; no package-level / global factory variable.
- Every command has Use / Short / (where helpful) Example filled. SilenceUsage = true, SilenceErrors = true. Stub-style commands (none today) print an envelope with meta.error="not implemented" AND a clear stderr line. The contract is pinned by TestAllLeafCommandsSilenced in cmd-root.
- HTTP failures surface as *pkgdashboard.HTTPError; auth errors → reformatted at the pkg layer. Never wrap typed *credential.ErrTokenInvalidated / *credential.ErrNotLoggedIn — surface them directly so the standard "run profile login" CTA fires. Soft-printed errors (already shown to the user) return pkgdashboard.ErrAlreadyReported so the cmd-root wrapper does not double-print.
- Before writing a new fetcher: new cluster-grain metric → check pkgdashboard.FetchClusterMetrics; new per-node metric → check FetchNodeMetrics; new user-grain metric → check FetchUserMetric; new workload-grain view → check BuildRankingEnvelope; new per-node leaf → reuse pkg/dashboard/overview/nodes.RunPerNodeMetric with a PerNodeDisplayFn; new watchable command → use pkgdashboard.Runner from inside the pkg RunXxx body.
- Coverage floors: format 100%, runner 100%, new pkg-root fetcher fields 100% (httptest wire-shape pin), per-leaf RunXxx happy-path + non-trivial branches (3-state empty, gating, soft-failure) 100%. The ratchet only goes up.
- If a flag interacts with --watch (--since, --watch-interval, --watch-iterations, --watch-timeout), say so verbatim in the flag help to keep olares-cli ... --help self-documenting.

Allowed (incremental work):
- New leaf: add cli/pkg/dashboard/<area>/<verb>.go exposing RunXxx(ctx, c, cf, ...) plus a BuildXxxEnvelope builder and (if it has a non-trivial table) a WriteXxxTable. Add a <verb>_test.go next to it (one happy-path test minimum, plus error-class coverage matching nearby leaves; reuse the area's helpers_test.go httptest fixtures).
- Add cli/cmd/ctl/dashboard/<area>/<verb>.go — a single cobra command literal whose RunE is pkg<area>.RunXxx(...) after common.Validate() + prepareClient(...). No envelope / table code in cmd. Wire it into the area's root.go via AddCommand.
- Register the kind: add KindXxx in pkg/dashboard/schema.go, append to AllKinds(), drop the JSON file under pkg/dashboard/schemas/, register in LoadSchemaIndex().
- If the leaf needs a --watch loop, drive pkgdashboard.Runner from inside RunXxx (do NOT spawn it from cmd).
- New cmd area: add cli/cmd/ctl/dashboard/<area>/{root.go, common.go} — var common *pkgdashboard.CommonFlags, area-local prepareClient, unknownSubcommandRunE. common.go stays ≤ 70 lines; do NOT add trampoline aliases.
- New pkg area: add cli/pkg/dashboard/<area>/{helpers.go, helpers_test.go, default.go} (and per-leaf <verb>.go files as you go). helpers.go may carry package-private lowercase aliases for frequently called pkg-root primitives (readability only, no new symbols). default.go exposes RunDefault — either the parent's sections envelope (Shape B) or a single-leaf default (Shape A, as overview/gpu does).
- Adding a new omitempty field on Meta (forward-compatible).
- Adding a new CommonFlags flag (in pkg/dashboard/flags.go; bind in cmd-root options.go; Validate must keep all current rules).
- Adding a command-local flag (precedents: overview memory --mode, overview network --test-connectivity, overview gpu task --sharemode, overview ranking --sort, applications --sort).
- Adding tests at the matching tier (pkg-root infra → pkg/dashboard/dashboard_test.go; per-leaf envelope → pkg/dashboard/<area>/<verb>_test.go; cobra wiring → cmd/ctl/dashboard/dashboard_test.go).
- Fixing display rendering as long as golden.json still passes.

Forbidden (regression / scope creep):
- Adding loose .go files directly under cli/cmd/ctl/dashboard/ — every leaf belongs in its area subpackage. The cmd-root only holds root.go (assembler), options.go (cobra↔CommonFlags binding), and dashboard_test.go (root-tier tests).
- Putting heavy logic in the cmd shell (anything beyond validate → prepareClient → pkg<area>.RunXxx). Any function that builds an Envelope, writes a table, runs Runner.Run, or fans out concurrent HTTP belongs in cli/pkg/dashboard/<area>/....
- Horizontal cmd↔cmd imports: applications/ may NOT import overview/; overview/disk/ may NOT import overview/fan/. Parent-area imports (overview/root.go importing overview/disk etc. for AddCommand wiring) are fine.
- Upward or sibling imports under cli/pkg/dashboard/ — the pkg root MUST NOT import any subpackage; sibling subpackages (overview/disk, overview/fan, overview/gpu) MUST NOT import each other. If two area subpackages need the same heavy helper, hoist it to cli/pkg/dashboard/ (the BuildRankingEnvelope precedent — consumed by both pkg/dashboard/overview/ranking.go and pkg/dashboard/applications/list.go).
- Fattening common.go — earlier revisions had ~400-line common.go files re-exporting every pkg name. The slim shape is the contract: common.go carries var common, prepareClient, unknownSubcommandRunE, and nothing else.
- Renaming any existing Kind* constant.
- Changing the envelope shapes (a new omitempty is fine; structural change is not).
- Changing parent-command defaults: overview, overview disk, overview fan default to Shape B; overview gpu defaults to gpu list (Shape A); overview cpu / overview memory / overview pods etc. are leaves and keep their Shape A defaults. New parent commands MUST pick one of these two shapes and pin it in the Use description (and in the area's pkg default.go).
- Formatting user-visible values outside pkg/dashboard/format/ (including ad-hoc fmt.Sprintf("%.2f", ...) for user-visible values).
- Bypassing the factory-injected transport (own http.Client, manual X-Authorization).
- Changing --watch exit semantics (3-fail cap, ErrTokenInvalidated short-circuit, SIGINT exit-0, NDJSON-per-iteration). Adding new exit conditions requires a separate plan.
- Removing the --since ↔ --start/--end mutual-exclusion check in pkg/dashboard/flags.go::CommonFlags.Validate.
- Dropping meta.empty_reason values or replacing them with prose.
- Changing the EnsureUser cache lifetime / RequireAdmin guard placement / ResolveTargetUser admin check.
- Adding commands for SPA routes outside Overview2 or Applications2.
- "Fixing" FetchSystemIFS / FetchSystemFan to monitoring endpoints — these are deliberate capi paths and the SPA uses the same.
- Letting FanCurveTable (Go const in pkg/dashboard/fan_curve.go) and SPA Fan/config.ts.tableData drift apart. If you have to change one, change BOTH and update the test pinning the row count.
- Replacing BuildRankingEnvelope (or its underlying FetchWorkloadsMetrics) with bespoke per-namespace fetches; first-row parity between overview ranking and applications MUST hold.

If a task seems to require any forbidden item, STOP, surface this skill to the user, and ask for a documented exception. Do not silently refactor.
Tests sit at four tiers, mirroring the two-layer split. The rule is:
a test belongs next to the code that owns its non-trivial
behaviour. Almost every dashboard test is now alongside the
Run* it exercises, in the matching pkg/dashboard/<area>/
subpackage; the pkg-root dashboard_test.go has narrowed to
infrastructure only.
| Tier | Where | What lives there | How to run |
|---|---|---|---|
| Pkg-root infrastructure tests | cli/pkg/dashboard/dashboard_test.go + helpers_test.go (in-package trampolines) | INFRA-only: CommonFlags.Validate / ResolveWindow, Client.EnsureUser / EnsureSystemStatus / RequireAdmin / IsOlaresOne, Runner (one-shot + watch + 3-fail + watch-required-flag), Fetch* wire-shape pins (httptest), Merge* / aggregator math, lsblk tree algorithms, gates (HasCUDANode, GateOlaresOne, GPUAdvisory, VgpuUnavailableFromError, ExtractHAMIMessage), GPU formatters & body builders (Graphics/TaskListBody), Fetch{Instant,Range}Vector, GPUTrendStep, WriteJSON / HeadItems / ClassifyTransportErr. No envelope-shape or table-render tests for individual leaves — those live in the area subpackages. | go test ./cli/pkg/dashboard/ |
| Pkg-area envelope / RunXxx tests | cli/pkg/dashboard/<area>/<verb>_test.go + per-area helpers_test.go (shared httptest fixtures) | Per-leaf coverage of the RunXxx / BuildXxxEnvelope / WriteXxxTable triplet, including 3-state empty handling, capability gating, soft-failure semantics, and partial-failure fan-out (e.g. TestBuildGPUDetailFullEnvelope_PartialFailure, TestBuildGPUTaskDetailFullEnvelope_TimeSlicingSkipsAllocation, TestBuildLiveEnvelope_NoGraphicsIntegrationStillSucceeds, TestBuildSectionsEnvelope_ErrorPropagation). Lark-aligned density: at least one happy-path + the bug classes the file actually has. | go test ./cli/pkg/dashboard/<area>/... |
| Cmd-root behavioural tests | cli/cmd/ctl/dashboard/dashboard_test.go | Only what's tied to NewDashboardCommand's cobra binding — TestUnknownSubcommandRunE_PrintsSuggestionAndFailsOnTypo, TestUnknownSubcommandRunE_NoArgsPrintsHelp, TestLeafErrorsAreReported, TestLeafErrorsSentinelNotDoublePrinted, TestAllLeafCommandsSilenced (regression net for "Cobra printed usage when HAMI returned 5xx"). The bar for this tier is high — keep the file short. | go test ./cli/cmd/ctl/dashboard/ |
| Format JS oracle | pkg/dashboard/format/format_test.go + testdata/golden-gen.js | TestFormat_GoldenOracle against golden.json (skips if absent) | cd cli/pkg/dashboard/format/testdata && node golden-gen.js; go test ./cli/pkg/dashboard/format/... |
| Coverage | go test -coverprofile | coverage-dashboard.html | go test -coverprofile=coverage-dashboard.out -covermode=atomic ./cli/cmd/ctl/dashboard/... ./cli/pkg/dashboard/... && go tool cover -html=coverage-dashboard.out -o coverage-dashboard.html |
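A minimal wire-shape pin in the style of the pkg-root tier — endpoint, stub body and assertions are illustrative, not copied from the real tests:

```go
package dashboard_test

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
)

// TestDecodeGraphicsList_WireShape pins the top-level {"list": ...} shape
// against an httptest stub (illustrative; the real tests share the area's
// helpers_test.go fixtures such as newTestClient).
func TestDecodeGraphicsList_WireShape(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"list":[{"uuid":"GPU-0123abcd"}]}`)
	}))
	defer srv.Close()

	resp, err := http.Post(srv.URL+"/hami/api/vgpu/v1/gpus", "application/json", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	var raw struct {
		List []map[string]any `json:"list"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&raw); err != nil {
		t.Fatal(err)
	}
	if len(raw.List) != 1 || raw.List[0]["uuid"] != "GPU-0123abcd" {
		t.Fatalf("unexpected decode: %+v", raw.List)
	}
}
```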
Test layout rules:
- Pkg-root infrastructure test → cli/pkg/dashboard/dashboard_test.go (or, if the file gets too long, a new <domain>_test.go next to it). Leverage helpers_test.go's var common *CommonFlags and lowercase trampolines (fetchClusterMetrics, gateOlaresOne, newTestClient, …) so the body reads the same way the SPA-mirroring helpers do.
- Per-leaf test (RunXxx / BuildXxxEnvelope / WriteXxxTable for one cmd verb) → cli/pkg/dashboard/<area>/<verb>_test.go, sharing the area's helpers_test.go httptest fixtures (newTestClient, area-specific *StubMux, etc.). Do NOT put it on the cmd side.
- Cobra-binding test → cmd-root dashboard_test.go. Keep the file short; reject any test that could equally live in pkg.
- Area cmd subpackages carry no *_test.go files — every leaf's behaviour is covered through Run* in pkg. Resist the urge to add cobra-bound tests on the cmd side; if you need them, justify it with a regression that genuinely depends on cobra binding (the existing pkg-area files already cover everything that Run* can be exercised with).

Real-machine end-to-end smoke / regression scripts live outside the
repo — agents should drive olares-cli dashboard <command> -o json
directly against a live profile and pipe through jq -e for assertion.
There is no committed bash harness; prefer ad-hoc jq queries against
the documented Kind enum + envelope shape.
| Error message | Cause | Fix |
|---|---|---|
| server rejected the access token (HTTP 401/403) | Token expired / wrong / missing | Defer to ../olares-shared/SKILL.md. Do not catch + retry inside dashboard code. |
| --since cannot be combined with --start/--end (mutually exclusive) | User passed both | Drop one; sliding window vs fixed window are different intents. |
| --watch-iterations requires --watch (similarly --watch-timeout, --watch-interval) | Watch knob without --watch | Add --watch, or drop the knob. |
| --user "<id>" requires platform-admin; <self> does not have that role | Non-admin tried to query another user | Log in as admin, or drop --user. |
| meta.empty_reason = no_vgpu_integration (200 envelope, items:[]) | HAMI vGPU absent (HTTP 404) | Not an error — install HAMI or skip the GPU view. |
| meta.empty_reason = vgpu_unavailable (200 envelope, items:[]) + meta.http_status=5xx + meta.error="..." + stderr gpu data temporarily unavailable: HAMI returned HTTP 5xx ... | HAMI installed but unhealthy (HTTP 5xx) | Transient — retry; investigate the HAMI controller. CLI exit code stays 0 so --watch keeps streaming. |
| meta.empty_reason = no_gpu_detected (200 envelope, items:[]) | HAMI healthy but the device list / detail is empty | Not an error. |
| meta.empty_reason = no_fan_integration (200 envelope, items:[]) | /user-service/api/mdns/olares-one/cpu-gpu returned 404 (Olares One only) | Not an error on non-Olares-One deployments. |
| meta.empty_reason = not_olares_one (200 envelope, items:[]) + stderr fan is only available on Olares One devices ... | Active device's device_name is not Olares One — fan subtree is hard-gated off | Not an error. Skip fan on this hardware; verify with curl /user-service/api/system/status. |
| meta.note="GPU sidebar entry is hidden for non-admin profiles ..." + stderr (advisory) GPU sidebar entry ... | Non-admin profile invoked a gpu verb. Soft advisory only — data still fetched. | Not an error. If HAMI still returned data, surface the note alongside the items. Switch to a platform-admin profile if the agent needs the SPA-equivalent UX. |
| meta.note="no node carries gpu.bytetrade.io/cuda-supported=true ..." + stderr (advisory) no node carries ... | Cluster has no node labelled gpu.bytetrade.io/cuda-supported=true. Soft advisory only. | Not an error. Surface the note; HAMI may still report devices. |
| kind=dashboard.<x>, meta.error="..." inside an NDJSON line | Single failed --watch iteration | Fine — watch keeps running. Three in a row exits non-zero. |
| unknown subcommand "application" (typo) | User typed application instead of applications | Use the suggestion printed; shell completion also helps. |
One-shot snapshot for an agent:

```bash
olares-cli dashboard overview -o json | jq '.sections.user.items[0].raw.utilisation'
```

Stream CPU at the SPA's cadence, two iterations:

```bash
olares-cli dashboard overview cpu --watch --watch-iterations 2 -o json
```

Sliding 5-minute window, polled every 10s, until SIGINT:

```bash
olares-cli dashboard overview cpu --watch --since 5m --watch-interval 10s -o json
```

Fixed window (no sliding even if --watch set):

```bash
olares-cli dashboard overview cpu --start 2026-04-28T08:00:00Z --end 2026-04-28T08:30:00Z -o json
```

Disk sections (main table + per-disk partitions):

```bash
olares-cli dashboard overview disk -o json | jq '.sections.partitions.sections | keys'
olares-cli dashboard overview disk partitions sda -o json
```

Fan sections (live + curve):

```bash
olares-cli dashboard overview fan -o json | jq '.sections.curve.items | length'   # → 10
olares-cli dashboard overview fan live --watch --watch-iterations 5 -o json
```

GPU + tasks (3-state):

```bash
olares-cli dashboard overview gpu list -o json
olares-cli dashboard overview gpu tasks -o json
olares-cli dashboard overview gpu get GPU-0123abcd -o json
olares-cli dashboard overview gpu task my-task abcdef-0123-...
```

Cross-user query as admin:

```bash
olares-cli dashboard overview user bob -o json
```

Real-machine smoke (drive -o json and jq -e directly):

```bash
olares-cli profile use olares-id
olares-cli dashboard schema -o json | jq -e '.kind == "dashboard.schema.index"'
olares-cli dashboard overview -o json | jq -e '.sections.physical.kind == "dashboard.overview.physical"'
olares-cli dashboard overview gpu list -o json | jq -e '.kind == "dashboard.overview.gpu.list"'
```