Alibabacloud Emas Apm Query
v0.0.1
Alibaba Cloud EMAS APM (mobile Application Performance Monitoring) issue troubleshooting skill. Covers the 4 read-only OpenAPIs exposed by the `aliyun emas-a...
alibabacloud-emas-apm-query
1. Scenario Description & Architecture
After a mobile app integrates Alibaba Cloud EMAS APM, the crash / anr / lag / custom / memory_leak / memory_alloc events it produces every day are aggregated and reported by the SDK to the backend. A typical troubleshooting workflow is:
- Figure out which Issues are most worth fixing: sort by error rate / error count → pick Top 3~5
- Inspect what a specific Issue looks like: fetch its aggregated metrics and affected versions
- Find several representative samples: across different devices / versions / networks
- Read the stack + business log in a sample: find actionable clues
- Compare against the app source code and propose a fix
This skill stitches the 5 steps above into a single CLI pipeline. The entire process only calls the 4 read-only APIs of aliyun emas-appmonitor, and depends on no database / log service:
GetIssues → GetIssue → GetErrors → GetError
↓
(optional) stack ↔ user APP source → precise file:line + fix diff
Supported BizModules: crash / anr / lag / custom / memory_leak / memory_alloc
Supported OS: android / iphoneos / harmony (harmony does not have anr / memory_*)
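The module/OS matrix above can be written down as a small lookup, so that a caller knows which combinations are expected to return data. This is only a sketch: the function name `modules_for_os` is hypothetical and is not part of the bundled scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the biz modules that exist for a given OS.
# harmony has no anr / memory_leak / memory_alloc, per the matrix above.
modules_for_os() {
  case "$1" in
    android|iphoneos) echo "crash anr lag custom memory_leak memory_alloc" ;;
    harmony)          echo "crash lag custom" ;;
    *) echo "unsupported os: $1" >&2; return 1 ;;
  esac
}

# Usage: iterate over the modules for the chosen OS.
for bm in $(modules_for_os harmony); do
  echo "would scan: $bm"
done
```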
2. Prerequisites
| Item | Requirement | Self-check command |
|---|---|---|
| Aliyun CLI version | >= 3.3.3 | aliyun version |
| Plugin | aliyun-cli-emas-appmonitor | aliyun emas-appmonitor --help |
| jq | any version (required by scripts) | jq --version |
Full installation steps: references/cli-installation-guide.md. Recommended: enable auto plugin installation once:
aliyun configure set --auto-plugin-install true
aliyun plugin update
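The version requirement in the table can be checked with a plain dotted-version comparison. The sketch below assumes the version is available as a string such as `3.3.3`; the helper name `version_ge` is hypothetical, and the exact output format of `aliyun version` should be confirmed before feeding it in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when dotted version $1 >= $2.
# Only the first three numeric fields are compared; missing fields count as 0.
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for i in 0 1 2; do
    h="${have[i]:-0}"; w="${want[i]:-0}"
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

# Usage with a literal; substitute the version string reported by the CLI.
if version_ge "3.3.3" "3.3.3"; then echo "CLI version OK"; fi
```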
3. Credential Pre-check
Do NOT print AK/SK values; just verify that an available profile exists:
aliyun configure list
The expected output contains a current profile whose Mode / RegionId are non-empty. If not, configure one of AK / OAuth / StsToken / RamRoleArn per references/cli-installation-guide.md#Configuration.
This skill never reads or forwards AK / SK field values themselves during its whole lifecycle.
4. AI-mode Lifecycle
Start (before the skill runs):
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-emas-apm-query"
End (after troubleshooting completes):
aliyun configure ai-mode disable
5. RAM Permissions
This skill only needs the 4 read-only emasha:View* actions (ViewIssues / ViewIssue / ViewErrors / ViewError). EMAS AppMonitor's RamCode is emasha and does not support resource-level authorization, so Resource is fixed to "*".
For the full least-privilege JSON policy, the equivalent system policies (AliyunEMASAppMonitorReadOnlyAccess / AliyunEMASAppMonitorFullAccess), and common permission-error troubleshooting, see references/ram-policies.md.
[MUST] Permission Failure Handling: When any command or API call fails due to permission errors at any point during execution, follow this process:
- Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
- Use the `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
- Pause and wait until the user confirms that the required permissions have been granted
6. Parameter Confirmation
IMPORTANT: Parameter Confirmation — Before executing any command or API call, ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks, passwords, domain names, resource specifications, etc.) MUST be confirmed with the user. Do NOT assume or use default values without explicit user approval.
| Parameter | Required | Type | Description | Default |
|---|---|---|---|---|
| app-key | Yes | int64 | EMAS APP Key (typically 9+ digits). Prefer to infer from SDK initialization code in the current workspace (see the 6 rule families in references/appkey-detection.md); if exactly one match is found, echo it and wait for user confirmation; if multiple, list candidates and let the user pick; on miss, ask the user to provide it manually. | None (default: probed from workspace) |
| os | Yes | enum | android / iphoneos / harmony (H5 goes to h5, which is out of scope). Inferred together with app-key from the project type: build.gradle / AndroidManifest.xml → android; *.xcodeproj / Podfile → iphoneos; module.json5 + ets/ → harmony. For cross-platform Flutter / Unity projects, the user MUST pick one (android / iphoneos). | None (default: probed from project type) |
| time-range | Yes | object | StartTime=<ms> EndTime=<ms> Granularity=1 GranularityUnit=<HOUR\|DAY> | Last 24 hours (user-overridable) |
| biz-module | No | list | If omitted, all 6 modules are scanned; if specified, only that module is analyzed | All 6 |
| digest-hash | No | string | If the user already knows a specific Issue, skip the Top-N stage and drill down directly | None |
| top-n | No | int | Number of Top issues | 5 |
| filter-json | No | string | Further narrow down (specific version / device model / region ...), a JSON string | Not applied |
Timestamp unit: every API uses Unix milliseconds. If the user passes a value in seconds (< 1e12), the scripts will automatically multiply by 1000.
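The seconds-to-milliseconds rule can be illustrated with a few lines of bash. This is a minimal sketch of the rule as stated, not the bundled scripts' actual code; the function name `to_ms` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalize a Unix timestamp to milliseconds.
# Values below 1e12 are treated as seconds, matching the rule stated above.
to_ms() {
  local ts="$1"
  if (( ts < 1000000000000 )); then
    echo $(( ts * 1000 ))
  else
    echo "$ts"
  fi
}

to_ms 1700000000      # prints 1700000000000
to_ms 1700000000000   # prints 1700000000000 (already in ms)
```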
biz-module pitfall: the CLI --help lists the legacy enum (exception / crash / lag / custom / h5JsError / h5WhiteScreen); however, anr / memory_leak / memory_alloc are actually forwarded to the backend and work. This skill scans all 6 modules requested by the user by default; see references/biz-module-reference.md.
time-range pitfall: in some environments Granularity=60 GranularityUnit=MINUTE is rejected by the backend (returns Code: 200, Message: "unknown error"). Always prefer Granularity=1 GranularityUnit=DAY or GranularityUnit=HOUR.
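One way to stay clear of the rejected MINUTE granularity is to build the time-range fields through a helper that only accepts HOUR or DAY. The sketch below assembles the field string from the parameter table; the function name `time_range` is hypothetical, and how the string is passed to the CLI is left to the bundled scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the time-range fields for a window ending at END_MS.
# Usage: time_range END_MS HOURS UNIT   (UNIT must be HOUR or DAY)
time_range() {
  local end_ms="$1" hours="$2" unit="$3"
  case "$unit" in
    HOUR|DAY) ;;
    *) echo "unsupported GranularityUnit: $unit (use HOUR or DAY)" >&2; return 1 ;;
  esac
  local start_ms=$(( end_ms - hours * 3600 * 1000 ))
  echo "StartTime=${start_ms} EndTime=${end_ms} Granularity=1 GranularityUnit=${unit}"
}

# Last 24 hours, ending now (date +%s yields seconds, so convert to ms).
time_range "$(( $(date +%s) * 1000 ))" 24 DAY
```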
--os pitfall: the CLI --help marks --os as optional, but in practice omitting it returns an empty list without error (Model.Items=[], Total=0). All 4 APIs must pass --os explicitly.
--did pitfall: get-error's --did is also marked optional in --help, but is implicitly required by the backend. Omitting it returns Code: 100011 Parameter Not Enough. Take it from get-errors' Items[*].Did (already handled by dig_issue.sh; when calling aliyun emas-appmonitor get-error manually, pass it explicitly).
Dual semantics of DigestHash: get-errors returns Items[*].DigestHash, which is the hash of a single event, different from the aggregated --digest-hash you passed in. When calling get-error next, still use the aggregated hash (the one you used in get-issues / get-issue); do not switch to the single-event hash.
Reuse biz-module: whichever bizModule was used to obtain a Top Issue from get-issues must be reused for the next three steps (get-issue / get-errors / get-error); otherwise the response will be empty (the same DigestHash exists under only one bizModule). list_top_issues.sh already attaches a bm field to each row so it can be reused.
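The three pitfalls above (`--did`, the two kinds of DigestHash, and bizModule reuse) come together when moving from a get-errors response to get-error. The sketch below pulls the ClientTime / Uuid / Did triple out of a saved response with `jq` and keeps the aggregated hash and bizModule in separate variables. The `.Model.Items` path is an assumption about the response shape, the sample JSON is made up, and `sample_triples` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list ClientTime / Uuid / Did for each sample in a saved
# get-errors response. The .Model.Items path is an assumed response shape.
sample_triples() {
  jq -r '.Model.Items[] | [.ClientTime, .Uuid, .Did] | @tsv' "$1"
}

# Made-up response for illustration only; real hashes and ids will differ.
demo_file="$(mktemp)"
cat > "$demo_file" <<'JSON'
{"Model":{"Items":[
  {"ClientTime":1700000000000,"Uuid":"uuid-1","Did":"device-a","DigestHash":"EVENTHASH0001"},
  {"ClientTime":1700000360000,"Uuid":"uuid-2","Did":"device-b","DigestHash":"EVENTHASH0002"}
]}}
JSON

AGG_HASH="AGGREGATED001"   # the hash used for get-issues / get-issue; keep using this one
BIZ_MODULE="crash"         # the bm value of the Top-N row; reuse it for every later call

sample_triples "$demo_file" | while IFS=$'\t' read -r client_time uuid did; do
  # The per-event Items[].DigestHash is deliberately ignored here.
  echo "get-error inputs: bm=$BIZ_MODULE hash=$AGG_HASH did=$did uuid=$uuid clientTime=$client_time"
done
```

The loop only prints the values it would hand to the next call; the per-event hash never leaves the JSON file, which keeps it from being passed on by mistake.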
7. Core Workflow
flowchart TD
Start[User request] --> DetectCtx{Workspace can infer AppKey+OS?}
DetectCtx -- single match --> ConfirmCtx[Echo for user confirmation]
DetectCtx -- multiple matches --> PickCtx[List candidates, let user pick]
DetectCtx -- miss --> AskCtx[Ask user for AppKey + OS]
ConfirmCtx --> HasHash
PickCtx --> HasHash
AskCtx --> HasHash
HasHash{digestHash provided?}
HasHash -- yes --> SingleIssue[get-issue: fetch single Issue metadata]
HasHash -- no --> ParallelIssues[Parallel get-issues over 6 bizModules]
ParallelIssues --> TopN[Sort by errorRate, take Top 3-5]
TopN --> IterateTop[Iterate Top Issues]
IterateTop --> SingleIssue
SingleIssue --> GetErrors[get-errors: latest 3-5 samples]
GetErrors --> PickSample["Sample policy: latest / hot device / latest affected version"]
PickSample --> GetError[get-error: stack/threads/logs/dimensions]
GetError --> HasCode{CWD has APP source?}
HasCode -- yes --> CodeMatch[stack -> file + line -> diff]
HasCode -- no --> CliReport[CLI-only diagnostic report]
CodeMatch --> Report[Final report: issue list + root cause + fix]
CliReport --> Report
Report --> CallFailed{Any CLI call failed?}
CallFailed -- yes --> CliSelfDiag["CLI self-diagnose: --log-level debug / --cli-dry-run / configure list / plugin update"]
CallFailed -- no --> endNode[Done]
CliSelfDiag --> endNode
7.0 Locate the Skill directory at runtime ($SKILL_DIR)
The Skill's own path is known to the Agent at the time SKILL.md is loaded (see the fullPath / path field under <available_skills> in the context). Before running any bash command that needs to read bundled resources from this Skill (scripts/ / assets/ / references/), the Agent MUST first export the directory of SKILL.md to SKILL_DIR exactly once:
# The Agent fills in the absolute path of SKILL.md into the placeholder, then exports once
export SKILL_DIR="$(cd "$(dirname "<ABSOLUTE_PATH_OF_SKILL.md>")" && pwd)"
# Self-check: all three directories must exist
[[ -d "$SKILL_DIR/scripts" && -d "$SKILL_DIR/assets" && -d "$SKILL_DIR/references" ]] \
|| { echo "[ERROR] SKILL_DIR does not point to the root of this Skill: $SKILL_DIR" >&2; exit 1; }
Rules:
- Do not hardcode `~/.cursor/skills-cursor/...` or `~/.claude/skills/...`: this Skill can be distributed in the repository (`.agent/skills/alibabacloud-emas-apm-query/`) or at the user level, and the absolute path varies with the host.
- Do not rely on `cd` into the Skill directory to use relative paths: the scripts drop artifacts into the current working directory (the user's APP source root); `cd` would break this behavior.
- The bash scripts have a fallback: `scripts/list_top_issues.sh` and `scripts/dig_issue.sh` auto-detect their own location via `BASH_SOURCE` at the top, so they can locate `$SKILL_DIR` even if it was not exported. Other inline `jq` / `rg` commands inside SKILL.md still require the Agent to export `$SKILL_DIR` first.
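The `BASH_SOURCE` fallback mentioned in the rules follows a common bash idiom. The sketch below shows one way a script under `scripts/` could derive the Skill root; it illustrates the idiom only, is not the bundled scripts' actual code, and `resolve_skill_dir` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the Skill root from the path of a script that
# lives in <root>/scripts/. The root is assumed to be the parent of scripts/.
resolve_skill_dir() {
  local script_path="$1"
  local script_dir
  script_dir="$(cd "$(dirname "$script_path")" && pwd)" || return 1
  (cd "$script_dir/.." && pwd)
}

# Honour an exported SKILL_DIR; otherwise fall back to this file's own location.
SKILL_DIR="${SKILL_DIR:-$(resolve_skill_dir "${BASH_SOURCE[0]:-$0}")}"
echo "SKILL_DIR=$SKILL_DIR"
```

Because the path is resolved in subshells, the caller's working directory is left untouched, which matches the rule against `cd`-ing into the Skill directory.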
7.1 Stage A: Top N (when digest-hash is not provided)
Use scripts/list_top_issues.sh to scan the 6 biz_modules in parallel:
bash "$SKILL_DIR/scripts/list_top_issues.sh" \
--app-key <AppKey> \
--os <iphoneos|android|harmony> \
--start-time <startMs> \
--end-time <endMs> \
--top-n 5 \
--order-by ErrorRate
The output is a merged Top-N table, each row containing {bm, digestHash, ec, er, edc, edr, name, type, reason}.
To add a Filter (e.g. "only version 3.5.x"), append --filter-json '{"Key":"appVersion","Operator":"in","Values":["3.5.0","3.5.1"]}'; see references/filter-reference.md.
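Hand-quoting the filter JSON is error-prone, so it can be generated with `jq` instead. The sketch below reproduces the `appVersion` example; it assumes jq 1.6 or later (for `$ARGS.positional`), and `build_in_filter` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an "in" filter as a compact JSON string.
# Usage: build_in_filter KEY VALUE [VALUE...]
build_in_filter() {
  local key="$1"; shift
  jq -cn --arg key "$key" \
    '{Key: $key, Operator: "in", Values: $ARGS.positional}' --args "$@"
}

FILTER_JSON="$(build_in_filter appVersion 3.5.0 3.5.1)"
echo "$FILTER_JSON"
```

The resulting string can then be passed as the value of `--filter-json`, quoted as `"$FILTER_JSON"` so the shell does not split it.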
7.2 Stage B: Drill into a single Issue
Use scripts/dig_issue.sh:
bash "$SKILL_DIR/scripts/dig_issue.sh" \
--app-key <AppKey> \
--os <iphoneos|android|harmony> \
--biz-module <crash|anr|lag|custom|memory_leak|memory_alloc> \
--digest-hash <13-char Base36> \
--start-time <startMs> --end-time <endMs> \
--sample-size 3
Output directory:
emas-apm-dig-<AppKey>-<DigestHash>-<epoch>/
01-get-issue.json
02-get-errors.json (contains the ClientTime/Uuid/Did triples)
samples/<Uuid>.json (complete JSON per sample, includes Backtrace/EventLog etc.)
report.md (structured markdown report)
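Before reading the report, it can help to confirm that a dig directory actually contains the four entries listed above. A minimal sketch, with `check_dig_output` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a dig directory has the layout listed above.
# Returns 1 and names each missing entry on stderr if anything is absent.
check_dig_output() {
  local dir="$1" missing=0 entry
  for entry in 01-get-issue.json 02-get-errors.json samples report.md; do
    if [[ ! -e "$dir/$entry" ]]; then
      echo "missing: $dir/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check every dig directory in the current working directory.
for d in ./emas-apm-dig-*/; do
  if [[ -d "$d" ]]; then
    check_dig_output "$d" || echo "incomplete: $d"
  fi
done
```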
7.3 Stage C: Code mapping + diff (if the CWD contains APP source)
Follow references/troubleshoot-workflow.md:
- Determine the platform (Android / iOS / Harmony / RN / Flutter / Web)
- `Model.Backtrace` → keep APP user frames → grep the source → locate file:line
- Enrich the timeline using `EventLog` + `Controllers` + `Threads` + `CustomInfo`
- Emit the smallest diff (≤ 20 lines + one sentence of "why")
If the CWD does not contain the source: emit only a CLI diagnostic report (Issue overview, sample dimension comparison, representative stack), and append a hint that "switching to the APP source directory enables code-level localization".
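The "keep APP user frames, then locate file:line" step can be sketched with `grep` and `sed`. The example assumes Java-style stack lines of the form `at pkg.Class.method(File.java:42)`; the real layout of `Model.Backtrace` may differ per platform, the stack shown is made up, and `app_frame_locations` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only frames under the app's package prefix and
# print the file:line part of each. Reads a stack trace on stdin.
app_frame_locations() {
  local prefix="$1"
  grep -F "at ${prefix}." | sed -n 's/.*(\([^()]*:[0-9][0-9]*\)).*/\1/p'
}

# Made-up stack for illustration.
cat <<'STACK' | app_frame_locations com.example.app
java.lang.NullPointerException: demo
    at com.example.app.ui.MainActivity.render(MainActivity.java:42)
    at android.app.Activity.performCreate(Activity.java:8000)
    at com.example.app.core.Boot.start(Boot.kt:17)
STACK
```

Each printed `file:line` pair can then be searched for in the source tree; system frames such as `android.app.*` are dropped because they do not match the prefix.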
7.4 Failure handling (CLI only)
When any aliyun emas-appmonitor call fails, run the following self-checks in order:
aliyun configure list # 1. current profile / mode / region
aliyun plugin update # 2. latest plugin
aliyun emas-appmonitor <cmd> ... --cli-dry-run # 3. parameter serialization check
aliyun emas-appmonitor <cmd> ... --log-level debug # 4. HTTP body + RequestId
Do not guide the user to query any server-side data source.
8. Success Verification
The full 6-step CLI self-verification (with runnable commands and pass/fail criteria for each step) is in references/verification-method.md. The correct-vs-incorrect CLI pattern matrix is in references/acceptance-criteria.md. Core criteria:
- Reachable: `get-issues` dry-run prints the HTTP body successfully
- Non-empty: some biz_module has `Model.Total >= 1`
- Stable: two calls with identical parameters return the same Top 5 `DigestHash`
- Filter works: after adding a filter, `Total` is less than or equal to the full count
- Three-level chain: `issues → issue → errors → error` can pull a Stack end to end
- Diagnosable: on induced errors, the output includes `RequestId` and `ErrorCode`
9. Cleanup
This skill is read-only; it does not create any cloud resources that need cleanup.
Tear-down is only two things:
aliyun configure ai-mode disable
# (optional) delete the local JSON directories produced by dig_issue.sh
rm -rf ./emas-apm-dig-*
10. Best Practices
- Probe first, ask later: before entering the main flow, grep SDK initialization code from the user's workspace per `references/appkey-detection.md` to infer `app-key` / `os`; confirm with the user only after a hit, rather than asking upfront.
- Top first, then drill: do not run `dig_issue.sh` against every Issue from the start; first use `list_top_issues.sh` to aggregate the Top N, then drill into each of them. The total number of CLI calls is `O(N)` rather than `O(all)`.
- Always pass `--os`: `--os` on all 4 APIs is marked optional in `--help`, but omitting it returns empty results silently. Always specify `android / iphoneos / harmony` explicitly.
- `get-error` MUST carry `--did`: marked optional in `--help` but implicitly required by the backend; take it from `Items[*].Did` in the `get-errors` response.
- Reuse `biz-module`: the next `get-issue` / `get-errors` / `get-error` calls must use the same bizModule that produced the Issue in `get-issues`; switching will return empty.
- Shrink the time window from "wide" to "narrow": start diagnosis with 24h / `Granularity=1 GranularityUnit=DAY`; once a specific version / device is located, shrink to 1~4 hours with `GranularityUnit=HOUR`.
- Filters are JSON strings: the entire `--filter` value must be a single JSON string; build nested `SubFilters` with `jq -cn` to avoid manual escape errors (see `references/filter-reference.md`).
- Multi-account scenarios: confirm the profile via `aliyun configure list` and pass `--profile <name>` explicitly rather than relying on implicit env-var switching.
- Persist `get-error`: this API response can range from hundreds of KB to several MB; do not truncate JSON with `head` / `tail`. Write to `> /tmp/emas-error-XXX.json` first and then process with `jq`.
- Android obfuscation: when you see class names like `a.a.a.b.c`, ask the user for `mapping.txt` before attempting code mapping rather than guessing.
- iOS not symbolicated: when `Model.SymbolicStatus=false`, the `Stack` contains many raw addresses; only emit conclusions at device / version dimensions, and re-analyze after dSYM is uploaded.
- Parallel QPS control: `list_top_issues.sh` has a built-in `sleep 0.3s` to avoid throttling; scanning 6 biz_modules takes 2~3 seconds in total and does not need extra concurrency.
- Empty `biz-module` results are not errors: `anr / memory_*` under `harmony` or very-low-traffic AppKeys returning `Total=0` is normal and should not be retried.
- Do not use this skill to write data: all 4 APIs are `Get*` / `View*`. If the user wants to "update Issue status" or "mark as fixed", that falls under write APIs like `UpdateIssueStatus` and is out of scope.
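For the Android obfuscation practice, a rough check can flag class names that look like `a.a.a.b.c` before any code mapping is attempted. The rule below (three or more dot-separated segments, each one or two characters long) is an illustrative heuristic of my own choosing, not something defined by EMAS or the bundled scripts, and `looks_obfuscated` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: a class name looks obfuscated when it has at least
# three dot-separated segments and every segment is one or two characters long.
looks_obfuscated() {
  local re='^[A-Za-z0-9]{1,2}(\.[A-Za-z0-9]{1,2}){2,}$'
  [[ "$1" =~ $re ]]
}

for name in a.a.a.b.c com.example.app.MainActivity; do
  if looks_obfuscated "$name"; then
    echo "$name: looks obfuscated, ask for mapping.txt"
  else
    echo "$name: looks readable"
  fi
done
```

A heuristic like this will produce false positives for genuinely short package names, so it is best used as a prompt to ask the user, not as a gate.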
11. Reference Links
| Document | Purpose |
|---|---|
| references/cli-installation-guide.md | Aliyun CLI installation / configuration / plugins / credentials |
| references/appkey-detection.md | Identify AppKey and OS from the user's workspace across Android / iOS / Harmony / Flutter / Unity / H5 |
| references/ram-policies.md | Least-privilege JSON + Permission Failure Handling |
| references/get-issues.md | GetIssues parameters / response / ordering |
| references/get-issue.md | GetIssue parameters / response |
| references/get-errors.md | GetErrors parameters / response |
| references/get-error.md | GetError parameters / response |
| references/filter-reference.md | --filter structure / operators / SubFilters / dry-run validation |
| references/biz-module-reference.md | 6 biz_modules x platforms x available filterCode list |
| references/troubleshoot-workflow.md | Full flow for stack -> source -> diff |
| references/related-commands.md | Cheat sheet for all aliyun emas-appmonitor commands + skill boundary |
| references/verification-method.md | 6-step runnable CLI verification with pass/fail criteria |
| references/acceptance-criteria.md | Correct vs incorrect CLI pattern matrix (for review / self-check) |
| assets/system-filters/index.json | Index of 14 static filter snapshots (biz_module x platform) |
