Testing Skill for Agency Agents

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This is mostly a testing-oriented prompt skill, but it directs agents to run a local shell script and broad load and API tests without clear scoping, and it stores QA evidence under public/, a path that may be web-served or deployed.

Install only if you are comfortable with a testing agent that may run shell-based QA commands. Before use, review any local qa-playwright-capture.sh script, pin or approve npx tools, restrict API and load tests to authorized staging targets, and change the screenshot output path away from public/ unless you are sure it cannot be served or deployed.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If a project contains a malicious or unsafe qa-playwright-capture.sh, the agent could execute it while performing QA.

Why it was flagged

The prompt tells the agent to run a local shell script as a mandatory first step, but that script is not included in the reviewed skill package or declared as an install requirement.

Skill content

STEP 1: Reality Check Commands (ALWAYS RUN FIRST) ... ./qa-playwright-capture.sh http://localhost:8000 public/qa-screenshots

Recommendation

Require explicit user approval before running local scripts, bundle the helper with the skill or document its trusted source, and ask users to review the script before execution.
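
One way to act on this recommendation is to gate the helper behind a checksum recorded when the script was reviewed. The sketch below is illustrative only: verify_script and the notion of a stored reviewed checksum are assumptions and not part of the skill, and it assumes sha256sum is available (on macOS, shasum -a 256 is the usual substitute).

```shell
# Hypothetical guard, not part of the skill: succeed only if the script
# exists and its SHA-256 matches the checksum recorded at review time.
verify_script() {
  local script="$1" expected="$2" actual
  if [ ! -f "$script" ]; then
    echo "refusing to run: $script not found" >&2
    return 1
  fi
  # sha256sum prints "<hash>  <file>"; keep only the hash field.
  actual="$(sha256sum "$script" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "refusing to run: checksum mismatch for $script" >&2
    return 1
  fi
  return 0
}
```

A caller would run the helper only when the check passes, for example verify_script ./qa-playwright-capture.sh "$REVIEWED_SHA256" followed by the capture command, where REVIEWED_SHA256 is a placeholder for a value the user recorded after reading the script. A checksum does not replace the review itself; it only detects that the file changed afterwards.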

What this means

An agent following these instructions could overload a development, production, or third-party service if the user has not clearly approved the target and load profile.

Why it was flagged

The performance-testing instructions are broad and do not require explicit target selection, authorization, rate limits, or staging-only execution.

Skill content

Execute load testing, stress testing, endurance testing, and scalability assessment across all systems

Recommendation

Run load and stress tests only on explicitly approved test/staging targets, with defined rate limits, windows, credentials, and rollback/stop conditions.
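
The target restriction could be enforced with a small allowlist check before any load tool is started. This is a minimal sketch under stated assumptions: APPROVED_HOSTS and is_approved_target are hypothetical names, the host list is a placeholder, and the URL parsing is deliberately rough (it does not handle bracketed IPv6 literals). It covers target authorization only, not rate limits or stop conditions.

```shell
# Hypothetical allowlist; replace with hosts the user has explicitly approved.
APPROVED_HOSTS="localhost 127.0.0.1"

# Succeed only if the URL's host is on the allowlist.
is_approved_target() {
  local host="$1" approved
  host="${host#*://}"       # drop the scheme
  host="${host%%[/?#]*}"    # drop path, query, and fragment
  host="${host##*@}"        # drop any userinfo (user:pass@)
  host="${host%%:*}"        # drop the port
  for approved in $APPROVED_HOSTS; do
    [ "$host" = "$approved" ] && return 0
  done
  return 1
}
```

Stripping userinfo before comparing matters: a URL such as http://localhost@evil.example.com/ would otherwise look like an approved local target.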

What this means

Screenshots and test-result files may contain sensitive application data and could be accidentally exposed if public/ is served or deployed.

Why it was flagged

The workflow stores screenshots and test results under a public directory, creating persistent QA context that may be web-served, deployed, or reused without cleanup guidance.

Skill content

./qa-playwright-capture.sh http://localhost:8000 public/qa-screenshots ... Evidence Location: public/qa-screenshots/

Recommendation

Store QA evidence in a private temporary directory by default, redact sensitive data, avoid committing or deploying artifacts, and define cleanup/retention rules.
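
A private evidence location with explicit cleanup might look like the following. The helper names make_evidence_dir and cleanup_evidence_dir and the qa-evidence prefix are assumptions introduced for illustration; the skill itself defines nothing like them.

```shell
# Hypothetical helper: create a private evidence directory outside the project
# tree and print its path. mktemp -d creates the directory with mode 0700.
make_evidence_dir() {
  mktemp -d "${TMPDIR:-/tmp}/qa-evidence.XXXXXX"
}

# Hypothetical helper: delete an evidence directory, but only if its name
# matches the pattern produced by make_evidence_dir.
cleanup_evidence_dir() {
  case "$1" in
    */qa-evidence.??????) rm -rf -- "$1" ;;
    *) echo "refusing to delete unexpected path: $1" >&2; return 1 ;;
  esac
}
```

The returned path would then be passed to the capture step in place of public/qa-screenshots, assuming the local helper script accepts an arbitrary output directory as its second argument, which the reviewed artifacts do not confirm. Redaction of sensitive data in the screenshots is a separate step that this sketch does not address.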

What this means

The user’s environment may download or run tool versions that differ from what the skill author expected.

Why it was flagged

The skill documents unpinned npx tool execution, which is expected for accessibility testing but still relies on runtime package resolution outside the reviewed artifact set.

Skill content

npx @axe-core/cli http://localhost:8000 --tags wcag2a,wcag2aa,wcag22aa ... npx lighthouse http://localhost:8000

Recommendation

Prefer pinned tool versions or a reviewed project dependency setup, especially in CI or shared environments.
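
A wrapper or CI step could reject unpinned package specs before handing them to npx. The check below is a rough sketch: is_pinned_spec is a hypothetical name, and the glob only tests for an @ followed by something shaped like major.minor.patch, so it is not a full semver validator.

```shell
# Hypothetical check: succeed only if an npm package spec carries an exact
# version such as name@1.2.3 (tags like @latest and ranges like @^1 fail).
is_pinned_spec() {
  local spec="${1#@}"   # drop a leading scope marker so only a version '@' remains
  case "$spec" in
    *@[0-9]*.[0-9]*.[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this in place, a spec of the form @axe-core/cli@<version> or lighthouse@<version> passes, while the bare names quoted from the skill do not. The versions are left as placeholders because the skill does not state which releases its author tested against.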