Skylv Agent Quality Tester

AdvisoryAudited by Static analysis on May 1, 2026.

Overview

No suspicious patterns detected.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If you point it at sensitive logs or conversation exports, those contents are processed by the tool or agent context.

Why it was flagged

The helper reads a local file supplied by the user; for this skill that file is likely an agent log or conversation, which may contain private data. The artifacts show local scoring only and no network or persistence.

Skill content
const content = fs.readFileSync(file, 'utf8');
Recommendation

Use only intended log files, redact secrets, and avoid broad/private files unless you are comfortable having them evaluated.

NoteMedium Confidence
ASI01: Agent Goal Hijack
What this means

A malicious or prompt-injected sample could bias the evaluation report if the judging agent follows instructions inside the sample.

Why it was flagged

The skill may place untrusted agent outputs or conversations into an LLM judging context. If those samples contain prompt-injection text, the evaluator should treat it as data rather than authoritative instructions.

Skill content
Input: Agent's recent conversation or output samples ... Analysis: Score each dimension using LLM-as-judge
Recommendation

When using the skill, clearly quote or delimit samples and instruct the agent to ignore any instructions contained within the material being evaluated.

What this means

Users could place too much confidence in a simple quality score, especially for safety or release-readiness decisions.

Why it was flagged

The helper implements simple heuristic criteria that differ from the SKILL.md claim of LLM-as-judge scoring across coherence and adaptability. This is not malicious, but users should not over-trust the resulting score as an objective safety assessment.

Skill content
clarity: { weight: 0.15, ... }, ... helpfulness: { weight: 0.20, ... }
Recommendation

Treat the output as a lightweight review aid and pair it with human review or a more formal evaluation process for high-stakes agents.