Medical Scribe (Dictation)

Security checks across malware telemetry and agentic risk

Overview

The skill is broadly aligned with its stated purpose, but it is rated Review because patient dictation can be sent to external LLM APIs, while the documentation understates its network and clinical-safety boundaries.

Install only in an isolated environment, pin and review dependencies, and do not use the OpenAI or Anthropic option with patient data unless your organization has approved the provider, privacy terms, and PHI handling. Treat generated diagnoses, differentials, and reasoning as draft documentation requiring clinician review.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (17)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation describes file read/write capabilities and a packaged executable path, but the skill does not declare matching permissions. This governance gap can lead reviewers or runtime controls to underestimate what the skill can access, increasing the chance of unintended file access or unsafe deployment assumptions.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The declared description frames the skill as simple dictation-to-SOAP conversion, but the documented behavior includes audio transcription, local file I/O, and possible external API use with OpenAI or Anthropic. This mismatch is dangerous because operators may approve or run the skill without understanding that sensitive medical data could be transmitted externally or written to disk.

Intent-Code Divergence

Medium

Confidence: 83%
Finding: The safety notes claim the tool does not provide diagnostic suggestions, yet the documented SOAP output includes primary diagnosis and differential diagnoses. In a medical context, this contradiction can cause unsafe reliance on generated clinical assessments and weakens user understanding of the tool's actual clinical influence.
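One way to keep generated assessments from being mistaken for clinician judgment is to label them as drafts until a clinician signs off. The Python sketch below assumes a hypothetical `Assessment` record with `primary_diagnosis`, `differential_diagnoses`, and `clinician_reviewed` fields and a hypothetical `render_assessment` helper; none of these names come from the skill itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical banner text; the wording is an assumption, not the skill's output.
DRAFT_BANNER = "DRAFT (machine-generated): requires clinician review before use"


@dataclass
class Assessment:
    # Hypothetical fields standing in for the "primary diagnosis" and
    # "differential diagnoses" that the SOAP output is documented to include.
    primary_diagnosis: Optional[str] = None
    differential_diagnoses: List[str] = field(default_factory=list)
    clinician_reviewed: bool = False


def render_assessment(a: Assessment) -> str:
    """Render the Assessment section, prefixed with a draft banner until a clinician signs off."""
    lines = []
    if not a.clinician_reviewed:
        lines.append(DRAFT_BANNER)
    lines.append(f"Primary diagnosis: {a.primary_diagnosis or '(not stated in dictation)'}")
    for dx in a.differential_diagnoses:
        lines.append(f"Differential: {dx}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_assessment(Assessment("Acute bronchitis", ["Community-acquired pneumonia"])))
```

Keying the banner on an explicit review flag means the draft label disappears only through a deliberate sign-off step, not by default.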

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The audit metadata classifies a medical SOAP-note skill as 'Academic Writing,' creating a material scope mismatch in the control plane that governs how the skill is reviewed and routed. In a medical context, misclassification can weaken domain-specific safeguards, cause inappropriate prompts or evaluation criteria to be applied, and mask patient-safety or privacy risks that should receive stricter handling.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The audit explicitly approves use for broad academic writing tasks even though the stated function is converting physician dictation into structured SOAP notes. That unjustified capability expansion can be exploited to run the skill outside its validated purpose, increasing the chance of unsafe outputs, prompt misuse, and bypass of medical-documentation boundaries under the cover of a passing audit.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: Listing 'Primary routing is Academic Writing' in key strengths reinforces and operationalizes the same domain mismatch, signaling to downstream users and systems that this skill is appropriate for a broader, less regulated domain. For a medical scribe skill, that increases the likelihood of deployment under weaker controls and misinforms reviewers about the risk profile.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The code can transmit raw physician dictation to external LLM providers when the --llm option is used, despite the skill being presented as a dictation-to-SOAP conversion tool without clear disclosure of third-party processing. In a medical context this can expose highly sensitive PHI/PII to external services, creating confidentiality, compliance, and data-governance risk.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: This implementation adds network-based LLM processing for medical dictation, which materially expands the skill's data-flow and trust boundary beyond what a simple SOAP-note conversion trigger implies. Because the input is likely to contain patient identifiers and clinical details, undisclosed external processing is dangerous even if technically optional.

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The validation code references note.assessment, but SOAPNote has no such attribute, so validation will raise an AttributeError when no primary diagnosis is present. This can crash processing for incomplete inputs, creating a denial-of-service condition and suppressing safety warnings exactly when the note is missing critical clinical data.
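The failure mode can be reproduced with a minimal sketch. The `SOAPNote` fields below are hypothetical (the finding only establishes that the class has no `assessment` attribute); `validate_buggy` mirrors the reported defect, and `validate` shows one way to return a warning instead of crashing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SOAPNote:
    # Hypothetical fields: the finding only establishes that SOAPNote
    # has no `assessment` attribute.
    subjective: str = ""
    objective: str = ""
    primary_diagnosis: Optional[str] = None
    plan: str = ""


def validate_buggy(note: SOAPNote) -> List[str]:
    """Mirrors the reported defect: reads a non-existent attribute."""
    warnings = []
    if not note.primary_diagnosis:
        # Raises AttributeError because SOAPNote defines no `assessment`,
        # so the safety warning is never produced.
        warnings.append(f"No primary diagnosis (assessment: {note.assessment})")
    return warnings


def validate(note: SOAPNote) -> List[str]:
    """Return safety warnings for incomplete notes instead of crashing."""
    warnings = []
    if not note.primary_diagnosis:
        warnings.append("No primary diagnosis: clinician must complete the Assessment section")
    return warnings
```

The buggy path only fails on incomplete notes, which is why the defect suppresses the warning precisely when it matters.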

Vague Triggers

Medium

Confidence: 78%
Finding: The trigger text 'Convert physician verbal dictation into structured SOAP notes. Trigger' is underspecified and lacks clear boundaries, exclusions, or negative examples. Ambiguous activation conditions increase the risk that the skill will be invoked on non-physician, non-clinical, or otherwise unvalidated inputs, which is more concerning here because the skill handles medical documentation workflows.

Vague Triggers

High

Confidence: 98%
Finding: The off-domain invocation description is overly broad and effectively authorizes a medical skill for generic academic writing workflows. In this context, scope expansion is especially dangerous because it can sidestep medical boundary controls, confuse users about intended use, and provide a path for unvalidated behavior while the audit still reports the package as production ready.

Missing User Warnings

High

Confidence: 99%
Finding: Patient dictation may be sent to OpenAI or Anthropic without a clear warning or consent prompt at the point of use, even though the data is likely to contain PHI. In a healthcare workflow this is especially dangerous because users may reasonably assume local processing, while external transmission can trigger privacy breaches and regulatory exposure.
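A point-of-use gate could address both the organizational-approval advice in the overview and the missing consent prompt. This sketch assumes a hypothetical `SCRIBE_APPROVED_LLM_PROVIDERS` environment variable listing approved providers and a hypothetical `confirm_external_processing` hook called before any dictation leaves the machine; it does not reflect the skill's actual CLI.

```python
import os
from typing import Callable, Set

# Hypothetical variable naming the providers an organization has approved
# for PHI, e.g. "anthropic,openai". Not part of the original skill.
APPROVED_ENV = "SCRIBE_APPROVED_LLM_PROVIDERS"


class ExternalProcessingRefused(RuntimeError):
    """Raised when dictation must not be sent to an external provider."""


def approved_providers() -> Set[str]:
    raw = os.environ.get(APPROVED_ENV, "")
    return {p.strip().lower() for p in raw.split(",") if p.strip()}


def confirm_external_processing(provider: str, ask: Callable[[str], str] = input) -> None:
    """Refuse to send dictation to `provider` unless it is approved and the user consents."""
    provider = provider.lower()
    if provider not in approved_providers():
        raise ExternalProcessingRefused(
            f"{provider!r} is not listed in {APPROVED_ENV}; use local processing instead"
        )
    answer = ask(
        f"This dictation may contain PHI and will be sent to {provider}. Continue? [y/N] "
    )
    # Anything other than an explicit yes is treated as a refusal.
    if answer.strip().lower() not in {"y", "yes"}:
        raise ExternalProcessingRefused("user declined external processing")
```

Passing `ask` as a parameter keeps the prompt testable, and defaulting to refusal means an empty or unexpected answer never results in transmission.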

Unpinned Dependencies

Low

Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 96%
Finding: anthropic (no version pin)

Unpinned Dependencies

Low

Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 93%
Finding: dataclasses (no version pin)

Unpinned Dependencies

Low

Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 96%
Finding: openai (no version pin)

Unpinned Dependencies

Low

Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 95%
Finding: whisper (no version pin)
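Findings like these can be surfaced by scanning a requirements file for lines that are not pinned to an exact version. The sketch below is a simple heuristic with a hypothetical `unpinned` helper; it only recognizes `==` pins, and hash-checked or lock-file installs give stronger guarantees.

```python
import re
from typing import List

# Matches a requirement pinned to one exact version, e.g. "name==X.Y.Z"
# or "name[extra]==X.Y.Z". Ranges such as ">=" are treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^=\s;]+")


def unpinned(requirements: List[str]) -> List[str]:
    """Return requirement specs that are not pinned with '=='."""
    out = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()  # drop trailing comments
        if not spec or spec.startswith("-"):
            continue  # skip blank lines and pip options such as -r
        if not PINNED.match(spec):
            out.append(spec)
    return out
```

Note that `dataclasses` has been part of the Python standard library since 3.7, so on current interpreters it usually needs no third-party requirement at all.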

Known Vulnerable Dependency: anthropic, 2 advisories: CVE-2026-34450 (Claude SDK for Python has Insecure Default File Permissions in Local Filesystem); CVE-2026-34452 (Claude SDK for Python: Memory Tool Path Validation Race Condition Allows Sandbox)

Low

Category: Supply Chain
Confidence: 84%
Finding: anthropic (known advisories)

VirusTotal

No VirusTotal findings
