Medical Scribe (Dictation)

Security checks across malware telemetry and agentic risk

Overview

The skill is purpose-aligned overall, but it is flagged for Review because patient dictation can be sent to external LLM APIs, and the documentation understates the network and clinical-safety boundaries.

Install only in an isolated environment, pin and review dependencies, and do not use the OpenAI or Anthropic option with patient data unless your organization has approved the provider, privacy terms, and PHI handling. Treat generated diagnoses, differentials, and reasoning as draft documentation requiring clinician review.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Findings (17)

Lp3

Severity: Medium
Category: MCP Least Privilege
Confidence: 91%
Finding
The skill documentation indicates file read/write capabilities and a packaged executable path, but it does not declare permissions accordingly. This creates a governance gap where reviewers or runtime controls may underestimate what the skill can access, increasing the chance of unintended file access or unsafe deployment assumptions.

Tp4

Severity: High
Category: MCP Tool Poisoning
Confidence: 95%
Finding
The declared description frames the skill as simple dictation-to-SOAP conversion, but the documented behavior includes audio transcription, local file I/O, and possible external API use with OpenAI or Anthropic. This mismatch is dangerous because operators may approve or run the skill without understanding that sensitive medical data could be transmitted externally or written to disk.

Intent-Code Divergence

Severity: Medium
Confidence: 83%
Finding
The safety notes claim the tool does not provide diagnostic suggestions, yet the documented SOAP output includes a primary diagnosis and differential diagnoses. In a medical context, this contradiction can encourage unsafe reliance on generated clinical assessments and obscure how much the tool actually influences clinical judgment.

Intent-Code Divergence

Severity: Medium
Confidence: 95%
Finding
The audit metadata classifies a medical SOAP-note skill as 'Academic Writing,' creating a material scope mismatch in the control plane that governs how the skill is reviewed and routed. In a medical context, misclassification can weaken domain-specific safeguards, cause inappropriate prompts or evaluation criteria to be applied, and mask patient-safety or privacy risks that should receive stricter handling.

Context-Inappropriate Capability

Severity: Medium
Confidence: 97%
Finding
The audit explicitly approves use for broad academic writing tasks even though the stated function is converting physician dictation into structured SOAP notes. That unjustified capability expansion can be exploited to run the skill outside its validated purpose, increasing the chance of unsafe outputs, prompt misuse, and bypass of medical-documentation boundaries under the cover of a passing audit.

Intent-Code Divergence

Severity: Medium
Confidence: 94%
Finding
Listing 'Primary routing is Academic Writing' in key strengths reinforces and operationalizes the same domain mismatch, signaling to downstream users and systems that this skill is appropriate for a broader, less regulated domain. For a medical scribe skill, that increases the likelihood of deployment under weaker controls and misinforms reviewers about the risk profile.

Description-Behavior Mismatch

Severity: High
Confidence: 98%
Finding
The code can transmit raw physician dictation to external LLM providers when the --llm option is used, despite the skill being presented as a dictation-to-SOAP conversion tool without clear disclosure of third-party processing. In a medical context this can expose highly sensitive PHI/PII to external services, creating confidentiality, compliance, and data-governance risk.

Context-Inappropriate Capability

Severity: High
Confidence: 96%
Finding
This implementation adds network-based LLM processing for medical dictation, which materially expands the skill's data-flow and trust boundary beyond what a simple SOAP-note conversion trigger implies. Because the input is likely to contain patient identifiers and clinical details, undisclosed external processing is dangerous even if technically optional.

Intent-Code Divergence

Severity: Medium
Confidence: 99%
Finding
The validation code references note.assessment, but SOAPNote has no such attribute, so validation will raise an AttributeError when no primary diagnosis is present. This can crash processing for incomplete inputs, creating a denial-of-service condition and suppressing safety warnings exactly when the note is missing critical clinical data.
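The crash is easy to reproduce. In the minimal sketch below, only the SOAPNote name and the missing assessment attribute come from the finding; the other field names and both validator functions are hypothetical. The fix shown checks only fields the dataclass actually defines and returns warnings instead of raising, so incomplete notes still produce their safety warnings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SOAPNote:
    # Hypothetical fields for illustration; the skill's real SOAPNote may
    # differ. As in the finding, there is no `assessment` attribute.
    subjective: str = ""
    objective: str = ""
    primary_diagnosis: Optional[str] = None
    differentials: List[str] = field(default_factory=list)
    plan: str = ""


def validate_note_buggy(note: SOAPNote) -> List[str]:
    """Mirrors the reported defect: reads an attribute SOAPNote lacks."""
    warnings: List[str] = []
    if not note.primary_diagnosis:
        # Raises AttributeError, so this warning is never returned.
        warnings.append(f"No primary diagnosis in: {note.assessment}")
    return warnings


def validate_note(note: SOAPNote) -> List[str]:
    """Return safety warnings for incomplete notes instead of crashing."""
    warnings: List[str] = []
    if not note.primary_diagnosis:
        warnings.append("Missing primary diagnosis; clinician review required.")
    if not note.plan:
        warnings.append("Missing plan; clinician review required.")
    return warnings


note = SOAPNote(subjective="Cough for three days")
try:
    validate_note_buggy(note)
except AttributeError as exc:
    print("buggy validator crashed:", exc)
print(validate_note(note))
```

The buggy path fails precisely on the input that most needs a warning (a note with no diagnosis). The corrected path reports both missing sections.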

Vague Triggers

Severity: Medium
Confidence: 78%
Finding
The trigger text 'Convert physician verbal dictation into structured SOAP notes. Trigger' is underspecified and lacks clear boundaries, exclusions, or negative examples. Ambiguous activation conditions increase the risk that the skill will be invoked on non-physician, non-clinical, or otherwise unvalidated inputs, which is more concerning here because the skill handles medical documentation workflows.

Vague Triggers

Severity: High
Confidence: 98%
Finding
The off-domain invocation description is overly broad and effectively authorizes a medical skill for generic academic writing workflows. In this context, scope expansion is especially dangerous because it can sidestep medical boundary controls, confuse users about intended use, and provide a path for unvalidated behavior while the audit still reports the package as production ready.

Missing User Warnings

Severity: High
Confidence: 99%
Finding
Patient dictation may be sent to OpenAI or Anthropic without a clear warning or consent prompt at the point of use, even though the data is likely to contain PHI. In a healthcare workflow this is especially dangerous because users may reasonably assume local processing, while external transmission can trigger privacy breaches and regulatory exposure.
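One way to close this gap is to gate any external call behind both an organizational approval and an explicit confirmation at the point of use. The sketch below is hypothetical. The names require_phi_consent, ConsentRequired and the SCRIBE_PHI_PROVIDER_APPROVED environment variable are illustrative only, and the actual API request is left out so that no real SDK call is implied. Any provider value other than openai or anthropic is assumed to mean local processing.

```python
import os
from typing import Callable, Mapping, Optional

# Providers that would transmit dictation off the machine.
EXTERNAL_PROVIDERS = {"openai", "anthropic"}

# Hypothetical variable an organization sets only after approving a
# provider's privacy terms and PHI handling.
APPROVAL_VAR = "SCRIBE_PHI_PROVIDER_APPROVED"


class ConsentRequired(RuntimeError):
    """Raised when dictation would leave the machine without approval."""


def require_phi_consent(
    provider: str,
    ask: Callable[[str], str] = input,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if dictation may be processed with `provider`.

    Providers outside EXTERNAL_PROVIDERS are treated as local and pass
    without a prompt. External providers need the approval flag AND an
    explicit "yes" from the user before anything is sent.
    """
    name = provider.strip().lower()
    if name not in EXTERNAL_PROVIDERS:
        return True
    env = os.environ if env is None else env
    if env.get(APPROVAL_VAR, "").strip().lower() != name:
        raise ConsentRequired(
            f"{name} is not approved for PHI; set {APPROVAL_VAR}={name} "
            "only after organizational review."
        )
    answer = ask(
        f"WARNING: patient dictation will be sent to {name}, an external "
        "service. Type 'yes' to continue: "
    )
    if answer.strip().lower() != "yes":
        raise ConsentRequired("User declined external transmission.")
    return True
```

A caller would run this check, hypothetically with the value of the --llm option, before building any request, so a refusal stops the run before data leaves the machine. Injecting `ask` and `env` keeps the gate testable without a terminal.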

Unpinned Dependencies

Severity: Low
Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 96%
Finding
anthropic is declared without a version pin.

Unpinned Dependencies

Severity: Low
Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 93%
Finding
dataclasses is declared without a version pin.

Unpinned Dependencies

Severity: Low
Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 96%
Finding
openai is declared without a version pin.

Unpinned Dependencies

Severity: Low
Category: Supply Chain
Content: anthropic, dataclasses, openai, whisper
Confidence: 95%
Finding
whisper is declared without a version pin.
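Unpinned requirements like these can be caught before installation with a simple check that flags any line lacking an exact == pin. This is a deliberately simplified sketch. The helper name unpinned_requirements is hypothetical, and extras, environment markers and URL requirements from the full PEP 508 grammar are not handled, so treat it as a pre-review aid rather than a complete parser.

```python
import re
from typing import List

# Matches "name==version" with an optional trailing comment. Simplified:
# extras, environment markers, and URL requirements are not handled.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*\s*==\s*[^\s#]+\s*(#.*)?$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


# The four dependencies reported above, as they appear in the findings.
requirements = """\
anthropic
dataclasses
openai
whisper
"""
print(unpinned_requirements(requirements))
# → ['anthropic', 'dataclasses', 'openai', 'whisper']
```

Pinning only makes installs reproducible. Pinned versions still need to be checked against known advisories such as those reported for anthropic.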

Known Vulnerable Dependency: anthropic (2 advisories): CVE-2026-34450 (Claude SDK for Python has Insecure Default File Permissions in Local Filesystem); CVE-2026-34452 (Claude SDK for Python: Memory Tool Path Validation Race Condition Allows Sandbox)

Severity: Low
Category: Supply Chain
Confidence: 84%
Finding
anthropic is affected by the two advisories listed in this finding's title.

VirusTotal

No VirusTotal findings
