Amazing PsyCoder

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate psychology experiment coding skill, but it needs review because its bundled examples and references include unsafe code patterns and privacy-heavy experiment templates.

Install only if you are comfortable reviewing generated experiment code before running it or using it with participants. Pay particular attention to condition-file handling, avoid exec/globals-style variable injection, remove third-party tracking/geolocation snippets unless explicitly approved, and make sure any participant metadata collection, deception, and debriefing are covered by your study protocol.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (120)

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisTrial.rgb)
    if thisTrial != None:
        for paramName in thisTrial:
            exec('{} = thisTrial[paramName]'.format(paramName))

    for thisTrial in trials:
        currentLoop = trials
Confidence: 98%
Finding: exec('{} = thisTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisShowMeTrial.rgb)
    if thisShowMeTrial != None:
        for paramName in thisShowMeTrial:
            exec('{} = thisShowMeTrial[paramName]'.format(paramName))

    for thisShowMeTrial in showMeTrials:
        currentLoop = showMeTrials
Confidence: 98%
Finding: exec('{} = thisShowMeTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisTrial.rgb)
    if thisTrial != None:
        for paramName in thisTrial:
            exec('{} = thisTrial[paramName]'.format(paramName))

    # ------Prepare to start Routine "trial"-------
    t = 0
Confidence: 98%
Finding: exec('{} = thisTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisShowMeTrial.rgb)
    if thisShowMeTrial != None:
        for paramName in thisShowMeTrial:
            exec('{} = thisShowMeTrial[paramName]'.format(paramName))

    # ------Prepare to start Routine "showMeHow"-------
    t = 0
Confidence: 98%
Finding: exec('{} = thisShowMeTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisPracticeTrial.rgb)
    if thisPracticeTrial != None:
        for paramName in thisPracticeTrial:
            exec('{} = thisPracticeTrial[paramName]'.format(paramName))

    for thisPracticeTrial in practiceTrials:
        currentLoop = practiceTrials
Confidence: 96%
Finding: exec('{} = thisPracticeTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisMainTrial.rgb)
    if thisMainTrial != None:
        for paramName in thisMainTrial:
            exec('{} = thisMainTrial[paramName]'.format(paramName))

    for thisMainTrial in mainTrials:
        currentLoop = mainTrials
Confidence: 96%
Finding: exec('{} = thisMainTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisPracticeTrial.rgb)
    if thisPracticeTrial != None:
        for paramName in thisPracticeTrial:
            exec('{} = thisPracticeTrial[paramName]'.format(paramName))

    # ------Prepare to start Routine "main"-------
    t = 0
Confidence: 96%
Finding: exec('{} = thisPracticeTrial[paramName]'.format(paramName))

exec() call detected

High

Category: Dangerous Code Execution
Content:
    # abbreviate parameter names if possible (e.g. rgb = thisMainTrial.rgb)
    if thisMainTrial != None:
        for paramName in thisMainTrial:
            exec('{} = thisMainTrial[paramName]'.format(paramName))

    # ------Prepare to start Routine "main"-------
    t = 0
Confidence: 96%
Finding: exec('{} = thisMainTrial[paramName]'.format(paramName))
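
One way to avoid the exec() pattern flagged in the findings above is to keep each trial's parameters in a namespace object instead of creating variables from column names. The sketch below is an assumption, not the skill's code or a PsychoPy API: `load_trial_params` and `EXPECTED_PARAMS` are hypothetical names, and the allowlist would need to match the columns of the real condition file.

```python
from types import SimpleNamespace

# Hypothetical allowlist: the column names this experiment expects.
EXPECTED_PARAMS = frozenset({"rgb", "corrAns", "duration"})


def load_trial_params(this_trial, expected=EXPECTED_PARAMS):
    """Return trial parameters as attributes without executing any text."""
    if this_trial is None:
        return SimpleNamespace()
    unexpected = set(this_trial) - set(expected)
    if unexpected:
        # Reject the row rather than silently creating unknown names.
        raise ValueError(
            "Unexpected condition columns: {!r}".format(sorted(unexpected))
        )
    return SimpleNamespace(**this_trial)


params = load_trial_params({"rgb": [1, 0, 0], "corrAns": "left"})
print(params.rgb, params.corrAns)
```

Routine code would then read `params.rgb` rather than a bare `rgb`, so a column name is only ever used as a key and never as source text.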

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code collects and stores participant/session metadata, OS/platform details, frame rate, and all URL parameters via util.addInfoFromUrl(expInfo), then uses them in experiment data output. In an experiment context some metadata collection is expected, but indiscriminately ingesting URL parameters can capture tokens, identifiers, or other sensitive values without clear minimization or notice, creating an unnecessary privacy and data-exposure risk.

Context-Inappropriate Capability

Medium

Confidence: 85%
Finding: The experiment fetches a remote image resource from https://pavlovia.org/assets/default/default.png, which introduces external network dependency and passive data disclosure such as participant IP address, user agent, referrer context, and request timing to a third party. While this is not inherently malicious, loading remote assets in a local skill broadens the trust boundary and can enable tracking or content integrity issues if the remote asset changes.

Intent-Code Divergence

Low

Confidence: 84%
Finding: The embedded PsychoJS example calls util.addInfoFromUrl(expInfo), which imports URL query parameters into experiment metadata and may persist them with participant data. This can unintentionally capture identifiers, recruitment tokens, or other sensitive parameters beyond what the task description implies, creating a privacy and data-minimization issue even though it is not an exploit-oriented behavior.
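
The data-minimization idea can be sketched as an allowlist applied to the query string before anything is stored. The example is written in Python purely for illustration and is not the PsychoJS API; `filter_url_params` and `ALLOWED_URL_PARAMS` are hypothetical names, and a PsychoJS experiment would need an equivalent filter in JavaScript.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allowlist: the only URL parameters the study protocol covers.
ALLOWED_URL_PARAMS = frozenset({"participant", "session"})


def filter_url_params(url, allowed=ALLOWED_URL_PARAMS):
    """Keep only allowlisted query parameters (first value of each)."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items() if key in allowed}


info = filter_url_params(
    "https://example.org/exp/?participant=12&session=1&token=secret"
)
print(info)  # {'participant': '12', 'session': '1'}
```

Anything not named in the allowlist, such as the `token` value here, is dropped before it can reach the saved data.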

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The instructions tell participants that the other players will choose who to toss to, while the experiment logic actually uses pre-programmed condition files to determine those throws. In a deception-based psychology paradigm this may be methodologically intentional, but it is still a real integrity/ethics issue because participants are misled about agent autonomy and experimental conditions without clear debriefing or consent language.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The template explicitly states that participant metadata including IP address, city, and user agent are collected, even though these data are not needed to run or score an IAT. Collecting unnecessary identifying data increases privacy risk, re-identification potential, and regulatory exposure, especially in a psychology experiment context where participants may expect minimal data collection.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The code relies on an externally loaded script to populate `returnCitySN`, which exposes participant network metadata and adds a third-party dependency to the experiment page. This creates unnecessary privacy leakage to the external provider and introduces supply-chain risk because a compromised or changed remote script could affect all experiment sessions.

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The script uses exec() to assign variables from spreadsheet-driven condition data, so any attacker who can modify RTtimeConditions.xlsx can inject arbitrary Python statements into the experiment process. In a local lab or research setting this could lead to arbitrary code execution under the privileges of the user running the experiment, which is unnecessary for loading trial parameters.
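
To make the risk concrete, the following sketch reproduces the flagged statement against a plain dictionary standing in for one spreadsheet row. The helper `unsafe_assign`, the `side_effects` list, and the column names are hypothetical; the payload here only appends to a list, but any Python statement could take its place.

```python
def unsafe_assign(this_trial, namespace):
    """Mimic the flagged pattern; assignments land in `namespace`."""
    for paramName in this_trial:
        # The column name is pasted into source text and executed.
        exec(
            '{} = thisTrial[paramName]'.format(paramName),
            {"thisTrial": this_trial, "paramName": paramName},
            namespace,
        )


side_effects = []  # stands in for anything an attacker might touch
namespace = {"side_effects": side_effects}

# One hypothetical row whose second header carries an extra statement.
row = {
    "rgb": [1, 0, 0],
    "side_effects.append('injected'); duration": 0.5,
}
unsafe_assign(row, namespace)

print(namespace["rgb"])       # [1, 0, 0]
print(namespace["duration"])  # 0.5
print(side_effects)           # ['injected']
```

The experiment still receives its `duration` value, so a tampered header like this would not necessarily cause any visible failure.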

Context-Inappropriate Capability

Medium

Confidence: 96%
Finding: The same exec()-based population pattern appears in the main experiment loop, again turning external condition data into executable code. Because the conditions file is an external input and the experiment's purpose does not require code generation, this creates an avoidable arbitrary code execution path.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: Experiment condition data from conditions.xlsx is treated as Python source via exec(). This violates the expected boundary between passive data files and executable code, allowing a tampered spreadsheet to run arbitrary commands on the host system.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The code uses exec() on spreadsheet-derived field names, which lets untrusted condition-file content create or overwrite arbitrary variables at runtime. In this experiment context, that is an unnecessary dynamic-evaluation primitive and could be abused to alter control flow, clobber important objects, or potentially reach code execution depending on accessible names and downstream usage.
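
Where column names must remain dynamic, a structural check on the headers can at least reject names that are not plain identifiers or that would shadow known objects. This is a minimal sketch under assumptions: `is_safe_column_name`, `check_headers`, and the `RESERVED_NAMES` set are hypothetical, and the reserved names would have to be adapted to the actual script.

```python
import builtins
import keyword

# Hypothetical names the experiment script itself relies on.
RESERVED_NAMES = frozenset({"win", "trials", "expInfo", "thisExp"})


def is_safe_column_name(name, reserved=RESERVED_NAMES):
    """Return True only for plain identifiers that cannot shadow known names."""
    return (
        isinstance(name, str)
        and name.isidentifier()
        and not keyword.iskeyword(name)
        and not name.startswith("_")
        and not hasattr(builtins, name)
        and name not in reserved
    )


def check_headers(headers):
    """Raise ValueError listing every header that fails the check."""
    bad = [h for h in headers if not is_safe_column_name(h)]
    if bad:
        raise ValueError("Unsafe condition-file columns: {!r}".format(bad))
    return list(headers)


print(check_headers(["rgb", "corrAns"]))  # ['rgb', 'corrAns']
```

A check like this narrows what a condition file can do but does not make exec() or globals() assignment safe on its own; storing parameters in a dictionary remains the simpler design.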

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The localisation task is supposed to measure whether the participant can identify which square changed, but the routine explicitly displays `changed_color_index` in `textbox_2`, revealing the correct answer during the response phase. This undermines experiment integrity, enables trivially correct responses, and can corrupt collected behavioral data or training/assessment outcomes derived from it.

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The end screen tells the participant to press space to end, but the routine never creates or checks any keyboard component for that key. As written, the experiment can hang indefinitely at the final screen unless the operator presses Escape or externally terminates it, creating a denial-of-service/usability failure that can interrupt study completion and leave data handling in an inconsistent operational state.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The script unconditionally executes a separate local Python file via execfile("calcIATs.py") whenever reprocess is enabled. This is dangerous because execfile runs arbitrary Python code in the current interpreter context, so if calcIATs.py is modified, replaced, or sourced from an untrusted directory, an attacker can achieve arbitrary code execution with the user's privileges.
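
execfile() exists only in Python 2. If the reprocessing step has to stay a separate script, one option is to run it only when its contents match a digest recorded in advance. This is a sketch under assumptions: `run_pinned_script` is a hypothetical helper, the digest would have to be stored somewhere the script's author controls, and the file could still change between the check and the run.

```python
import hashlib
import runpy
from pathlib import Path


def run_pinned_script(path, expected_sha256):
    """Run a local script only if its contents match a known SHA-256 digest."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError("Refusing to run {}: digest mismatch".format(path))
    # runpy executes the file in a fresh namespace and returns its globals,
    # instead of sharing the caller's variables as execfile() did.
    return runpy.run_path(str(path), run_name="__main__")
```

Moving the scoring logic into an importable function and calling it directly would avoid running a file by path altogether.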

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The README explicitly recommends `globals()[paramName] = thisTrial[paramName]` for condition injection, even though `paramName` comes from externally supplied condition-file columns. That allows untrusted input to overwrite arbitrary global names, which can corrupt control flow, shadow functions/modules/variables, and create hard-to-audit unsafe behavior in generated experiment code. In this code-generation context, the danger is increased because the documentation normalizes the pattern and may cause it to be emitted broadly across generated scripts.

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The localization task is supposed to test whether participants can identify which square changed, but the code sets `textbox_2` to `changed_color_index` and includes it in the displayed components, revealing the correct answer on screen. This undermines task integrity, invalidates collected data, and could bias or completely defeat the experimental measure.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The end screen tells participants to press space to exit, but the routine never creates or polls a keyboard component for space, so the experiment can hang indefinitely unless Escape or external termination is used. In a lab setting this can disrupt sessions, confuse participants, and risk incomplete saves or operator-forced shutdowns that may affect data integrity.

Intent-Code Divergence

High

Confidence: 99%
Finding: The code checks participant responses against the literal string 'corrAns' instead of the corrAns variable, so correctness is mis-scored during the trial routine. In this experimental context, that corrupts adaptive staircase updates, reversal tracking, and threshold estimation, producing invalid scientific data and misleading participant feedback.
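
The scoring error described here comes down to quoting a variable name. The two functions below are hypothetical stand-ins for the trial routine's correctness check, not the skill's actual code, and show how the quoted form behaves next to the intended comparison.

```python
def score_response_buggy(response_key, corrAns):
    """Flagged form: compares against the literal text 'corrAns'."""
    return response_key == 'corrAns'


def score_response(response_key, corrAns):
    """Intended form: compares against the value held by corrAns."""
    return response_key == corrAns


# A correct 'left' response on a trial whose correct answer is 'left'.
print(score_response_buggy("left", "left"))  # False
print(score_response("left", "left"))        # True
```

With the quoted form, a response is scored correct only if the key name happens to be the string 'corrAns', so every ordinary trial feeds a wrong outcome into the staircase.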

VirusTotal

65/65 vendors flagged this skill as clean.
