Security audit

OpenAI Agent SDK

Security checks across malware telemetry and agentic risk

Overview

This SDK teaching skill is mostly coherent, but one calculator example exposes arbitrary Python execution if users run or copy it as written.

Review before installing or copying examples. Do not run or reuse the calculator tool as written; replace eval with a safe arithmetic parser. Treat provider API keys, traces, conversation history, and event data as sensitive, especially in shared terminals or logs.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral ASTexec() Call, eval() Call, Dynamic Import
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration

Findings (5)

eval() call detected

High

Category: Dangerous Code Execution
Content: """计算数学表达式的结果""" print(f"[调试] 计算: {expression}") try: result = eval(expression) return float(result) except Exception as e: return 0.0
Confidence: 99% confidence
Finding: result = eval(expression)

Context-Inappropriate Capability

Medium

Confidence: 98% confidence
Finding: The documented `calculate` tool directly calls `eval(expression)` on agent-controlled input, which can lead to arbitrary Python code execution if copied into a real agent tool. In this skill’s context, agents are specifically designed to accept natural-language/user-driven inputs and invoke tools, so presenting `eval` as a recommended pattern materially increases the chance of dangerous misuse.

Intent-Code Divergence

High

Confidence: 98% confidence
Finding: The tool is presented as a math calculator, but its implementation evaluates arbitrary Python expressions. That mismatch hides the true capability from developers and users, increasing the chance it will be exposed to untrusted input in agent workflows and abused for code execution.

Context-Inappropriate Capability

High

Confidence: 96% confidence
Finding: For a simple weather-and-calculation example, exposing arbitrary Python expression evaluation is unnecessary and materially expands the attack surface. In agent systems, examples are often copied into production, so this normalizes an unsafe pattern that can turn ordinary user prompts into a code-execution primitive.

Ssd 3

Medium

Confidence: 91% confidence
Finding: The debugging examples encourage printing full conversation history and raw event data, which may include prompts, secrets, personal data, tool arguments, or model outputs. In an agent framework, these logs often aggregate highly sensitive multi-step context, so indiscriminate console/log output can cause unintended disclosure to operators, shared terminals, log pipelines, or observability backends.

VirusTotal

66/66 vendors flagged this skill as clean.

View on VirusTotal

Static analysis

Detected: suspicious.dynamic_code_execution

Dynamic code execution detected.

Critical

Code: suspicious.dynamic_code_execution
Location: examples/tools_example.py:30