Back to skill

Security audit

OpenAI Agent SDK

Security checks across malware telemetry and agentic risk

Overview

This SDK teaching skill is mostly coherent, but one calculator example exposes arbitrary Python execution if users run or copy it as written.

Review before installing or copying examples. Do not run or reuse the calculator tool as written; replace eval with a safe arithmetic parser. Treat provider API keys, traces, conversation history, and event data as sensitive, especially in shared terminals or logs.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral ASTexec() Call, eval() Call, Dynamic Import
  • MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
Findings (5)

eval() call detected

High
Category
Dangerous Code Execution
Content
"""计算数学表达式的结果"""
    print(f"[调试] 计算: {expression}")
    try:
        result = eval(expression)
        return float(result)
    except Exception as e:
        return 0.0
Confidence
99% confidence
Finding
result = eval(expression)

Context-Inappropriate Capability

Medium
Confidence
98% confidence
Finding
The documented `calculate` tool directly calls `eval(expression)` on agent-controlled input, which can lead to arbitrary Python code execution if copied into a real agent tool. In this skill’s context, agents are specifically designed to accept natural-language/user-driven inputs and invoke tools, so presenting `eval` as a recommended pattern materially increases the chance of dangerous misuse.

Intent-Code Divergence

High
Confidence
98% confidence
Finding
The tool is presented as a math calculator, but its implementation evaluates arbitrary Python expressions. That mismatch hides the true capability from developers and users, increasing the chance it will be exposed to untrusted input in agent workflows and abused for code execution.

Context-Inappropriate Capability

High
Confidence
96% confidence
Finding
For a simple weather-and-calculation example, exposing arbitrary Python expression evaluation is unnecessary and materially expands the attack surface. In agent systems, examples are often copied into production, so this normalizes an unsafe pattern that can turn ordinary user prompts into a code-execution primitive.

Ssd 3

Medium
Confidence
91% confidence
Finding
The debugging examples encourage printing full conversation history and raw event data, which may include prompts, secrets, personal data, tool arguments, or model outputs. In an agent framework, these logs often aggregate highly sensitive multi-step context, so indiscriminate console/log output can cause unintended disclosure to operators, shared terminals, log pipelines, or observability backends.

VirusTotal

66/66 vendors flagged this skill as clean.

View on VirusTotal

Static analysis

Detected: suspicious.dynamic_code_execution

Dynamic code execution detected.

Critical
Code
suspicious.dynamic_code_execution
Location
examples/tools_example.py:30