vLLM

Pass. Audited by ClawScan on May 1, 2026.

Overview

This is a documentation-only vLLM helper whose setup examples are aligned with its stated purpose, and it appears safe as an instruction-only reference. Before copying its commands, verify the vLLM package or Docker image against official sources, pin versions where possible, and secure any API server you expose on port 8000, especially in production.
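
As a quick sanity check before running anything, you can inspect what the skill's unpinned references would actually resolve to. This is a minimal sketch using standard pip and Docker commands; the package and image names are taken from the skill itself, and pip index is an experimental subcommand (pip 21.2+).

# See which image digest the "latest" tag currently resolves to
docker pull vllm/vllm-openai:latest
docker images --digests vllm/vllm-openai

# List the vLLM releases published on PyPI
pip index versions vllm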

Findings (2)

This is an artifact-based, informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Finding 1: Unpinned package and image versions

What this means

Running unpinned packages or container images can pull a newer or different version than expected.

Why it was flagged

The skill recommends installing/running external vLLM software and a Docker image tagged latest. This is expected for a deployment helper, but the exact package/image version is not pinned in the example.

Skill content
pip install vllm  # requires CUDA 12.1+ ... docker run ... vllm/vllm-openai:latest
Recommendation

Use trusted sources, pin package versions or Docker digests where possible, and review the official vLLM installation guidance before running the commands.
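
A hedged example of what pinning might look like in practice; the version number below is illustrative, not a recommendation, and the digest placeholder must come from a source you trust (for example, the image listing on Docker Hub):

# Pin the Python package to an explicit release (version shown is illustrative)
pip install "vllm==0.6.3"

# Pin the Docker image by tag, or better, by immutable digest
docker pull vllm/vllm-openai:v0.6.3
docker pull vllm/vllm-openai@sha256:<digest-from-a-trusted-source>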

Finding 2: GPU access, cache mount, and exposed port

What this means

If run as written, the container can access the mounted cache and the inference server may be reachable on the published port depending on host/network settings.

Why it was flagged

The documented Docker command gives the container GPU access, mounts the local Hugging Face cache, and exposes port 8000. These are normal for a vLLM server, but they expand local resource and network exposure.

Skill content
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 vllm/vllm-openai:latest
Recommendation

Start the server only when needed, restrict network exposure with firewall rules or bind settings, add authentication for production use, and mount only the cache or data the container actually needs.
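
For instance, a more locked-down variant of the documented command might bind the published port to loopback and require an API key. This is a sketch, not the skill's own command: the --api-key option is accepted by vLLM's OpenAI-compatible server, and the environment variable name here is just an example.

# Publish the port on loopback only, so remote hosts cannot reach the server
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 127.0.0.1:8000:8000 \
    vllm/vllm-openai:latest \
    --api-key "$VLLM_API_KEY"

Mounting the cache read-only (append :ro to the -v argument) is a further option when the model weights are already downloaded.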