rag-ingest

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This skill mostly matches its RAG-ingestion purpose, but its embedding-key fallback can send an OpenAI API key to VectorEngine by default, so its credential configuration needs review.

Before using this skill, explicitly configure the embedding provider and key, avoid relying on OPENAI_API_KEY unless the endpoint is OpenAI, and only ingest text that may be sent to that provider and stored persistently in Qdrant.

Static analysis

Env credential access

Severity: Critical
Finding: Environment variable access combined with a network send.

VirusTotal

VirusTotal findings are pending for this skill version.


Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

An OpenAI API key present in the environment could be exposed to a non-OpenAI embedding endpoint along with document content.

Why it was flagged

If no provider-specific embedding key is set, the script may reuse OPENAI_API_KEY while the default destination remains api.vectorengine.ai, sending a potentially unrelated credential to that provider.

Skill content

const EMBED_BASE_URL = process.env.EMBED_BASE_URL || "https://api.vectorengine.ai/v1"; ... process.env.OPENAI_API_KEY; ... Authorization: `Bearer ${EMBED_API_KEY}`

Recommendation

Use a dedicated EMBED_API_KEY or VECTORENGINE_API_KEY, explicitly set EMBED_BASE_URL to the matching provider, and avoid relying on the OPENAI_API_KEY fallback unless the endpoint is OpenAI.
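
One way to follow this recommendation is to resolve the key and endpoint together, so that a provider-specific key is only ever paired with its own host. The sketch below is illustrative and is not the skill's actual code: the function name resolveEmbedConfig is hypothetical, and the host names api.openai.com and api.vectorengine.ai are the only providers it assumes.

```javascript
// Hypothetical helper (not part of the skill): choose an embedding key
// without silently reusing OPENAI_API_KEY for a non-OpenAI endpoint.
function resolveEmbedConfig(env) {
  const baseUrl = env.EMBED_BASE_URL;
  if (!baseUrl) {
    // Refuse to fall back to a default destination.
    throw new Error("EMBED_BASE_URL must be set explicitly");
  }
  const host = new URL(baseUrl).hostname;

  // A dedicated key always wins.
  let apiKey = env.EMBED_API_KEY;

  // Provider-specific keys are accepted only for the matching host.
  if (!apiKey && host === "api.vectorengine.ai") {
    apiKey = env.VECTORENGINE_API_KEY;
  }
  if (!apiKey && host === "api.openai.com") {
    apiKey = env.OPENAI_API_KEY;
  }

  if (!apiKey) {
    throw new Error(`No embedding key configured for ${host}`);
  }
  return { baseUrl, apiKey };
}
```

Calling resolveEmbedConfig(process.env) at startup would make a mismatched key and endpoint fail fast instead of sending the credential.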

What this means

Reusing a doc_id, or supplying the wrong one, can replace or remove existing entries in the selected Qdrant collection.

Why it was flagged

The script mutates Qdrant by deleting existing points with the same doc_id and then upserting new points. This is consistent with the documented overwrite behavior, but it can change stored knowledge-base data.

Skill content

await qdrantRequest(`/collections/${collectionName}/points/delete`, { filter: { must: [{ key: "doc_id", match: { value: docId } }] } }); ... await upsertPoints(points);

Recommendation

Use unique doc IDs, verify the target collection, and keep backups or a deletion/recovery process for important knowledge-base content.
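
A guard that counts existing points for a doc_id before the delete step can catch accidental reuse. The sketch below uses Qdrant's points/count endpoint with the same filter shape as the excerpt above. It assumes qdrantRequest(path, body) returns the parsed JSON response, which the excerpt does not confirm. The function name assertDocIdUnused and the allowOverwrite option are hypothetical.

```javascript
// Hypothetical guard (not part of the skill): refuse to overwrite an
// existing doc_id unless the caller opts in.
// Assumption: qdrantRequest(path, body) resolves to the parsed JSON body.
async function assertDocIdUnused(qdrantRequest, collectionName, docId, options = {}) {
  const { allowOverwrite = false } = options;

  const resp = await qdrantRequest(`/collections/${collectionName}/points/count`, {
    filter: { must: [{ key: "doc_id", match: { value: docId } }] },
    exact: true, // ask Qdrant for an exact count rather than an estimate
  });

  const count = (resp && resp.result && resp.result.count) || 0;
  if (count > 0 && !allowOverwrite) {
    throw new Error(
      `doc_id "${docId}" already has ${count} point(s) in "${collectionName}"; ` +
        "pass allowOverwrite to replace them"
    );
  }
  return count;
}
```

Run before the delete-and-upsert sequence, this turns a silent replacement into an explicit decision.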

What this means

Any content ingested may be visible to the embedding service configured by EMBED_BASE_URL.

Why it was flagged

The document chunks are sent to the configured embedding provider. This is expected for an embedding workflow, but it is still an external data transfer.

Skill content

const resp = await fetch(`${EMBED_BASE_URL}/embeddings`, { ... body: JSON.stringify({ model: EMBEDDING_MODEL, input: texts }) });

Recommendation

Only ingest content that is allowed to be sent to the chosen embedding provider, and confirm that provider’s retention and privacy terms.
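
A lightweight pre-send screen can help enforce this by flagging chunks that look like they contain secrets before they reach the embeddings call. The sketch below is a heuristic only. The patterns are illustrative assumptions and will miss many kinds of sensitive data, and findSensitiveChunks is a hypothetical name that does not appear in the skill.

```javascript
// Hypothetical pre-send screen (not part of the skill). The patterns are
// illustrative examples, not a complete secret detector.
const SENSITIVE_PATTERNS = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key header
  /\bsk-[A-Za-z0-9_-]{20,}/,            // API-key-like token with an "sk-" prefix
  /\bAKIA[0-9A-Z]{16}\b/,               // AWS access key ID format
];

// Return the indices of chunks that match any pattern.
function findSensitiveChunks(texts) {
  const flagged = [];
  texts.forEach((text, idx) => {
    if (SENSITIVE_PATTERNS.some((re) => re.test(text))) {
      flagged.push(idx);
    }
  });
  return flagged;
}
```

If the returned list is non-empty, the ingestion run could stop and ask for review rather than sending the texts to the provider.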

What this means

Sensitive or incorrect text may persist in the vector database and later influence retrieval-augmented answers.

Why it was flagged

The script stores the original text chunks and metadata in Qdrant for later retrieval, making the content part of a persistent knowledge base.

Skill content

payload: { doc_id: docId, text_type: "summary", chunk_index: idx, source: source || null, topic_tags: tagsArr, text: chunks[idx], created_at: now, updated_at: now }

Recommendation

Ingest only vetted content, namespace collections or doc IDs clearly, and maintain a process to delete or correct outdated entries.
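
Namespacing can be as simple as deriving each doc_id from a namespace and a slug, so that entries from different sources cannot collide and stale ones are easy to locate for deletion. The sketch below shows one possible convention. The namespace:slug format and the makeDocId name are assumptions, not something the skill defines.

```javascript
// Hypothetical convention (not part of the skill): "<namespace>:<slug>".
function slugify(value) {
  return String(value)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
}

function makeDocId(namespace, title) {
  const ns = slugify(namespace);
  const slug = slugify(title);
  if (!ns || !slug) {
    throw new Error("namespace and title must contain at least one letter or digit");
  }
  return `${ns}:${slug}`;
}
```

For example, makeDocId("Team Wiki", "Onboarding Guide (v2)") yields "team-wiki:onboarding-guide-v2", which can then be used as the doc_id value in the delete filter when an entry needs to be removed or corrected.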