Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Clawtext Ingest

v1.0.1

Multi-source memory ingestion with Discord support, automatic deduplication, and agent-ready patterns

Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The skill is legitimately a multi-source ingestion tool (Discord, files, URLs, JSON) and the included code supports that. However, the skill metadata claims no required environment variables while the runtime docs and code clearly rely on a DISCORD_TOKEN (process.env.DISCORD_TOKEN) for Discord ingestion. Declaring zero required env vars is inconsistent with the documented/implemented functionality.
Instruction Scope
SKILL.md and AGENT_GUIDE direct agents to read local files, ingest JSON/URLs, call the Discord API, run CLI subprocesses (execSync / execFile), and persist hashes to disk (.ingest_hashes.json). Those actions are within the stated purpose but the SKILL.md contains detected 'unicode-control-chars' (prompt-injection signal) which could indicate hidden characters intended to manipulate automated reviewers or agent behavior. The instructions also give agents broad discretion to run subprocesses and scheduled jobs — expected for this tool but increases risk if the skill is untrusted.
Install Mechanism
There is no non-standard external installer or download URL shown; the README suggests normal npm/openclaw installation and repository references a GitHub URL. The package includes source, bins, and package.json (no installer that fetches arbitrary archives from unknown hosts). This is proportionate to a Node.js CLI/library.
Credentials
The skill metadata declares no required environment variables, but the docs and code require DISCORD_TOKEN for Discord ingestion. Requiring a Discord bot token is proportionate to the feature, but the omission from declared requirements is incoherent and surprising. The skill asks agents to pass environment tokens into subprocesses (e.g., execSync with DISCORD_TOKEN), so confirm token handling, storage, and that only minimum bot scopes are used.
Persistence & Privilege
always:false (normal). The skill provides autonomous patterns for agents (ingestForumAutonomous, cron jobs, CLI subprocesses) and writes local state (e.g., .ingest_hashes.json, optional outputPath). Autonomous invocation plus the ability to spawn subprocesses is expected for this tooling, but it increases the blast radius if the skill or its maintainer were untrusted—review where files are written and what is sent over the network.
Scan Findings in Context
[unicode-control-chars] unexpected: Hidden/unicode control characters were detected in SKILL.md. This is not expected for ordinary documentation and can be used for prompt injection or to manipulate automated parsing/review. Recommend opening SKILL.md in an editor that shows hidden characters and removing or inquiring about them.
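One way to follow that recommendation without a special editor is a short script that lists hidden characters by line and column. This is a minimal sketch: the set of code points it flags (zero-width characters, bidirectional overrides, invisible format characters, and control characters other than tab and line breaks) is an assumption and is not the scanner's actual rule set. The function names are hypothetical.

```javascript
import { readFileSync } from 'node:fs';

// Assumed list of suspicious code points: C0/C1 controls (except tab, LF, CR),
// zero-width characters, bidi overrides/isolates, and the byte-order mark.
const HIDDEN =
  /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]/g;

// Return one record per hidden character found in the given text.
function findHiddenChars(text) {
  const hits = [];
  text.split('\n').forEach((line, i) => {
    for (const match of line.matchAll(HIDDEN)) {
      const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
      hits.push({ line: i + 1, column: match.index + 1, codePoint: `U+${hex}` });
    }
  });
  return hits;
}

// Convenience wrapper for a file on disk, e.g. scanFile('SKILL.md').
function scanFile(path) {
  return findHiddenChars(readFileSync(path, 'utf8'));
}
```

An empty result means none of the listed code points are present. It does not prove the file is free of other obfuscation.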
What to consider before installing
This skill appears to implement what it claims (Discord + multi-source ingestion), but there are some red flags you should address before installing:

  1. The docs and code require a Discord bot token (DISCORD_TOKEN), but the skill metadata lists no required env vars. Ask the publisher to declare required env vars in the skill manifest.
  2. SKILL.md contains hidden unicode control characters (a possible prompt-injection attempt). Inspect and sanitize the file before trusting automated reviews.
  3. Review the source files that interact with Discord and the network (src/adapters/discord.js, bin/discord.js, src/agent-runner.js) to confirm that: a) tokens are not logged or uploaded to unknown endpoints, b) attachments are handled safely (where they are saved, whether external URLs are fetched), and c) network endpoints are limited to Discord, GitHub, and other expected services.
  4. Run the package in an isolated environment first (no production tokens). If you must provide a bot token, give the bot minimal read-only scopes and be ready to revoke it.
  5. Prefer installing from the upstream GitHub repo referenced in the docs (verify the repo owner and commits) rather than trusting an unverified registry snapshot.

If the maintainer cannot clarify the missing metadata and the unicode control characters, treat the package as untrusted.

Like a lobster shell, security has layers — review code before you run it.

365 downloads
0 stars
2 versions
v1.0.1
MIT-0

ClawText Ingest — Production-Ready Memory Ingestion

Version: 1.3.0 | License: MIT | Status: Production ✅
Author: ragesaq | Category: Memory & Knowledge Management
GitHub: https://github.com/ragesaq/clawtext-ingest


🎯 What It Does

ClawText Ingest transforms external data (Discord forums, files, URLs, JSON, text) into structured, deduplicated memories for AI agents.

The Problem It Solves

  • Manual ingestion — Tedious, error-prone, no metadata
  • Duplicate memories — Same data ingested multiple times
  • Unstructured data — No hierarchy, no context preservation
  • One-time imports — No recurring/scheduled ingestion
  • Discord-specific gaps — Can't preserve forum post↔reply structure

The Solution

One command imports from Discord, files, URLs, or JSON
100% idempotent — Run 1000x, zero duplicates
Automatic metadata — YAML frontmatter with date, project, type, entities
6 agent patterns — Autonomous workflows documented and ready
Discord-native — Forum hierarchy preserved, progress bars, auto-batch mode
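To make the "automatic metadata" point concrete, here is a minimal sketch of what a memory with YAML frontmatter could look like. The field names (date, project, type, entities) come from the list above. The helper name `toMemory` and the exact layout are assumptions and may differ from what ClawText Ingest actually writes.

```javascript
// Hypothetical helper: render content as Markdown with a YAML frontmatter header.
// String values are emitted via JSON.stringify, which yields valid YAML scalars
// and flow sequences, so quotes and colons in values stay safe.
function toMemory(content, { date, project, type, entities = [] }) {
  const lines = [
    '---',
    `date: ${date}`,
    `project: ${JSON.stringify(project)}`,
    `type: ${JSON.stringify(type)}`,
    `entities: ${JSON.stringify(entities)}`,
    '---',
    '',
    content.trim(),
    '',
  ];
  return lines.join('\n');
}

// Usage with made-up values.
console.log(
  toMemory('Decision: use SQLite.', {
    date: '2024-01-15',
    project: 'docs',
    type: 'fact',
    entities: ['SQLite'],
  })
);
```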


✨ Key Features

🎯 Discord Integration (New in v1.3.0)

  • Forum + Channel + Thread support
  • Hierarchy preservation — Post↔reply structure in metadata
  • Real-time progress — Live feedback for large ingestions
  • Auto-batch mode — <500 posts: full, ≥500 posts: streaming
  • One-command setup — 5-minute bot creation
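The auto-batch rule and the hierarchy preservation described above can be sketched as follows. The 500-post threshold and the mode names "full" and "streaming" are taken from the list. The record shape and the metadata field names (kind, postId, replyIndex) are hypothetical, chosen only to show how a post-to-reply link could survive flattening.

```javascript
// Threshold from the feature list: fewer than 500 posts -> full, otherwise streaming.
const STREAMING_THRESHOLD = 500;

function chooseMode(postCount) {
  return postCount < STREAMING_THRESHOLD ? 'full' : 'streaming';
}

// Hypothetical flattening step: every reply keeps a pointer to its parent post,
// so the forum structure can be rebuilt from metadata alone.
function flattenForum(posts) {
  const records = [];
  for (const post of posts) {
    records.push({
      id: post.id,
      content: post.content,
      metadata: { kind: 'post', postId: post.id },
    });
    post.replies.forEach((reply, index) => {
      records.push({
        id: reply.id,
        content: reply.content,
        metadata: { kind: 'reply', postId: post.id, replyIndex: index },
      });
    });
  }
  return records;
}
```

Storing the parent ID on each reply, rather than nesting replies inside posts, keeps every record independently ingestible while still allowing the thread to be reassembled.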

📁 Multi-Source Ingestion

  • Files — Glob patterns (Markdown, text, etc.)
  • URLs — Single or bulk URL ingestion
  • JSON — Chat exports, API responses
  • Raw text — Quick knowledge capture
  • Batch operations — Unified ingestion from multiple sources

🔄 Deduplication & Safety

  • SHA1-based — Cryptographic hash matching
  • 100% idempotent — Safe for repeated runs
  • Configurable — checkDedupe: true/false per operation
  • Zero data loss — Failed items tracked, fallback per-item ingestion
  • Hash persistence — .ingest_hashes.json for cross-session tracking
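The deduplication scheme above can be sketched as a set of SHA-1 content hashes that is saved to disk between runs. This is an illustration under stated assumptions, not the package's implementation: the `HashStore` class is hypothetical, and the on-disk format (a JSON array of hex digests) is a guess about what .ingest_hashes.json contains.

```javascript
import { createHash } from 'node:crypto';
import { existsSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical store: remembers which content has already been ingested.
class HashStore {
  constructor(path = '.ingest_hashes.json') {
    this.path = path;
    // Assumed file format: a JSON array of hex SHA-1 digests.
    const saved = existsSync(path) ? JSON.parse(readFileSync(path, 'utf8')) : [];
    this.hashes = new Set(saved);
  }

  static hash(content) {
    return createHash('sha1').update(content, 'utf8').digest('hex');
  }

  // Returns true and records the hash if the content is new, false if it is a duplicate.
  add(content) {
    const digest = HashStore.hash(content);
    if (this.hashes.has(digest)) return false;
    this.hashes.add(digest);
    return true;
  }

  save() {
    writeFileSync(this.path, JSON.stringify([...this.hashes], null, 2));
  }
}

// Usage: a second session over the same content ingests nothing new.
const storePath = join(tmpdir(), `ingest_hashes_demo_${process.pid}.json`);
rmSync(storePath, { force: true }); // start from a clean demo file
const first = new HashStore(storePath);
console.log(first.add('hello'), first.add('hello')); // true false
first.save();
const second = new HashStore(storePath);
console.log(second.add('hello')); // false
```

Because the check is keyed on content alone, rerunning the same ingestion is a no-op, which is what makes repeated runs safe.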

🤖 Agent-Ready

  • 6 documented patterns — Direct API, Discord Agent, CLI, Cron, Batch, Thread
  • Working code examples — Copy-paste ready
  • Real-world patterns — GitHub sync, Discord monitoring, team decisions
  • Error handling — Comprehensive error recovery
  • Progress callbacks — Track ingestion in real-time
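As an illustration of the error-handling and progress-callback points, the sketch below processes items one at a time, records failures instead of aborting, and reports a percentage after each item. The function `ingestWithProgress` and its result shape are hypothetical. Only the `percent` field mirrors the `onProgress` usage shown in this README's Discord example.

```javascript
// Hypothetical helper: ingest each item, keep going on errors, report progress.
async function ingestWithProgress(items, ingestOne, onProgress = () => {}) {
  const failed = [];
  let done = 0;
  for (const item of items) {
    try {
      await ingestOne(item);
    } catch (err) {
      // Track the failure so it can be retried or inspected later.
      failed.push({ item, error: err.message });
    }
    done += 1;
    onProgress({
      done,
      total: items.length,
      percent: Math.round((done / items.length) * 100),
    });
  }
  return { ingested: items.length - failed.length, failed };
}
```

A caller can pass any async function as `ingestOne`, so the same loop works for files, URLs, or messages.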

🛠️ Developer-Friendly

  • CLI tool — clawtext-ingest + clawtext-ingest-discord commands
  • Node.js API — Simple imports for programmatic use
  • TypeScript-ready — Clear method signatures
  • Extensible — Custom transforms, field mapping
  • Well-documented — 11 guides, 20+ examples

🔗 ClawText Integration

  • Automatic cluster indexing — New memories indexed after rebuild
  • RAG injection — Relevant context injected into agent prompts
  • Project routing — Organize memories by project/source
  • Entity linking — Auto-extract and link related entities

🚀 Quick Start

Installation

# Via npm
npm install clawtext-ingest

# Via OpenClaw
openclaw install clawtext-ingest

Discord Ingestion (5 minutes)

# 1. Set up Discord bot (see DISCORD_BOT_SETUP.md)
# 2. Get bot token, set DISCORD_TOKEN env var

# 3. Inspect forum
clawtext-ingest-discord describe-forum --forum-id FORUM_ID --verbose

# 4. Ingest with progress
DISCORD_TOKEN=xxx clawtext-ingest-discord fetch-discord --forum-id FORUM_ID

# 5. Rebuild ClawText clusters
clawtext-ingest rebuild

File Ingestion

clawtext-ingest ingest-files --input="docs/*.md" --project="docs"

Node.js API

import { ClawTextIngest } from 'clawtext-ingest';

const ingest = new ClawTextIngest();

// Ingest files
await ingest.fromFiles(['docs/**/*.md'], { project: 'docs', type: 'fact' });

// Ingest JSON
await ingest.fromJSON(chatArray, { project: 'team' }, {
  keyMap: { contentKey: 'message', dateKey: 'timestamp', authorKey: 'user' }
});

// Rebuild clusters for RAG injection
await ingest.rebuildClusters();
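
The keyMap option in the fromJSON call above tells the library which fields of each record hold the content, date, and author. The following sketch shows the idea only. `applyKeyMap`, its default key names, and its handling of empty content are assumptions, not the library's documented behavior.

```javascript
// Hypothetical illustration of field mapping: normalize arbitrary records
// into { content, date, author } using the caller-supplied key names.
function applyKeyMap(records, keyMap = {}) {
  const { contentKey = 'content', dateKey = 'date', authorKey = 'author' } = keyMap;
  return records
    // Skip records without usable text (an assumed policy).
    .filter((r) => typeof r[contentKey] === 'string' && r[contentKey].trim() !== '')
    .map((r) => ({
      content: r[contentKey],
      date: r[dateKey] ?? null,
      author: r[authorKey] ?? null,
    }));
}

// Usage with a made-up chat export.
const chatArray = [
  { message: 'Ship on Friday', timestamp: '2024-01-15T10:00:00Z', user: 'ann' },
  { message: '', timestamp: '2024-01-15T10:01:00Z', user: 'bob' },
];
console.log(applyKeyMap(chatArray, { contentKey: 'message', dateKey: 'timestamp', authorKey: 'user' }));
```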

🤖 Agent Integration (6 Patterns)

Pattern 1: Direct API

For: In-agent code
Use when: Agents need to ingest as part of workflow

const ingest = new ClawTextIngest();
await ingest.fromFiles(['docs/**/*.md'], { project: 'docs' });

Pattern 2: Discord Agent

For: Autonomous Discord ingestion
Use when: Agents need to fetch Discord forums

const runner = new DiscordIngestionRunner(ingest);
await runner.ingestForumAutonomous({
  forumId, mode: 'batch', token: process.env.DISCORD_TOKEN
});

Pattern 3: CLI Subprocess

For: Agents executing commands
Use when: Simpler CLI-based execution needed

// Assumes: const execAsync = promisify(exec), using node:util and node:child_process
await execAsync('clawtext-ingest-discord fetch-discord --forum-id ID');
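
Since the security scan asks how tokens reach subprocesses, here is a more defensive variant of the same call. It is a sketch, not the package's code: it uses `execFile` with an argument array so the forum ID is never interpolated into a shell string, and it passes the token through the child's environment. The helper names and the 10-minute timeout are assumptions. The command and the `--forum-id` flag are the ones shown in this README.

```javascript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Discord IDs are numeric snowflakes, so anything else is rejected up front.
function buildFetchArgs(forumId) {
  if (!/^\d+$/.test(forumId)) {
    throw new Error(`invalid forum ID: ${forumId}`);
  }
  return ['fetch-discord', '--forum-id', forumId];
}

// Hypothetical wrapper around the CLI; resolves with the command's stdout.
async function fetchForum(forumId, token = process.env.DISCORD_TOKEN) {
  if (!token) throw new Error('DISCORD_TOKEN is not set');
  const { stdout } = await execFileAsync('clawtext-ingest-discord', buildFetchArgs(forumId), {
    env: { ...process.env, DISCORD_TOKEN: token }, // token travels via env, not argv
    timeout: 10 * 60 * 1000, // arbitrary 10-minute cap (assumption)
  });
  return stdout;
}
```

Keeping the token out of the argument list means it does not show up in process listings, and validating the ID avoids passing unexpected input to the CLI.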

Pattern 4: Cron/Scheduled

For: Recurring tasks
Use when: Daily/hourly ingestion needed

// '0 * * * *' fires at minute 0 of every hour; assumes a node-cron-style scheduler
cron.schedule('0 * * * *', () => agentIngest());

Pattern 5: Batch Multi-Source

For: Unified ingestion
Use when: Multiple sources in one operation

await ingest.ingestAll([
  { type: 'files', data: ['docs/**/*.md'], metadata: {...} },
  { type: 'json', data: chatExport, metadata: {...} }
]);

Pattern 6: Discord Thread

For: Thread-specific ingestion
Use when: Single thread fetch needed

await runner.ingestThread(threadId);

→ See AGENT_GUIDE.md for complete examples


📊 Real-World Examples

Example 1: Daily Documentation Sync

async function syncDocsDaily() {
  const ingest = new ClawTextIngest();
  const result = await ingest.ingestAll([
    { type: 'files', data: ['docs/**/*.md'], metadata: { project: 'docs' } },
    { type: 'urls', data: ['https://docs.example.com/api'], metadata: { project: 'api-docs' } }
  ]);
  await ingest.rebuildClusters();
  return result;
}

Example 2: Discord Forum Monitoring

async function monitorDiscordForum(forumId) {
  const ingest = new ClawTextIngest();
  const runner = new DiscordIngestionRunner(ingest);
  
  const result = await runner.ingestForumAutonomous({
    forumId,
    mode: 'batch',
    token: process.env.DISCORD_TOKEN,
    onProgress: (p) => console.log(`${p.percent}% complete...`)
  });
  
  return result;
}

Example 3: Team Decisions Ingestion

async function ingestTeamDecisions() {
  const ingest = new ClawTextIngest();
  
  const result = await ingest.ingestAll([
    { type: 'files', data: ['decisions/adr/**/*.md'], metadata: { type: 'adr' } },
    { type: 'json', data: slackThread, metadata: { type: 'decision', source: 'slack' } }
  ]);
  
  await ingest.rebuildClusters();
  return result;
}

🛒 CLI Commands

clawtext-ingest — File/URL/JSON/Text Ingestion

clawtext-ingest ingest-files --input="docs/*.md" --project="docs" --verbose
clawtext-ingest ingest-urls --input="https://example.com" --project="research"
clawtext-ingest ingest-json --input=messages.json --source="slack"
clawtext-ingest ingest-text --input="Finding: X is better than Y" --project="findings"
clawtext-ingest batch --config=sources.json
clawtext-ingest rebuild
clawtext-ingest status

clawtext-ingest-discord — Discord Integration

# Inspect forum
clawtext-ingest-discord describe-forum --forum-id FORUM_ID --verbose

# Fetch & ingest
DISCORD_TOKEN=xxx clawtext-ingest-discord fetch-discord \
  --forum-id FORUM_ID \
  --mode batch \
  --batch-size 100 \
  --verbose

📚 Documentation

| Document | Purpose | Read Time |
|---|---|---|
| README.md | Overview + quick start | 5 min |
| QUICKSTART.md | 5-minute setup | 5 min |
| AGENT_GUIDE.md | 6 autonomous patterns | 10 min |
| API_REFERENCE.md | Complete API docs | 15 min |
| PHASE2_CLI_GUIDE.md | CLI commands | 10 min |
| DISCORD_BOT_SETUP.md | Bot creation | 5 min |
| CLAYHUB_GUIDE.md | Publication | 5 min |
| INDEX.md | Documentation index | 2 min |

🎯 Who Should Use This

  • AI/Agent developers — Building knowledge-aware agents
  • RAG engineers — Populating memory for context injection
  • Teams using Discord — Leveraging Discord as knowledge base
  • DevOps/MLOps — Automated knowledge ingestion pipelines
  • Researchers — Structuring unstructured data sources

⚡ Performance

| Operation | Speed | Notes |
|---|---|---|
| Ingest 100 files | ~5 sec | With SHA1 dedup check |
| Ingest 1000 JSON items | ~15 sec | Batch processing |
| Small forum (<100 msgs) | ~10 sec | Full mode |
| Large forum (1000+ msgs) | ~2 min | Auto-batch, streaming |
| Rebuild clusters | ~5-30 sec | Depends on total memories |

✅ Quality Metrics

| Metric | Value |
|---|---|
| Tests | 22/22 passing ✅ |
| Code | 1,254 production lines |
| Documentation | 92 KB across 11 guides |
| Examples | 20+ working examples |
| Coverage | 100% critical paths |

🔗 Integration with ClawText

  1. Ingest data → Creates memories with YAML metadata
  2. Rebuild clusters → ClawText indexes new memories
  3. RAG layer → Relevant context injected on next prompt
  4. Agent response → Enhanced with contextual information

# Complete workflow
clawtext-ingest-discord fetch-discord --forum-id ID  # Step 1
clawtext-ingest rebuild                               # Step 2
# Step 3-4 automatic (ClawText + Agent)

🆘 Support


📦 Installation & Requirements

Requirements:

  • Node.js ≥ 18.0.0
  • OpenClaw (for agent patterns)
  • ClawText ≥ 1.2.0 (for RAG integration)

Installation:

npm install clawtext-ingest
# or
openclaw install clawtext-ingest

Binaries:

  • clawtext-ingest — File/URL/JSON ingestion
  • clawtext-ingest-discord — Discord integration

🚀 Why This Over Alternatives

Feature | ClawText-Ingest | Manual | Generic Importer | API Tool
Discord native
DeduplicationPartial
Agent patterns
Metadata autoPartial
ClawText integration
IdempotentPartial

📄 License

MIT — Use freely, open source, community supported


🙌 Contributing

Contributions welcome! See GitHub issues for current priorities.


Ready to ingest? Start with QUICKSTART.md (5 min) or AGENT_GUIDE.md if you're building agents.
