RAGFlow Skill

Operate RAGFlow v0.25.6 deployments through the bundled Node CLI and API client. Use when the user needs to manage RAGFlow datasets, documents, uploads, parsing, chunks, retrieval, chat assistants, chat sessions, agents, agent sessions, embedded website access, metadata filters, model discovery, system settings, or API diagnostics. Also use when the user asks about knowledge bases, document chunking, vector retrieval, embed code, or RAG workflows and the current context explicitly involves a RAGFlow server or deployment.

Install

openclaw skills install skill-for-ragflow

RAGFlow Skill

Use this skill to operate RAGFlow through scripts/ragflow.js. The CLI wraps the full v0.25.6 REST API - every action goes through node {baseDir}/scripts/ragflow.js <command> [options]. Prefer --json on any command when the output will be parsed or chained into another step.
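
A minimal sketch of chaining a command from Node, assuming the CLI prints a single JSON document to stdout when --json is passed. The helper name runRagflow, its injectable exec parameter, and the script path in the usage line are illustrative and not part of the skill.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical helper: runs the bundled CLI with --json and parses stdout.
// `exec` is injectable so the parsing logic can be tried without a server.
function runRagflow(scriptPath, args, exec = execFileSync) {
  const argv = [scriptPath, ...args];
  if (!argv.includes("--json")) argv.push("--json");
  const stdout = exec("node", argv, { encoding: "utf8" });
  // Assumption: stdout holds exactly one JSON document.
  return JSON.parse(stdout);
}

// Usage (needs a reachable server, so it is left commented out):
// const datasets = runRagflow("scripts/ragflow.js", ["list-datasets"]);
```

Passing arguments as an array rather than a shell string avoids quoting problems when a value such as a question contains spaces.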

Requirements

  • Set RAGFLOW_URL and RAGFLOW_API_KEY in the environment or this skill's .env.
  • Use Node.js to run bundled scripts.
  • Tune chunk deletion retries only when needed with RAGFLOW_DELETE_CHUNK_RETRIES and RAGFLOW_DELETE_CHUNK_RETRY_DELAY_MS.
  • Tune the chunk deletion diagnostic script only when needed with RAGFLOW_REPRO_TIMEOUT_MS, RAGFLOW_REPRO_DELETE_RETRIES, RAGFLOW_REPRO_DELETE_RETRY_DELAY_MS, and RAGFLOW_REPRO_EMBEDDING_MODEL.

Security Notes

  • Use HTTPS in production. Production deployments should use https:// for RAGFLOW_URL to protect the API key in transit. Local development (http://localhost) is acceptable for testing.
  • Use least-privilege API keys. Consider creating dedicated API keys with minimal permissions for specific workflows rather than using admin-level keys.
  • Protect your API key. Never share RAGFLOW_API_KEY in chat messages or commit it to version control. Use environment variables or the skill's .env file.
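
The requirements and security notes above can be checked before running any command. The following is a sketch only: checkRagflowEnv is a hypothetical helper, and treating localhost and 127.0.0.1 as the only local hosts is an assumption.

```javascript
// Returns a list of human-readable problems; an empty list means the
// environment looks usable. The API key value itself is never echoed.
function checkRagflowEnv(env) {
  const problems = [];
  if (!env.RAGFLOW_API_KEY) problems.push("RAGFLOW_API_KEY is not set");
  if (!env.RAGFLOW_URL) {
    problems.push("RAGFLOW_URL is not set");
    return problems;
  }
  let url;
  try {
    url = new URL(env.RAGFLOW_URL);
  } catch {
    problems.push("RAGFLOW_URL is not a valid URL");
    return problems;
  }
  // Assumption: only these two hostnames count as local development.
  const isLocal = ["localhost", "127.0.0.1"].includes(url.hostname);
  if (url.protocol !== "https:" && !isLocal) {
    problems.push("RAGFLOW_URL should use https:// outside local development");
  }
  return problems;
}

// Usage: const problems = checkRagflowEnv(process.env);
```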

Quick Command Reference

Scenario                 Commands
Knowledge base setup     create-dataset, list-datasets, get-dataset, update-dataset, delete-datasets
Document ingestion       upload-documents, list-documents, get-document, update-document, delete-documents, download-document, preview-document, metadata-summary
Parsing & chunking       start-parsing, stop-parsing, wait-parsing, list-chunks, add-chunk, update-chunk, delete-chunks
Direct retrieval         retrieve
Chat assistant           create-chat, list-chats, get-chat, update-chat, patch-chat, delete-chats
Chat sessions            create-session, list-sessions, delete-sessions, chat, chat-session
Agent                    create-agent, list-agents, get-agent, update-agent, delete-agents
Agent tags               list-agent-tags, update-agent-tags
Agent sessions           create-agent-session, list-agent-sessions, delete-agent-sessions, agent-chat
Connector                list-connectors, create-connector, get-connector, update-connector, delete-connector
RAPTOR                   run-raptor, trace-raptor
Embedded website access  list-system-tokens, create-system-token, delete-system-token, embed-code, embed-info, embed-chat, embed-agent-chat
Model discovery          list-models
System                   system-version, get-log-levels, set-log-level

Common Workflows

Full RAG pipeline (upload -> parse -> retrieve)

  1. create-dataset --name "My KB" --chunk-method naive
  2. upload-documents --dataset <id> --files ./doc1.pdf ./doc2.txt
  3. start-parsing --dataset <id> --doc-ids <doc_id1> <doc_id2>
  4. wait-parsing --dataset <id> --doc-ids <doc_id1> <doc_id2>
  5. retrieve --question "What is X?" --datasets <id>
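
The pipeline above can be sketched as argument lists for the CLI. The function names are hypothetical, and the sketch assumes the dataset ID and document IDs have already been read from the output of the earlier steps; the shape of that output is described in references/COMMANDS.md, not here.

```javascript
// Step 2: upload files into an existing dataset.
function uploadStep(datasetId, files) {
  return ["upload-documents", "--dataset", datasetId, "--files", ...files];
}

// Steps 3-5: start parsing, wait for it to finish, then query the dataset.
function parseAndRetrieveSteps(datasetId, docIds, question) {
  return [
    ["start-parsing", "--dataset", datasetId, "--doc-ids", ...docIds],
    ["wait-parsing", "--dataset", datasetId, "--doc-ids", ...docIds],
    ["retrieve", "--question", question, "--datasets", datasetId],
  ];
}

// Each list is meant to be appended to: node {baseDir}/scripts/ragflow.js
```

Keeping upload separate from the parse steps mirrors the CLI: the document IDs passed to start-parsing only exist after the upload has returned.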

Chat assistant with sessions

  1. create-chat --name "Q&A" --datasets <id> --llm-id qwen-turbo@Tongyi-Qianwen
  2. create-session --chat <chat_id>
  3. chat-session --chat <chat_id> --session <session_id> --question "Hello"

Agent workflow

  1. create-agent --title "Assistant" --dsl @agent_dsl.json
  2. create-agent-session --agent <agent_id>
  3. agent-chat --agent <agent_id> --session <session_id> --question "Hello"

Agent tags workflow

  1. list-agent-tags --agent <agent_id>
  2. update-agent-tags --agent <agent_id> --tags "Tag1,Tag2"

Connector workflow

  1. create-connector --name "GitHub" --type github --token <token>
  2. list-connectors
  3. get-connector --id <id>

RAPTOR workflow

  1. run-raptor --dataset <id> --method raptor
  2. trace-raptor --id <id>

agent-chat is streaming by default. Use --stream false when you need the final JSON result in one response.

Embedded website access

  1. embed-code --chat <chat_id> --type fullscreen or embed-code --agent <agent_id> --type widget
  2. embed-info --chat <chat_id> or embed-info --agent <agent_id>
  3. embed-chat --chat <chat_id> --question "Hello" or embed-agent-chat --agent <agent_id> --question "Hello"

embed-chat automatically creates the embedded chatbot session when --session is omitted. RAGFlow's shared-site route only creates a session and returns the prologue on the first no-session request, so the CLI bootstraps session_id first and then sends the real question.
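
The bootstrap described above can be sketched as two requests. This is an illustration of the control flow, not the CLI's actual code: send is a hypothetical transport function, and the assumption that the response carries a session_id field is mine.

```javascript
// `send` is a hypothetical transport: send(body) resolves to a response
// object. Assumption: the bootstrap response exposes `session_id`.
async function embedChat(send, question, sessionId) {
  let id = sessionId;
  if (!id) {
    // First no-session request: the server only creates a session and
    // returns the prologue, so the question is not sent yet.
    const bootstrap = await send({});
    id = bootstrap.session_id;
  }
  // Second request (or the only one when a session was supplied):
  // send the real question against the known session.
  return send({ question, session_id: id });
}
```

Supplying a session ID skips the first request entirely, which matches the behavior of --session.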

Workflow Decision Guide

The first step in any RAGFlow operation is resolving the target resource ID. After that, choose the right path:

  1. Authoring or debugging a custom agent DSL? -> Read references/AGENT_GUIDE.md - it is a self-contained guide to the current RAGFlow agent DSL schema and includes minimal examples.
  2. Need CLI syntax or option details? -> Read references/COMMANDS.md - it's organized by workflow scenario with full option tables.
  3. Editing client code or checking request/response shapes? -> Read references/API.md - it has code examples for every RagflowClient method.
  4. A command failed? -> Read references/TROUBLESHOOTING.md - common errors with causes and fixes.
  5. Formatting output for the user? -> Read references/REFERENCE.md - consistent response templates and status labels.

Key Constraints

  • Destructive deletes need confirmation. RAGFlow deletes are immediate and irreversible. Confirm before running delete-datasets, delete-documents, delete-chunks, delete-chats, delete-sessions, or delete-agents - unless the resource is a temporary artifact you created in the same workflow and the user asked you to clean up.
  • Upload and parsing are separate steps. RAGFlow does not auto-parse on upload because different documents may need different chunk methods. Upload first, adjust config if needed, then start parsing explicitly.
  • Preserve user-uploaded filenames. RAGFlow stores the multipart filename as the document name. If a user attachment is materialized as a task ID or temporary path, pass the original filename inline: upload-documents --files <original-name>=<path>.
  • Use v0.25.6 route shapes from the references. The reference docs match the current skill.
  • Tenant model identifiers use the model@provider format. When creating datasets with --embedding-model or chat assistants with --llm-id, the server expects the full identifier, for example text-embedding-v4@Tongyi-Qianwen or qwen-turbo@Tongyi-Qianwen, not a numeric model row ID. Use list-models to discover model names and providers.
  • Chat sessions use the v0.25.6 route. chat-session posts to /api/v1/chat/completions with chat_id and session_id in the body.
  • Chat session history behavior changed in v0.25.6. By default, POST /api/v1/chat/completions now appends only the latest message to stored history. Use --pass-all-history or set pass_all_history_messages: true in the API payload to replace the entire history. conversation_id is accepted as an alias for session_id.
  • Embedded access uses beta tokens and embedded sessions. embed-code, embed-info, embed-chat, and embed-agent-chat use the shared-site /api/v1/chatbots/* or /api/v1/agentbots/* routes. If --beta is not supplied, the CLI reuses the first item returned by /api/v1/system/tokens that has a beta value, or creates a token if none exists. For chatbot completions, the CLI auto-bootstraps session_id unless --session is supplied.
  • Treat embed auth material as sensitive output. System tokens, beta values, and embed URLs or iframe HTML containing auth= are operational secrets. Use them when needed for the task, but do not print the full values back to the user unless the user explicitly asks for them.
  • Embed URL generation assumes a public RAGFlow origin. embed-code uses --origin when supplied; otherwise it falls back to RAGFLOW_URL. When the API base URL and the public web origin differ, pass --origin explicitly so the generated iframe points at the actual shared-site page.
  • Prefer the current Agent DSL schema from AGENT_GUIDE.md. In practice, hand-authored agents should include components, history, path, retrieval, variables, globals, and graph, plus graph.nodes[].data.name for every component-backed node.
  • Agent tags must be comma-separated strings. When updating agent tags, pass them as a single string of comma-separated values.
  • Connectors require valid auth tokens. Ensure the target service token is valid before creating a connector.
  • Agent chat uses the v0.25.6 route. agent-chat posts to /api/v1/agents/chat/completions with agent_id in the body.
  • Iteration agents should iterate over a real list output. When an upstream Agent produces loop items, prefer an object-shaped structured output such as {"items":[...]} and point Iteration.params.items_ref at agent:0@structured.items. Start from references/examples/agents/04-iteration-agent.json.
  • Chunk deletion may need retries. Some servers return the error "rm_chunk deleted chunks 0, expect N" because of document-store refresh lag, even though the chunk exists. The CLI handles this automatically: it first confirms the chunk is still visible via exact ID lookup, then retries the delete. If retries still fail, run scripts/repro-delete-chunks.js for a clean diagnosis.
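
The retry behavior in the last constraint can be sketched as follows. This is not the CLI's implementation: deleteChunk and chunkExists are hypothetical callbacks standing in for the real client calls, and the default retry count and delay are illustrative values, not the defaults behind RAGFLOW_DELETE_CHUNK_RETRIES and RAGFLOW_DELETE_CHUNK_RETRY_DELAY_MS.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `deleteChunk(id)` resolves to the number of chunks removed.
// `chunkExists(id)` resolves to true when an exact ID lookup finds the chunk.
// Resolves to "deleted", "not-found", or "failed".
async function deleteChunkWithRetry(deleteChunk, chunkExists, chunkId, opts = {}) {
  const retries = opts.retries ?? 3; // illustrative default
  const delayMs = opts.delayMs ?? 500; // illustrative default
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    const deleted = await deleteChunk(chunkId);
    if (deleted > 0) return "deleted";
    // Zero deletions: retry only if the chunk is still visible by exact ID.
    if (!(await chunkExists(chunkId))) return "not-found";
    if (attempt < retries) await sleep(delayMs);
  }
  return "failed";
}
```

Checking visibility before each retry keeps the loop from hammering the server for a chunk that is already gone.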

Output Format

When presenting results to the user, follow the templates in references/REFERENCE.md. Key conventions:

  • Use a two-layer output model. For execution, chaining, and parsing, prefer the CLI's raw --json output. For the final user-facing response, convert that raw result into a concise summary that follows the reference templates instead of pasting the CLI payload verbatim.
  • 3+ items with attributes -> Table, abbreviating long IDs
  • Sequential steps -> Numbered list
  • Parsing status -> Use labels: UNSTART, RUNNING, CANCEL, DONE, FAIL
  • Search results -> Table with similarity scores, content as quote blocks
  • Embed/token operations -> Summarize what was generated or fetched; redact token, beta, and any auth= query value unless the user explicitly asks for the secret
  • Errors -> Show code and human-readable message

For embed and token-related commands, apply these response rules:

  1. Use the CLI result internally, but do not mirror the raw JSON back to the user by default.
  2. Lead with the operational outcome: what resource was targeted, what mode was used, whether a token was reused or created, and whether a session was created or reused.
  3. Only include the minimum secret material needed to complete the user's request. If the user did not explicitly ask for the value, redact it.
  4. If the user needs copy-paste embed material, provide it only when explicitly requested and call out that it contains sensitive auth data.
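
The redaction rule above can be sketched as a small text filter. It assumes the secret appears as an auth= query parameter, as described in this section; redactAuth and the [REDACTED] placeholder are illustrative names, not part of the skill.

```javascript
// Replaces the value of every auth= query parameter, in a bare URL or
// inside iframe HTML, with a fixed placeholder. The ";" in the first
// character class covers HTML-encoded separators such as "&amp;auth=".
function redactAuth(text) {
  return text.replace(/([?&;]auth=)[^&"'\s<>#]+/g, "$1[REDACTED]");
}
```

Run any embed URL or iframe snippet through a filter like this before including it in a user-facing summary, and skip it only when the user explicitly asks for the full value.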