Llamaindex
v1.0.0LlamaIndex RAG 框架助手,精通文档索引、检索增强生成、向量存储、查询引擎
MIT-0
Security Scan
OpenClaw
Benign
high confidencePurpose & Capability
The name/description match the SKILL.md content: it teaches how to install and use LlamaIndex, vector stores, loaders, and query engines. All commands and examples relate to building RAG pipelines.
Instruction Scope
Instructions legitimately show reading local data (./data), creating a local Chroma DB (./chroma_db), and optionally loading web pages. These are expected for indexing/RAG, but they do mean the agent following these instructions could read and write local files and fetch remote pages — users should only point it at data they want indexed.
Install Mechanism
There is no install spec in the registry (instruction-only). The SKILL.md suggests pip install commands (standard PyPI packages). That is normal, but running them installs third-party packages into the environment, so prefer a virtualenv and verify package names/versions before installing.
Credentials
The skill declares no required environment variables, which is consistent. The docs recommend OpenAI-related packages (llama-index-llms-openai, embeddings-openai) — using those will require provider API keys (e.g., OPENAI_API_KEY) though the skill doesn't declare them. Users should only supply credentials for services they intend to use.
Persistence & Privilege
always:false and no install hooks; the skill does not request persistent platform privileges or override other skills. It only provides instructions the agent could follow when invoked.
Assessment
This skill is an instructional guide for using LlamaIndex and is internally consistent. Before installing or following the instructions: (1) run pip installs inside a virtualenv/container and verify package names and versions; (2) be aware the examples read local files (./data) and create a local Chroma DB (./chroma_db) — only point the agent at data you want indexed; (3) if you plan to use OpenAI (or other LLM/embedding providers), you will need to provide API keys (e.g., OPENAI_API_KEY) — do not share unrelated credentials; (4) review and understand any code/examples before executing them, especially web loaders that fetch remote pages.Like a lobster shell, security has layers — review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
LlamaIndex RAG 框架助手
你是 LlamaIndex(原 GPT Index)领域的专家,帮助用户构建高质量的检索增强生成系统。
核心概念
| 概念 | 说明 |
|---|---|
| Document | 原始数据源(PDF、网页、数据库等)的抽象表示 |
| Node | Document 切分后的文本块,是索引的基本单元 |
| Index | 对 Node 的组织结构,支持向量、摘要、知识图谱等类型 |
| QueryEngine | 查询引擎,从 Index 中检索相关内容并生成回答 |
| Retriever | 检索器,从 Index 中获取相关 Node |
安装
pip install llama-index
pip install llama-index-llms-openai # OpenAI LLM
pip install llama-index-embeddings-openai # OpenAI Embedding
pip install llama-index-vector-stores-chroma # Chroma 向量库
快速开始:5 行代码构建 RAG
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("这份文档的主要内容是什么?")
数据加载
from llama_index.core import SimpleDirectoryReader
# 通用文件加载,支持 PDF、DOCX、TXT、CSV 等
documents = SimpleDirectoryReader(
input_dir="./data",
recursive=True,
required_exts=[".pdf", ".md"],
filename_as_id=True
).load_data()
# 专用 Loader(LlamaHub 生态)
from llama_index.readers.web import SimpleWebPageReader
docs = SimpleWebPageReader().load_data(["https://example.com"])
索引类型
| 索引类型 | 适用场景 | 说明 |
|---|---|---|
| VectorStoreIndex | 语义搜索(最常用) | 将 Node 转为向量,余弦相似度检索 |
| SummaryIndex | 全文摘要 | 遍历所有 Node 生成摘要 |
| TreeIndex | 层级摘要 | 自底向上构建摘要树 |
| KnowledgeGraphIndex | 知识图谱 | 提取实体关系 |
| KeywordTableIndex | 关键词检索 | 基于关键词匹配 |
向量存储集成
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
支持的向量数据库
| 向量库 | 特点 | 适用场景 |
|---|---|---|
| Chroma | 轻量嵌入式,零配置 | 本地开发、小规模 |
| Qdrant | 高性能,丰富过滤 | 生产环境推荐 |
| Pinecone | 全托管云服务 | 免运维需求 |
| Milvus | 大规模分布式 | 亿级向量数据 |
| FAISS | Meta 出品,纯内存 | 高性能本地检索 |
查询引擎高级配置
query_engine = index.as_query_engine(
similarity_top_k=5, # 检索 Top-K 个相关片段
response_mode="compact", # compact/refine/tree_summarize
streaming=True # 流式输出
)
与 LangChain 对比
| 特性 | LlamaIndex | LangChain |
|---|---|---|
| 核心定位 | RAG 专精,数据索引和检索 | 通用 LLM 应用框架 |
| 数据处理 | 内置丰富的文档加载和切分 | 需要更多手动配置 |
| 索引能力 | 多种索引类型,开箱即用 | 依赖向量库直接集成 |
| 查询优化 | 内置 Reranker、路由、子问题分解 | 需要手动编排 Chain |
| 适用场景 | 知识库问答、文档分析 | Agent、工作流、通用应用 |
| 组合使用 | 可作为 LangChain 的 Retriever | 可集成 LlamaIndex 索引 |
Files
1 totalSelect a file
Select a file to preview.
Comments
Loading comments…
