{"skill":{"slug":"rag-construction","displayName":"Rag Construction","summary":"Build RAG systems for construction knowledge bases. Create searchable AI-powered construction document systems","description":"---\r\nname: \"rag-construction\"\r\ndescription: \"Build RAG systems for construction knowledge bases. Create searchable AI-powered construction document systems\"\r\nhomepage: \"https://datadrivenconstruction.io\"\r\nmetadata: {\"openclaw\": {\"emoji\": \"🐼\", \"os\": [\"darwin\", \"linux\", \"win32\"], \"homepage\": \"https://datadrivenconstruction.io\", \"requires\": {\"bins\": [\"python3\"]}}}\r\n---\r\n# RAG Construction\r\n\r\n## Overview\r\n\r\nBased on DDC methodology (Chapter 2.3), this skill builds Retrieval-Augmented Generation (RAG) systems for construction knowledge bases, enabling semantic search and AI-powered question answering over construction documents.\r\n\r\n**Book Reference:** \"Pandas DataFrame и LLM ChatGPT\" / \"Pandas DataFrame and LLM ChatGPT\"\r\n\r\n## Quick Start\r\n\r\n```python\r\nfrom dataclasses import dataclass, field\r\nfrom enum import Enum\r\nfrom typing import List, Dict, Optional, Any, Callable, Tuple\r\nfrom datetime import datetime\r\nimport json\r\nimport hashlib\r\nimport re\r\n\r\nclass DocumentType(Enum):\r\n    \"\"\"Types of construction documents\"\"\"\r\n    SPECIFICATION = \"specification\"\r\n    DRAWING = \"drawing\"\r\n    CONTRACT = \"contract\"\r\n    RFI = \"rfi\"\r\n    SUBMITTAL = \"submittal\"\r\n    CHANGE_ORDER = \"change_order\"\r\n    MEETING_MINUTES = \"meeting_minutes\"\r\n    DAILY_REPORT = \"daily_report\"\r\n    SAFETY_REPORT = \"safety_report\"\r\n    INSPECTION = \"inspection\"\r\n    MANUAL = \"manual\"\r\n    STANDARD = \"standard\"\r\n\r\nclass ChunkingStrategy(Enum):\r\n    \"\"\"Text chunking strategies\"\"\"\r\n    FIXED_SIZE = \"fixed_size\"\r\n    PARAGRAPH = \"paragraph\"\r\n    SECTION = \"section\"\r\n    SEMANTIC = \"semantic\"\r\n    SENTENCE = \"sentence\"\r\n\r\n@dataclass\r\nclass 
DocumentChunk:\r\n    \"\"\"A chunk of document text\"\"\"\r\n    id: str\r\n    document_id: str\r\n    content: str\r\n    metadata: Dict[str, Any]\r\n    embedding: Optional[List[float]] = None\r\n    token_count: int = 0\r\n    position: int = 0\r\n\r\n@dataclass\r\nclass Document:\r\n    \"\"\"Construction document\"\"\"\r\n    id: str\r\n    title: str\r\n    doc_type: DocumentType\r\n    content: str\r\n    source: str\r\n    metadata: Dict[str, Any] = field(default_factory=dict)\r\n    chunks: List[DocumentChunk] = field(default_factory=list)\r\n    created_at: datetime = field(default_factory=datetime.now)\r\n\r\n@dataclass\r\nclass SearchResult:\r\n    \"\"\"Search result from vector store\"\"\"\r\n    chunk: DocumentChunk\r\n    score: float\r\n    document_title: str\r\n    doc_type: DocumentType\r\n\r\n@dataclass\r\nclass RAGResponse:\r\n    \"\"\"Response from RAG system\"\"\"\r\n    query: str\r\n    answer: str\r\n    sources: List[SearchResult]\r\n    confidence: float\r\n    tokens_used: int\r\n\r\n\r\nclass TextChunker:\r\n    \"\"\"Split documents into chunks for embedding\"\"\"\r\n\r\n    def __init__(\r\n        self,\r\n        strategy: ChunkingStrategy = ChunkingStrategy.PARAGRAPH,\r\n        chunk_size: int = 500,\r\n        chunk_overlap: int = 50\r\n    ):\r\n        self.strategy = strategy\r\n        self.chunk_size = chunk_size\r\n        self.chunk_overlap = chunk_overlap\r\n\r\n    def chunk_document(self, document: Document) -> List[DocumentChunk]:\r\n        \"\"\"Split document into chunks\"\"\"\r\n        if self.strategy == ChunkingStrategy.FIXED_SIZE:\r\n            return self._chunk_fixed_size(document)\r\n        elif self.strategy == ChunkingStrategy.PARAGRAPH:\r\n            return self._chunk_by_paragraph(document)\r\n        elif self.strategy == ChunkingStrategy.SECTION:\r\n            return self._chunk_by_section(document)\r\n        elif self.strategy == ChunkingStrategy.SENTENCE:\r\n            return 
self._chunk_by_sentence(document)\r\n        else:\r\n            # SEMANTIC is not implemented; fall back to fixed-size chunks\r\n            return self._chunk_fixed_size(document)\r\n\r\n    def _chunk_fixed_size(self, document: Document) -> List[DocumentChunk]:\r\n        \"\"\"Chunk by fixed character size with overlap\"\"\"\r\n        chunks = []\r\n        text = document.content\r\n        start = 0\r\n        position = 0\r\n\r\n        while start < len(text):\r\n            end = min(start + self.chunk_size, len(text))\r\n\r\n            # Back up to a word boundary (not needed for the final chunk)\r\n            if end < len(text):\r\n                boundary = end\r\n                while boundary > start and text[boundary] not in ' \\n\\t':\r\n                    boundary -= 1\r\n                # No whitespace in this window: keep the hard cut at chunk_size\r\n                if boundary > start:\r\n                    end = boundary\r\n\r\n            chunk_text = text[start:end].strip()\r\n            if chunk_text:\r\n                chunk_id = self._generate_chunk_id(document.id, position)\r\n                chunks.append(DocumentChunk(\r\n                    id=chunk_id,\r\n                    document_id=document.id,\r\n                    content=chunk_text,\r\n                    metadata={\r\n                        \"doc_type\": document.doc_type.value,\r\n                        \"title\": document.title,\r\n                        **document.metadata\r\n                    },\r\n                    token_count=len(chunk_text.split()),\r\n                    position=position\r\n                ))\r\n                position += 1\r\n\r\n            if end >= len(text):\r\n                break\r\n            # Step back by the overlap, but always move forward to avoid an infinite loop\r\n            start = max(end - self.chunk_overlap, start + 1)\r\n\r\n        return chunks\r\n\r\n    def _chunk_by_paragraph(self, document: Document) -> List[DocumentChunk]:\r\n        \"\"\"Chunk by paragraphs\"\"\"\r\n        chunks = []\r\n        paragraphs = document.content.split('\\n\\n')\r\n        current_chunk = \"\"\r\n        position = 0\r\n\r\n        for para in paragraphs:\r\n            para = para.strip()\r\n            if not para:\r\n                continue\r\n\r\n            if len(current_chunk) + len(para) < self.chunk_size:\r\n                current_chunk += \"\\n\\n\" + 
para if current_chunk else para\r\n            else:\r\n                if current_chunk:\r\n                    chunk_id = self._generate_chunk_id(document.id, position)\r\n                    chunks.append(DocumentChunk(\r\n                        id=chunk_id,\r\n                        document_id=document.id,\r\n                        content=current_chunk,\r\n                        metadata={\r\n                            \"doc_type\": document.doc_type.value,\r\n                            \"title\": document.title,\r\n                            **document.metadata\r\n                        },\r\n                        token_count=len(current_chunk.split()),\r\n                        position=position\r\n                    ))\r\n                    position += 1\r\n                current_chunk = para\r\n\r\n        # Add remaining content\r\n        if current_chunk:\r\n            chunk_id = self._generate_chunk_id(document.id, position)\r\n            chunks.append(DocumentChunk(\r\n                id=chunk_id,\r\n                document_id=document.id,\r\n                content=current_chunk,\r\n                metadata={\r\n                    \"doc_type\": document.doc_type.value,\r\n                    \"title\": document.title,\r\n                    **document.metadata\r\n                },\r\n                token_count=len(current_chunk.split()),\r\n                position=position\r\n            ))\r\n\r\n        return chunks\r\n\r\n    def _chunk_by_section(self, document: Document) -> List[DocumentChunk]:\r\n        \"\"\"Chunk by document sections (headers)\"\"\"\r\n        # Split before heading lines: SECTION/ARTICLE/PART headers (e.g. \"PART 1 - GENERAL\")\r\n        # and numbered headings (e.g. \"1.1 SUMMARY\"), allowing leading indentation\r\n        section_pattern = r'\\n(?=[ \\t]*(?:(?:SECTION|ARTICLE|PART)\\s+[\\dA-Z]|(?:\\d+\\.)+\\d*\\s+[A-Z]))'\r\n        sections = re.split(section_pattern, document.content)\r\n\r\n        chunks = []\r\n        for position, section in enumerate(sections):\r\n            section = section.strip()\r\n            if section:\r\n                # If 
section is too large, further split it\r\n                if len(section) > self.chunk_size * 2:\r\n                    sub_chunker = TextChunker(ChunkingStrategy.PARAGRAPH, self.chunk_size)\r\n                    sub_doc = Document(\r\n                        id=f\"{document.id}_sec{position}\",\r\n                        title=document.title,\r\n                        doc_type=document.doc_type,\r\n                        content=section,\r\n                        source=document.source,\r\n                        metadata=document.metadata\r\n                    )\r\n                    sub_chunks = sub_chunker.chunk_document(sub_doc)\r\n                    for i, chunk in enumerate(sub_chunks):\r\n                        # Keep the sub-chunker's IDs (derived from the unique sub-document ID,\r\n                        # so they cannot collide with top-level section IDs), but point each\r\n                        # chunk back at the parent document so search can resolve its title\r\n                        chunk.document_id = document.id\r\n                        chunk.position = position * 100 + i\r\n                    chunks.extend(sub_chunks)\r\n                else:\r\n                    chunk_id = self._generate_chunk_id(document.id, position)\r\n                    chunks.append(DocumentChunk(\r\n                        id=chunk_id,\r\n                        document_id=document.id,\r\n                        content=section,\r\n                        metadata={\r\n                            \"doc_type\": document.doc_type.value,\r\n                            \"title\": document.title,\r\n                            **document.metadata\r\n                        },\r\n                        token_count=len(section.split()),\r\n                        position=position\r\n                    ))\r\n\r\n        return chunks\r\n\r\n    def _chunk_by_sentence(self, document: Document) -> List[DocumentChunk]:\r\n        \"\"\"Chunk by sentences, grouping to meet size requirements\"\"\"\r\n        # Simple sentence splitting\r\n        sentences = re.split(r'(?<=[.!?])\\s+', document.content)\r\n\r\n        chunks = []\r\n        current_chunk = \"\"\r\n        position = 0\r\n\r\n        for sentence 
in sentences:\r\n            if len(current_chunk) + len(sentence) < self.chunk_size:\r\n                current_chunk += \" \" + sentence if current_chunk else sentence\r\n            else:\r\n                if current_chunk:\r\n                    chunk_id = self._generate_chunk_id(document.id, position)\r\n                    chunks.append(DocumentChunk(\r\n                        id=chunk_id,\r\n                        document_id=document.id,\r\n                        content=current_chunk.strip(),\r\n                        metadata={\r\n                            \"doc_type\": document.doc_type.value,\r\n                            \"title\": document.title,\r\n                            **document.metadata\r\n                        },\r\n                        token_count=len(current_chunk.split()),\r\n                        position=position\r\n                    ))\r\n                    position += 1\r\n                current_chunk = sentence\r\n\r\n        if current_chunk:\r\n            chunk_id = self._generate_chunk_id(document.id, position)\r\n            chunks.append(DocumentChunk(\r\n                id=chunk_id,\r\n                document_id=document.id,\r\n                content=current_chunk.strip(),\r\n                metadata={\r\n                    \"doc_type\": document.doc_type.value,\r\n                    \"title\": document.title,\r\n                    **document.metadata\r\n                },\r\n                token_count=len(current_chunk.split()),\r\n                position=position\r\n            ))\r\n\r\n        return chunks\r\n\r\n    def _generate_chunk_id(self, doc_id: str, position: int) -> str:\r\n        \"\"\"Generate unique chunk ID\"\"\"\r\n        return hashlib.md5(f\"{doc_id}_{position}\".encode()).hexdigest()[:12]\r\n\r\n\r\nclass VectorStore:\r\n    \"\"\"Simple in-memory vector store for RAG\"\"\"\r\n\r\n    def __init__(self):\r\n        self.chunks: Dict[str, DocumentChunk] = {}\r\n        
self.embeddings: Dict[str, List[float]] = {}\r\n\r\n    def add_chunks(self, chunks: List[DocumentChunk]):\r\n        \"\"\"Add chunks to the store\"\"\"\r\n        for chunk in chunks:\r\n            self.chunks[chunk.id] = chunk\r\n            if chunk.embedding:\r\n                self.embeddings[chunk.id] = chunk.embedding\r\n\r\n    def search(\r\n        self,\r\n        query_embedding: List[float],\r\n        top_k: int = 5,\r\n        filter_metadata: Optional[Dict] = None\r\n    ) -> List[Tuple[DocumentChunk, float]]:\r\n        \"\"\"Search for similar chunks\"\"\"\r\n        results = []\r\n\r\n        for chunk_id, chunk in self.chunks.items():\r\n            # Apply metadata filter\r\n            if filter_metadata:\r\n                match = all(\r\n                    chunk.metadata.get(k) == v\r\n                    for k, v in filter_metadata.items()\r\n                )\r\n                if not match:\r\n                    continue\r\n\r\n            # Score with exact cosine similarity (brute-force scan over all chunks)\r\n            if chunk_id in self.embeddings:\r\n                score = self._cosine_similarity(query_embedding, self.embeddings[chunk_id])\r\n                results.append((chunk, score))\r\n\r\n        # Sort by score descending\r\n        results.sort(key=lambda x: x[1], reverse=True)\r\n        return results[:top_k]\r\n\r\n    def _cosine_similarity(self, a: List[float], b: List[float]) -> float:\r\n        \"\"\"Calculate cosine similarity between two vectors\"\"\"\r\n        if len(a) != len(b):\r\n            return 0.0\r\n\r\n        dot_product = sum(x * y for x, y in zip(a, b))\r\n        norm_a = sum(x * x for x in a) ** 0.5\r\n        norm_b = sum(x * x for x in b) ** 0.5\r\n\r\n        if norm_a == 0 or norm_b == 0:\r\n            return 0.0\r\n\r\n        return dot_product / (norm_a * norm_b)\r\n\r\n    def get_stats(self) -> Dict:\r\n        \"\"\"Get store statistics\"\"\"\r\n        doc_types = {}\r\n        for chunk in 
self.chunks.values():\r\n            doc_type = chunk.metadata.get(\"doc_type\", \"unknown\")\r\n            doc_types[doc_type] = doc_types.get(doc_type, 0) + 1\r\n\r\n        return {\r\n            \"total_chunks\": len(self.chunks),\r\n            \"chunks_with_embeddings\": len(self.embeddings),\r\n            \"chunks_by_type\": doc_types\r\n        }\r\n\r\n\r\nclass EmbeddingModel:\r\n    \"\"\"Simulated embedding model (replace with actual model in production)\"\"\"\r\n\r\n    def __init__(self, model_name: str = \"text-embedding-ada-002\"):\r\n        self.model_name = model_name\r\n        self.dimension = 1536\r\n\r\n    def embed(self, text: str) -> List[float]:\r\n        \"\"\"Generate embedding for text\"\"\"\r\n        # Simulation: generate deterministic embedding based on text hash\r\n        text_hash = hashlib.sha256(text.encode()).digest()\r\n        embedding = []\r\n        for i in range(self.dimension):\r\n            byte_idx = i % len(text_hash)\r\n            embedding.append((text_hash[byte_idx] - 128) / 128.0)\r\n        return embedding\r\n\r\n    def embed_batch(self, texts: List[str]) -> List[List[float]]:\r\n        \"\"\"Generate embeddings for multiple texts\"\"\"\r\n        return [self.embed(text) for text in texts]\r\n\r\n\r\nclass ConstructionRAG:\r\n    \"\"\"\r\n    RAG system for construction knowledge bases.\r\n    Based on DDC methodology Chapter 2.3.\r\n    \"\"\"\r\n\r\n    def __init__(\r\n        self,\r\n        embedding_model: Optional[EmbeddingModel] = None,\r\n        chunking_strategy: ChunkingStrategy = ChunkingStrategy.PARAGRAPH,\r\n        chunk_size: int = 500\r\n    ):\r\n        self.embedding_model = embedding_model or EmbeddingModel()\r\n        self.chunker = TextChunker(chunking_strategy, chunk_size)\r\n        self.vector_store = VectorStore()\r\n        self.documents: Dict[str, Document] = {}\r\n\r\n    def add_document(self, document: Document) -> int:\r\n        \"\"\"\r\n        Add a document 
to the knowledge base.\r\n\r\n        Args:\r\n            document: Document to add\r\n\r\n        Returns:\r\n            Number of chunks created\r\n        \"\"\"\r\n        # Store document\r\n        self.documents[document.id] = document\r\n\r\n        # Chunk document\r\n        chunks = self.chunker.chunk_document(document)\r\n\r\n        # Generate embeddings\r\n        for chunk in chunks:\r\n            chunk.embedding = self.embedding_model.embed(chunk.content)\r\n\r\n        # Add to vector store\r\n        self.vector_store.add_chunks(chunks)\r\n\r\n        # Update document with chunks\r\n        document.chunks = chunks\r\n\r\n        return len(chunks)\r\n\r\n    def add_documents(self, documents: List[Document]) -> Dict[str, int]:\r\n        \"\"\"Add multiple documents\"\"\"\r\n        results = {}\r\n        for doc in documents:\r\n            results[doc.id] = self.add_document(doc)\r\n        return results\r\n\r\n    def search(\r\n        self,\r\n        query: str,\r\n        top_k: int = 5,\r\n        doc_type: Optional[DocumentType] = None\r\n    ) -> List[SearchResult]:\r\n        \"\"\"\r\n        Search the knowledge base.\r\n\r\n        Args:\r\n            query: Search query\r\n            top_k: Number of results to return\r\n            doc_type: Filter by document type\r\n\r\n        Returns:\r\n            List of search results\r\n        \"\"\"\r\n        # Generate query embedding\r\n        query_embedding = self.embedding_model.embed(query)\r\n\r\n        # Build filter\r\n        filter_metadata = None\r\n        if doc_type:\r\n            filter_metadata = {\"doc_type\": doc_type.value}\r\n\r\n        # Search vector store\r\n        results = self.vector_store.search(\r\n            query_embedding,\r\n            top_k=top_k,\r\n            filter_metadata=filter_metadata\r\n        )\r\n\r\n        # Build search results\r\n        search_results = []\r\n        for chunk, score in results:\r\n            doc = 
self.documents.get(chunk.document_id)\r\n            search_results.append(SearchResult(\r\n                chunk=chunk,\r\n                score=score,\r\n                document_title=doc.title if doc else \"Unknown\",\r\n                doc_type=doc.doc_type if doc else DocumentType.MANUAL\r\n            ))\r\n\r\n        return search_results\r\n\r\n    def query(\r\n        self,\r\n        question: str,\r\n        top_k: int = 5,\r\n        doc_type: Optional[DocumentType] = None\r\n    ) -> RAGResponse:\r\n        \"\"\"\r\n        Answer a question using RAG.\r\n\r\n        Args:\r\n            question: Question to answer\r\n            top_k: Number of context chunks to use\r\n            doc_type: Filter by document type\r\n\r\n        Returns:\r\n            RAG response with answer and sources\r\n        \"\"\"\r\n        # Search for relevant context\r\n        search_results = self.search(question, top_k=top_k, doc_type=doc_type)\r\n\r\n        if not search_results:\r\n            return RAGResponse(\r\n                query=question,\r\n                answer=\"I couldn't find relevant information to answer this question.\",\r\n                sources=[],\r\n                confidence=0.0,\r\n                tokens_used=0\r\n            )\r\n\r\n        # Build context from search results\r\n        context_parts = []\r\n        for i, result in enumerate(search_results):\r\n            context_parts.append(\r\n                f\"[Source {i+1}: {result.document_title}]\\n{result.chunk.content}\"\r\n            )\r\n\r\n        context = \"\\n\\n\".join(context_parts)\r\n\r\n        # Generate answer (simulated - in production, call LLM)\r\n        answer = self._generate_answer(question, context, search_results)\r\n\r\n        # Calculate confidence\r\n        avg_score = sum(r.score for r in search_results) / len(search_results)\r\n\r\n        return RAGResponse(\r\n            query=question,\r\n            answer=answer,\r\n            
sources=search_results,\r\n            confidence=avg_score,\r\n            # Approximation: whitespace-delimited word count, not model tokens\r\n            tokens_used=len(context.split()) + len(question.split())\r\n        )\r\n\r\n    def _generate_answer(\r\n        self,\r\n        question: str,\r\n        context: str,\r\n        sources: List[SearchResult]\r\n    ) -> str:\r\n        \"\"\"\r\n        Generate answer from context.\r\n        In production, this would call an LLM API.\r\n        \"\"\"\r\n        # Simulated answer generation\r\n        answer_parts = [\r\n            f\"Based on the available construction documentation:\\n\"\r\n        ]\r\n\r\n        # Extract key information from sources\r\n        for source in sources[:3]:\r\n            # Take first sentence of each relevant chunk\r\n            first_sentence = source.chunk.content.split('.')[0] + '.'\r\n            answer_parts.append(f\"- {first_sentence}\")\r\n\r\n        # De-duplicate titles while keeping first-seen order (set() order is arbitrary)\r\n        titles = list(dict.fromkeys(s.document_title for s in sources))\r\n        answer_parts.append(\r\n            f\"\\nThis information comes from {len(sources)} passage(s) in \"\r\n            f\"{len(titles)} document(s), including: {', '.join(titles[:3])}.\"\r\n        )\r\n\r\n        return \"\\n\".join(answer_parts)\r\n\r\n    def get_document_summary(self, document_id: str) -> Optional[Dict]:\r\n        \"\"\"Get summary of a document\"\"\"\r\n        doc = self.documents.get(document_id)\r\n        if not doc:\r\n            return None\r\n\r\n        return {\r\n            \"id\": doc.id,\r\n            \"title\": doc.title,\r\n            \"type\": doc.doc_type.value,\r\n            \"chunks\": len(doc.chunks),\r\n            \"total_tokens\": sum(c.token_count for c in doc.chunks),\r\n            \"source\": doc.source,\r\n            \"created_at\": doc.created_at.isoformat()\r\n        }\r\n\r\n    def get_stats(self) -> Dict:\r\n        \"\"\"Get system statistics\"\"\"\r\n        return {\r\n            \"total_documents\": len(self.documents),\r\n            \"vector_store\": self.vector_store.get_stats(),\r\n            
\"embedding_model\": self.embedding_model.model_name,\r\n            \"chunking_strategy\": self.chunker.strategy.value\r\n        }\r\n\r\n    def export_knowledge_base(self) -> Dict:\r\n        \"\"\"Export knowledge base for backup/transfer\"\"\"\r\n        return {\r\n            \"documents\": [\r\n                {\r\n                    \"id\": doc.id,\r\n                    \"title\": doc.title,\r\n                    \"type\": doc.doc_type.value,\r\n                    \"content\": doc.content,\r\n                    \"source\": doc.source,\r\n                    \"metadata\": doc.metadata\r\n                }\r\n                for doc in self.documents.values()\r\n            ],\r\n            \"stats\": self.get_stats(),\r\n            \"exported_at\": datetime.now().isoformat()\r\n        }\r\n```\r\n\r\n## Common Use Cases\r\n\r\n### Build Construction Knowledge Base\r\n\r\n```python\r\nrag = ConstructionRAG(\r\n    chunking_strategy=ChunkingStrategy.SECTION,\r\n    chunk_size=500\r\n)\r\n\r\n# Add specifications\r\nspec_doc = Document(\r\n    id=\"spec-03300\",\r\n    title=\"Cast-in-Place Concrete Specification\",\r\n    doc_type=DocumentType.SPECIFICATION,\r\n    content=\"\"\"\r\n    SECTION 03 30 00 - CAST-IN-PLACE CONCRETE\r\n\r\n    PART 1 - GENERAL\r\n    1.1 SUMMARY\r\n    A. Section includes cast-in-place concrete for foundations,\r\n       slabs, walls, and other structural elements.\r\n\r\n    1.2 RELATED SECTIONS\r\n    A. Section 03 10 00 - Concrete Forming\r\n    B. Section 03 20 00 - Concrete Reinforcing\r\n\r\n    PART 2 - PRODUCTS\r\n    2.1 CONCRETE MATERIALS\r\n    A. Portland Cement: ASTM C150, Type I or II\r\n    B. Aggregates: ASTM C33, graded\r\n    C. 
Water: Clean, potable\r\n    \"\"\",\r\n    source=\"project_specs.pdf\",\r\n    metadata={\"division\": \"03\", \"project\": \"Building A\"}\r\n)\r\n\r\nchunks_created = rag.add_document(spec_doc)\r\nprint(f\"Created {chunks_created} chunks\")\r\n```\r\n\r\n### Search Knowledge Base\r\n\r\n```python\r\n# Search for concrete requirements\r\nresults = rag.search(\r\n    query=\"concrete strength requirements\",\r\n    top_k=5,\r\n    doc_type=DocumentType.SPECIFICATION\r\n)\r\n\r\nfor result in results:\r\n    print(f\"Score: {result.score:.3f}\")\r\n    print(f\"Document: {result.document_title}\")\r\n    print(f\"Content: {result.chunk.content[:200]}...\")\r\n    print()\r\n```\r\n\r\n### Answer Questions with RAG\r\n\r\n```python\r\nresponse = rag.query(\r\n    question=\"What type of cement should be used for foundations?\",\r\n    top_k=3\r\n)\r\n\r\nprint(f\"Answer: {response.answer}\")\r\nprint(f\"Confidence: {response.confidence:.0%}\")\r\nprint(f\"Sources: {len(response.sources)}\")\r\n```\r\n\r\n## Quick Reference\r\n\r\n| Component | Purpose |\r\n|-----------|---------|\r\n| `ConstructionRAG` | Main RAG system: ingestion, search, and question answering |\r\n| `TextChunker` | Splits documents into chunks (fixed-size, paragraph, section, or sentence) |\r\n| `VectorStore` | In-memory chunk and embedding storage with cosine-similarity search |\r\n| `EmbeddingModel` | Text embeddings (hash-based simulation; replace with a real model in production) |\r\n| `DocumentChunk` | Chunk text with metadata and embedding |\r\n| `RAGResponse` | Answer, source chunks, confidence, and approximate token count |\r\n\r\n## Resources\r\n\r\n- **Book**: \"Data-Driven Construction\" by Artem Boiko, Chapter 2.3\r\n- **Website**: https://datadrivenconstruction.io\r\n\r\n## Next Steps\r\n\r\n- Use [llm-data-automation](../llm-data-automation/SKILL.md) for automation\r\n- Use [vector-search](../../Chapter-4.4/vector-search/SKILL.md) for advanced search\r\n- Use [document-classification-nlp](../../../DDC_Innovative/document-classification-nlp/SKILL.md) for 
classification\r\n","topics":["Rag","Document"],"tags":{"latest":"2.1.0"},"stats":{"comments":0,"downloads":2098,"installsAllTime":79,"installsCurrent":7,"stars":6,"versions":2},"createdAt":1770475453301,"updatedAt":1778486073009},"latestVersion":{"version":"2.1.0","createdAt":1771173251778,"changelog":"rag-construction 2.1.0\n\n- Added SKILL.md describing the feature set, quick start guide, and technical reference.\n- Provides tools for building Retrieval-Augmented Generation (RAG) systems for construction knowledge bases.\n- Includes classes for document management, chunking strategies, and semantic search support.\n- Enables creation of AI-powered, searchable document systems for construction projects.\n- Metadata now describes supported platforms and requirements.","license":null},"metadata":{"setup":[],"os":["darwin","linux","win32"],"systems":null},"owner":{"handle":"datadrivenconstruction","userId":"s1774mv3t1cm8r1kgs9hccdnmn8852nb","displayName":"datadrivenconstruction","image":"https://avatars.githubusercontent.com/u/94158709?v=4"},"moderation":null}