Retrieval-Augmented Generation (RAG)

Name: Retrieval-Augmented Generation (RAG)
Author: Giuseppe Trisciuoglio

RAG · AI · LLM · Vector Database · Semantic Search · Knowledge Base · Hallucination Reduction · AI Agent

⭐ 282 · 📄 MIT · 🕒 2026-06-15

Install this skill

npx skills add giuseppe-trisciuoglio/developer-kit

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

RAG Implementation helps developers ground language models in private data sets. It reduces factual errors by having the model answer from, and cite, retrieved document segments. Instead of training or fine-tuning, it gives the LLM query-time access to specific text files, PDFs, or databases as context for its answers. The workflow centers on a processing pipeline that chunks raw text, converts those chunks into vector embeddings, and stores them in a database. When a user asks a question, the system searches the store for semantically related chunks, injects that text into the prompt, and returns an answer grounded in those sources. This architecture is common in internal corporate search engines and technical support interfaces, where precision and source transparency matter.

When to Use This Skill

•Building technical documentation assistants that cite manual pages
•Creating internal company Q&A bots for HR or policy databases
•Developing research tools that automatically reference local source files
•Constructing legal document analysis systems for contract discovery

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

“add my local documents to the AI's knowledge base”
“connect the chatbot to my technical PDF library”
“set up a vector store for semantic document search”
“how do I filter my AI responses by document category”
“ground the AI answers using my local data sources”

Pro Tips

💡Optimize your chunking strategy: Experiment with different text chunk sizes and overlap to balance retrieval relevance and context window limitations for your specific knowledge base.
💡Prioritize embedding model selection: The quality of your embeddings directly impacts retrieval performance. Choose an embedding model that is well-suited to the domain and nature of your data.
💡Implement hybrid search: Combine semantic search with keyword-based search (e.g., BM25) to leverage the strengths of both, improving recall and precision, especially for specific entity lookups.

What this skill does

•Translates unstructured documents into searchable vector embeddings
•Performs hybrid searches combining keyword and semantic matching
•Filters query results based on custom document metadata
•Integrates multiple retrieval sources into a single AI response stream
•Manages document chunking strategies to optimize context window usage

When not to use it

✕When the data is highly dynamic and requires near-instantaneous global state updates
✕When the task involves simple, non-semantic keyword lookups in small flat files
✕When the AI's existing training data already covers the entire necessary domain

Example workflow

1. Split large source files into manageable, overlapping text chunks
2. Generate numerical embeddings for each chunk via an embedding model
3. Save the generated vectors and text into a configured database
4. Configure an AI service to act as the query interface
5. Fetch relevant context snippets dynamically based on incoming questions

Prerequisites

–Valid API key for an embedding provider
–Access to a vector database instance
–A structured collection of source documents

Pitfalls & limitations

!Poor retrieval quality due to suboptimal chunk size or overlap settings
!Semantic drift caused by low-quality embedding models or mismatched domains
!High latency during the initial document ingestion and embedding phase
!Stuffing too much retrieved context into the prompt, which can exceed the target model's token limit

FAQ

Why do I need a vector database instead of just pasting files into the prompt?

Vector databases enable efficient semantic searching of massive datasets that would otherwise exceed the LLM's context window limit.

Can I use multiple databases for one chatbot?

Yes, you can define multiple content retrievers and merge their results before passing them to the language model.

What is the benefit of metadata filtering?

Metadata filtering allows you to constrain the search space, such as only looking for files labeled as 'technical' or belonging to a specific date range.

How it compares

Compared to generic prompt engineering, RAG grounds answers in retrieved sources that can be cited and checked, whereas standard prompts rely solely on the model's static internal knowledge, which cannot be verified against a source.

Source & trust

⭐ 282 stars · 📄 MIT · 🕒 Updated 2026-06-15


📄 Full skill instructions — original source: giuseppe-trisciuoglio/developer-kit

# RAG Implementation

Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.

## Overview

RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.

## When to Use

Use this skill when:

- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling AI systems to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
- Developing knowledge management systems

## Core Components

### Vector Databases
Store and efficiently retrieve document embeddings for semantic search.

**Key Options:**
- **Pinecone**: Managed, scalable, production-ready
- **Weaviate**: Open-source, hybrid search capabilities
- **Milvus**: High performance, on-premise deployment
- **Chroma**: Lightweight, easy local development
- **Qdrant**: Fast, advanced filtering
- **FAISS**: Meta's similarity-search library (runs in your process, not a database server), full control

### Embedding Models
Convert text to numerical vectors for similarity search.

**Popular Models:**
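- **text-embedding-ada-002** (OpenAI): General purpose, 1536 dimensions (legacy; the newer text-embedding-3-small also returns 1536 dimensions by default)
- **all-MiniLM-L6-v2**: Fast, lightweight, 384 dimensions
- **e5-large-v2**: High quality, English, 1024 dimensions (multilingual-e5-large is the multilingual variant)
- **bge-large-en-v1.5**: Strong English retrieval model, 1024 dimensions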
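The similarity search these models enable fits in a few lines of plain Java. This is a minimal brute-force sketch, assuming tiny hand-written 3-dimensional vectors in place of real model embeddings. The `DenseRetriever` class is hypothetical, and a production vector database would use an approximate index rather than scanning every vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory store: brute-force cosine-similarity search over embeddings. */
final class DenseRetriever {
    record Hit(String text, double score) {}

    private final List<String> texts = new ArrayList<>();
    private final List<float[]> vectors = new ArrayList<>();

    void add(String text, float[] embedding) {
        texts.add(text);
        vectors.add(embedding);
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 when either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every stored vector (linear scan) and returns the k most similar, best first. */
    List<Hit> search(float[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            hits.add(new Hit(texts.get(i), cosine(query, vectors.get(i))));
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits.subList(0, Math.min(k, hits.size()));
    }
}
```

Cosine similarity ignores vector length, which is why many stores normalize embeddings up front and then use the cheaper dot product.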

### Retrieval Strategies
Find relevant content based on user queries.

**Approaches:**
- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse to catch both semantic matches and exact terms (names, IDs, error codes)
- **Multi-Query**: Generate multiple query variations
- **Contextual Compression**: Extract only relevant parts
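One common way to merge dense and sparse results for hybrid search is Reciprocal Rank Fusion (RRF). RRF only needs each retriever's ranking, not comparable scores: every document earns 1/(k + rank) from each list it appears in. The sketch below assumes both retrievers have already returned ranked document IDs. The constant k = 60 is a frequently used default, not a value prescribed by this skill.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Reciprocal Rank Fusion: merges several ranked ID lists into one ranking. */
final class ReciprocalRankFusion {
    private ReciprocalRankFusion() {}

    /**
     * @param rankings ranked lists of document IDs, best first (e.g., dense and BM25 results)
     * @param k        smoothing constant; 60 is a common default
     * @return document IDs sorted by fused score, best first
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by ID so the output is deterministic
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }
}
```

For example, fusing a dense ranking `[d1, d2, d3]` with a keyword ranking `[d3, d1, d4]` puts `d1` and `d3` first, because both retrievers found them.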

## Quick Implementation

### Basic RAG Setup

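```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.util.List;

// AI service interface; LangChain4j generates the implementation at runtime
interface Assistant { String chat(String userMessage); }

// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");

// Create embedding store (in memory: fine for prototypes, not persistent)
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// Ingest documents; without an explicit EmbeddingModel this relies on langchain4j-easy-rag
EmbeddingStoreIngestor.ingest(documents, embeddingStore);

// Create AI service with RAG capability (chatModel: any configured ChatModel)
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
    .build();
```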

### Document Processing Pipeline

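```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;
import java.util.List;

// Split documents into chunks; the recursive splitter tries paragraph boundaries first,
// then smaller units. Sizes are in characters because no tokenizer is passed.
DocumentSplitter splitter = DocumentSplitters.recursive(
    500,  // max chunk size (characters)
    100   // max overlap (characters)
);

// Create embedding model; read the key from the environment instead of hard-coding it
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("text-embedding-3-small") // returns 1536-dimensional vectors
    .build();

// Create embedding store (PostgreSQL with the pgvector extension)
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
    .host("localhost")
    .port(5432)
    .database("postgres")
    .user("postgres")
    .password(System.getenv("PGVECTOR_PASSWORD")) // hypothetical env var name
    .table("embeddings")
    .dimension(1536) // must match the embedding model's output size
    .build();

// Process and store documents (documents loaded as in Basic RAG Setup)
for (Document document : documents) {
    List<TextSegment> segments = splitter.split(document);
    if (segments.isEmpty()) continue;
    // Embed all segments of a document in one batch call instead of one call per segment
    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
    embeddingStore.addAll(embeddings, segments);
}
```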

## Implementation Patterns

### Pattern 1: Simple Document Q&A

Create a basic Q&A system over your documents.

```java
// chatModel and retriever are assumed to be configured as in the examples above
public interface DocumentAssistant {
    String answer(String question);
}

DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();
```

### Pattern 2: Metadata-Filtered Retrieval

Filter results based on document metadata.

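```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// Add metadata during document loading; segments produced by a DocumentSplitter inherit it
Metadata metadata = new Metadata()
    .put("source", "technical-manual.pdf")
    .put("category", "technical")
    .put("date", "2024-01-15");
Document document = Document.from("Content here", metadata);

// Filter during retrieval: only segments whose "category" is "technical" are considered
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .maxResults(5)   // return at most 5 segments
    .minScore(0.7)   // drop matches with a relevance score below 0.7
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();
```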
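Conceptually, a metadata filter is a predicate evaluated alongside similarity search. This plain-Java sketch combines a category check with an inclusive date range, assuming dates are stored as ISO-8601 strings such as `2024-01-15`. The `InMemoryMetadataFilter` class and its `Chunk` record are hypothetical, not LangChain4j types.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative metadata filtering over chunks that carry string metadata. */
final class InMemoryMetadataFilter {
    record Chunk(String text, Map<String, String> metadata) {}

    private InMemoryMetadataFilter() {}

    static Predicate<Chunk> categoryIs(String category) {
        return c -> category.equals(c.metadata().get("category"));
    }

    /** Inclusive range over an ISO-8601 "date" entry; chunks without a date are excluded. */
    static Predicate<Chunk> dateBetween(LocalDate from, LocalDate to) {
        return c -> {
            String raw = c.metadata().get("date");
            if (raw == null) return false;
            LocalDate date = LocalDate.parse(raw);
            return !date.isBefore(from) && !date.isAfter(to);
        };
    }

    static List<Chunk> filter(List<Chunk> chunks, Predicate<Chunk> predicate) {
        return chunks.stream().filter(predicate).toList();
    }
}
```

Predicates compose with `and`/`or`, so `categoryIs("technical").and(dateBetween(...))` narrows the search space before ranking.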

### Pattern 3: Multi-Source Retrieval

Combine results from multiple knowledge sources.

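```java
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.aggregator.ReRankingContentAggregator;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.router.DefaultQueryRouter;

// One retriever per knowledge source (webStore, documentStore, databaseStore assumed to exist)
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);

// Combine results: DefaultQueryRouter sends each query to all three retrievers, then
// ReRankingContentAggregator reranks the merged results with a ScoringModel
// (e.g., a cross-encoder; scoringModel is assumed to be configured)
RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
    .queryRouter(new DefaultQueryRouter(webRetriever, documentRetriever, databaseRetriever))
    .contentAggregator(new ReRankingContentAggregator(scoringModel))
    .build();

// Pass it to AiServices.builder(...).retrievalAugmentor(retrievalAugmentor)
```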

## Best Practices

### Document Preparation
- Clean and preprocess documents before ingestion
- Remove irrelevant content and formatting artifacts
- Standardize document structure for consistent processing
- Add relevant metadata for filtering and context

### Chunking Strategy
- Use 500-1000 tokens per chunk as a common starting point
- Include 10-20% overlap to preserve context at boundaries
- Consider document structure when determining chunk boundaries
- Test different chunk sizes for your specific use case
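The size/overlap trade-off is easiest to see in a sliding-window chunker. This sketch splits on whitespace and counts words as a rough stand-in for tokens, which is a simplifying assumption. Production splitters such as LangChain4j's `DocumentSplitters.recursive` also respect paragraph and sentence boundaries. Each window starts `chunkSize - overlap` words after the previous one, so neighboring chunks share `overlap` words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sliding-window chunker; words approximate tokens (a simplifying assumption). */
final class OverlappingChunker {
    private OverlappingChunker() {}

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return List.of();
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far each window advances
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + chunkSize, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
            if (end == words.size()) break; // this window already reached the end of the text
        }
        return chunks;
    }
}
```

With `chunkSize = 500` and 10-20% overlap, `overlap` would be roughly 50-100.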

### Retrieval Optimization
- Retrieve a larger candidate set first (k = 10-20), then filter and rerank down to the few chunks you pass to the model
- Use metadata filtering to improve relevance
- Combine multiple retrieval strategies for better coverage
- Monitor retrieval quality and user feedback
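The "retrieve wide, then narrow" advice can be sketched as a small function. It takes the first-stage candidates (e.g., the top 20 from vector search), drops those under a similarity threshold, re-scores the rest with a more expensive scorer, and keeps the best few. The `reranker` argument is a hypothetical stand-in for something like a cross-encoder call.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Two-stage retrieval: threshold first-stage candidates, rerank, keep the top finalK. */
final class TwoStageRetrieval {
    /** A first-stage hit, e.g. from vector search with a large k (10-20). */
    record Candidate(String text, double firstStageScore) {}

    private TwoStageRetrieval() {}

    static List<Candidate> refine(List<Candidate> candidates, double minScore,
                                  ToDoubleFunction<Candidate> reranker, int finalK) {
        List<Candidate> kept = candidates.stream()
            .filter(c -> c.firstStageScore() >= minScore) // cheap similarity threshold
            .toList();
        // Score each survivor exactly once, since second-stage scoring is the expensive part
        Map<Candidate, Double> scores = new HashMap<>();
        for (Candidate c : kept) scores.put(c, reranker.applyAsDouble(c));
        return kept.stream()
            .sorted(Comparator.comparingDouble((Candidate c) -> scores.get(c)).reversed())
            .limit(finalK)
            .toList();
    }
}
```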

### Performance Considerations
- Cache embeddings for frequently accessed content
- Use batch processing for document ingestion
- Optimize vector store configuration for your scale
- Monitor query performance and system resources
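Caching embeddings can be as simple as memoizing the model call on a hash of the content. In this sketch the `embedder` function is a hypothetical wrapper around a real embedding model. The model name is part of the cache key because vectors from different models are not interchangeable.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Caches embeddings keyed by a hash of (model name, text) so repeated content is embedded once. */
final class CachingEmbedder {
    private final String modelName;
    private final Function<String, float[]> embedder; // hypothetical wrapper around a real model call
    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger modelCalls = new AtomicInteger();

    CachingEmbedder(String modelName, Function<String, float[]> embedder) {
        this.modelName = modelName;
        this.embedder = embedder;
    }

    float[] embed(String text) {
        return cache.computeIfAbsent(sha256(modelName + '\0' + text), key -> {
            modelCalls.incrementAndGet(); // only runs on a cache miss
            return embedder.apply(text);
        });
    }

    int modelCalls() { return modelCalls.get(); }

    private static String sha256(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(value.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

This cache is unbounded and holds the map entry's lock while the model call runs; a production cache would typically add eviction and size limits.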

## Common Issues and Solutions

### Poor Retrieval Quality
**Problem**: Retrieved documents don't match user queries
**Solutions**:
- Improve document preprocessing and cleaning
- Adjust chunk size and overlap parameters
- Try different embedding models
- Use hybrid search combining semantic and keyword matching
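Hybrid search needs a keyword scorer alongside the embeddings. Below is a compact BM25 sketch, assuming the common defaults k1 = 1.2 and b = 0.75, a non-negative (Lucene-style) IDF, and a naive ASCII word tokenizer. In practice you would usually rely on the BM25 or sparse-vector support built into your search engine or vector database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal Okapi BM25 scorer over an in-memory corpus (illustrative, not optimized). */
final class Bm25 {
    private static final double K1 = 1.2; // term-frequency saturation
    private static final double B = 0.75; // strength of document-length normalization
    private final List<Map<String, Integer>> termFreqs = new ArrayList<>();
    private final List<Integer> docLengths = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();
    private final double avgDocLength;

    Bm25(List<String> corpus) {
        long totalLength = 0;
        for (String text : corpus) {
            List<String> tokens = tokenize(text);
            Map<String, Integer> tf = new HashMap<>();
            for (String t : tokens) tf.merge(t, 1, Integer::sum);
            for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum); // once per document
            termFreqs.add(tf);
            docLengths.add(tokens.size());
            totalLength += tokens.size();
        }
        avgDocLength = corpus.isEmpty() ? 0 : (double) totalLength / corpus.size();
    }

    /** Lowercases and splits on non-word characters (ASCII-only; a simplification). */
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
            .filter(t -> !t.isEmpty())
            .toList();
    }

    /** BM25 score of document docIndex for the given query. */
    double score(String query, int docIndex) {
        Map<String, Integer> tf = termFreqs.get(docIndex);
        double lengthRatio = docLengths.get(docIndex) / avgDocLength;
        double n = termFreqs.size();
        double score = 0;
        for (String term : tokenize(query)) {
            int f = tf.getOrDefault(term, 0);
            if (f == 0) continue; // term absent from this document: contributes nothing
            int df = docFreq.get(term);
            double idf = Math.log((n - df + 0.5) / (df + 0.5) + 1);
            score += idf * f * (K1 + 1) / (f + K1 * (1 - B + B * lengthRatio));
        }
        return score;
    }
}
```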

### Irrelevant Results
**Problem**: Retrieved documents are on-topic but not specific enough to answer the question
**Solutions**:
- Add metadata filtering for domain-specific constraints
- Implement reranking with cross-encoder models
- Use contextual compression to extract relevant parts
- Tune retrieval parameters (k values, similarity thresholds)
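Contextual compression is often done with an LLM or a cross-encoder. As a dependency-free illustration, the sketch below swaps in a much cruder lexical heuristic: it keeps only the sentences of a retrieved chunk that share at least one non-stopword term with the query. The tiny stopword list and the regex sentence splitter are assumptions for the example.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Crude lexical contextual compression: keep sentences that share a term with the query. */
final class LexicalCompressor {
    private static final Set<String> STOPWORDS =
        Set.of("the", "a", "an", "is", "are", "to", "of", "and", "in", "how", "do", "i", "what");

    private LexicalCompressor() {}

    static Set<String> terms(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
            .filter(t -> !t.isEmpty() && !STOPWORDS.contains(t))
            .collect(Collectors.toSet());
    }

    /** Returns the sentences of chunk that mention any query term, joined by spaces. */
    static String compress(String chunk, String query) {
        Set<String> queryTerms = terms(query);
        return Arrays.stream(chunk.split("(?<=[.!?])\\s+")) // split after sentence punctuation
            .filter(sentence -> terms(sentence).stream().anyMatch(queryTerms::contains))
            .collect(Collectors.joining(" "));
    }
}
```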

### Performance Issues
**Problem**: Slow response times during retrieval
**Solutions**:
- Optimize vector store configuration and indexing
- Implement caching for frequently retrieved content
- Use smaller embedding models for faster inference
- Use approximate nearest neighbor (ANN) indexes such as HNSW or IVF instead of exact search

### Hallucination Prevention
**Problem**: AI generates information not present in retrieved documents
**Solutions**:
- Improve prompt engineering to emphasize grounding
- Add verification steps to check answer alignment
- Include confidence scoring for responses
- Implement fact-checking mechanisms
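Grounding instructions usually live in the prompt template (in a LangChain4j AI service, typically a `@SystemMessage`). The sketch below numbers each retrieved chunk with its source so the model can cite `[n]`, and it tells the model to say when the context is insufficient. The exact wording and the `Chunk` record are illustrative assumptions, not a prompt taken from this skill.

```java
import java.util.List;

/** Builds a prompt that asks the model to answer only from numbered, citable sources. */
final class GroundedPrompt {
    record Chunk(String source, String text) {}

    private GroundedPrompt() {}

    static String build(String question, List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder()
            .append("Answer the question using ONLY the numbered sources below.\n")
            .append("Cite sources as [n] after each claim. If the sources do not contain\n")
            .append("the answer, reply exactly: \"I don't know based on the provided documents.\"\n\n");
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            // e.g. "[1] (policy.pdf) Refunds are accepted within 30 days."
            sb.append('[').append(i + 1).append("] (").append(c.source()).append(") ")
              .append(c.text()).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).append('\n').toString();
    }
}
```

Numbered sources also make the "verification step" easier: citations in the answer can be checked against the chunk list.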

## Evaluation Framework

### Retrieval Metrics
- **Precision@k**: Fraction of the top-k results that are relevant
- **Recall@k**: Fraction of all relevant documents that appear in the top-k results
- **Mean Reciprocal Rank (MRR)**: Mean, over queries, of 1/rank of the first relevant result
- **Normalized Discounted Cumulative Gain (nDCG)**: Graded-relevance ranking metric that discounts results further down the list and normalizes by the ideal ordering (1.0 = perfect ranking)
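The four retrieval metrics can be computed per query as below; average them over a test set, where the mean of `reciprocalRank` is MRR. This plain-Java sketch assumes you have labeled relevant document IDs for each query. Its nDCG uses linear gains and a log2(rank + 1) discount, while some evaluation tools use 2^rel - 1 gains instead.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Ranking metrics for one query; ranked = retrieved IDs, best first. */
final class RetrievalMetrics {
    private RetrievalMetrics() {}

    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) return 0;
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** 1/rank of the first relevant result, or 0 if none is retrieved. */
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) return 1.0 / (i + 1);
        }
        return 0;
    }

    /** nDCG@k with graded relevance (IDs missing from grades count as 0). */
    static double ndcgAtK(List<String> ranked, Map<String, Integer> grades, int k) {
        double dcg = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            dcg += grades.getOrDefault(ranked.get(i), 0) / log2(i + 2); // rank r = i + 1
        }
        List<Integer> ideal = grades.values().stream().sorted(Comparator.reverseOrder()).toList();
        double idcg = 0;
        for (int i = 0; i < Math.min(k, ideal.size()); i++) {
            idcg += ideal.get(i) / log2(i + 2);
        }
        return idcg == 0 ? 0 : dcg / idcg;
    }

    private static double log2(double x) { return Math.log(x) / Math.log(2); }
}
```

For a ranking `[d3, d1, d7, d2]` with `d1` and `d2` relevant, Precision@2 and Recall@2 are both 0.5 and the reciprocal rank is 0.5.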

### Answer Quality Metrics
- **Faithfulness**: Degree to which answers are grounded in retrieved documents
- **Answer Relevance**: How well answers address user questions
- **Context Recall**: How much of the information needed for the reference answer is present in the retrieved context
- **Context Precision**: Percentage of retrieved context that is relevant

### User Experience Metrics
- **Response Time**: Time from query to answer
- **User Satisfaction**: Feedback ratings on answer quality
- **Task Completion**: Rate of successful task completion
- **Engagement**: User interaction patterns with the system

## Resources

### Reference Documentation
- [Vector Database Comparison](references/vector-databases.md) - Detailed comparison of vector database options
- [Embedding Models Guide](references/embedding-models.md) - Model selection and optimization
- [Retrieval Strategies](references/retrieval-strategies.md) - Advanced retrieval techniques
- [Document Chunking](references/document-chunking.md) - Chunking strategies and best practices
- [LangChain4j RAG Guide](references/langchain4j-rag-guide.md) - Official implementation patterns

### Assets
- assets/vector-store-config.yaml - Configuration templates for different vector stores
- assets/retriever-pipeline.java - Complete RAG pipeline implementation
- assets/evaluation-metrics.java - Evaluation framework code

## Constraints and Limitations

1. **Token Limits**: Respect model context window limitations
2. **API Rate Limits**: Manage external API rate limits and costs
3. **Data Privacy**: Ensure compliance with data protection regulations
4. **Resource Requirements**: Consider memory and computational requirements
5. **Maintenance**: Plan for regular updates and system monitoring
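One way to respect the context window is to pack retrieved chunks greedily, in relevance order, until a token budget is used up. Token counts here use the rough "about 4 characters per token" rule of thumb for English text, which is an assumption. A real implementation should count with the target model's tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Greedily packs relevance-ordered chunks into a fixed token budget. */
final class ContextPacker {
    private ContextPacker() {}

    /** Rough estimate (~4 characters per token); swap in a real tokenizer in practice. */
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4; // ceiling division
    }

    /** @param chunks already sorted by relevance, best first */
    static List<String> pack(List<String> chunks, int tokenBudget) {
        List<String> selected = new ArrayList<>();
        int used = 0;
        for (String chunk : chunks) {
            int cost = estimateTokens(chunk);
            if (used + cost > tokenBudget) continue; // too big; a smaller later chunk may still fit
            selected.add(chunk);
            used += cost;
        }
        return selected;
    }
}
```

Remember to reserve part of the window for the system prompt, the question, and the model's answer before choosing `tokenBudget`.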

## Security Considerations

- Secure access to vector databases and embedding services
- Implement proper authentication and authorization
- Validate and sanitize user inputs
- Monitor for abuse and unusual usage patterns
- Conduct regular security audits and penetration testing

By Giuseppe Trisciuoglio

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

1. Click "Download" above
2. In your project, create the directory: .agent/skills/rag/
3. Save the file as SKILL.md
The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

Claude Code: ~/.claude/skills/giuseppe-trisciuoglio/developer-kit/rag/SKILL.md
Cursor: ~/.cursor/skills/giuseppe-trisciuoglio/developer-kit/rag/SKILL.md
Antigravity: ~/.gemini/antigravity/skills/giuseppe-trisciuoglio/developer-kit/rag/SKILL.md

