langchain4j-rag-implementation-patterns

RAG · LangChain4j · AI Agents · Vector Databases · Contextual AI · Knowledge Bases · Spring Boot · Semantic Search
⭐ 282 stars · 📄 MIT · 🕒 Updated 2026-06-15

Install this skill

npx skills add giuseppe-trisciuoglio/developer-kit

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

LangChain4j RAG Implementation Patterns provides a structured framework for connecting Large Language Models to private datasets within Java-based Spring Boot applications. Instead of relying on static training data, this approach enables developers to perform semantic lookups against custom knowledge bases. The implementation manages the entire lifecycle of document processing, including file ingestion, text segmentation, vector embedding generation, and retrieval orchestration. By incorporating metadata-aware filtering and configurable relevance thresholds, it allows applications to serve grounded responses while maintaining source transparency. This toolkit simplifies the integration between the LangChain4j ecosystem and vector-based storage, allowing for efficient, context-driven AI interaction patterns without manual orchestration of low-level embedding APIs or retrieval logic.

When to Use This Skill

  • Building customer support chatbots that reference company documentation
  • Developing internal search tools for enterprise knowledge repositories
  • Creating document summarization services with verifiable citations
  • Implementing hybrid search pipelines combining keyword and vector methods

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • "how to set up RAG in Spring Boot using LangChain4j"
  • "implement document ingestion with LangChain4j and OpenAI"
  • "create a conversational knowledge assistant in Java"
  • "code example for semantic search with LangChain4j embedding store"
  • "configure a content retriever for AI chat services"

Pro Tips

  • Optimize chunking strategies based on document structure and query patterns to maximize retrieval relevance for RAG.
  • Combine vector search with keyword search (hybrid search) for improved recall, especially with specific entity lookups.
  • Regularly evaluate your RAG system's performance using metrics like faithfulness and groundedness to fine-tune retrieval and generation components.

What this skill does

  • Automated splitting and vectorization of text documents
  • Integration with vector stores for semantic similarity searching
  • Configurable retrieval parameters like minScore and result limits
  • Declarative AI service creation with system message injection
  • Context-aware response generation using LangChain4j primitives

When not to use it

  • Projects requiring heavy multi-modal video or audio processing
  • Low-latency environments where vector retrieval overhead is unacceptable
  • Scenarios needing fine-tuned model weights rather than context augmentation

Example workflow

  1. Define the embedding model and persistent embedding store beans
  2. Load target documents using the file system document loader
  3. Split content into manageable segments using recursive splitters
  4. Generate and store embeddings within the configured database
  5. Initialize an AiService instance attached to a content retriever
  6. Execute user queries to fetch context and generate responses

Prerequisites

  • –Valid OpenAI API key
  • –Java 17 or higher
  • –Spring Boot project structure

Pitfalls & limitations

  • In-memory stores do not persist data across application restarts
  • Choosing incorrect chunk sizes can degrade retrieval relevance
  • Missing minScore filters often lead to irrelevant context being injected
  • Heavy reliance on API-based embeddings increases per-query latency

FAQ

Can I use this with non-OpenAI models?
Yes, LangChain4j supports various embedding model providers; you simply need to swap the configuration bean for your preferred provider.
Does this support persistent document storage?
The provided patterns use an InMemoryEmbeddingStore, but you can replace this with database-backed stores like Pinecone or Milvus for persistence.
What is the purpose of the DocumentSplitter?
It breaks large documents into smaller, contextually relevant chunks that fit within the embedding model's input token limits.

How it compares

This approach automates the plumbing of a vector similarity pipeline, which would otherwise require hand-written REST calls, embedding encoding, and context injection logic.

Source & trust

⭐ 282 starsπŸ“„ MITπŸ•’ Updated 2026-06-15
Full skill instructions (original source: giuseppe-trisciuoglio/developer-kit)
# LangChain4j RAG Implementation Patterns

## When to Use This Skill

Use this skill when:
- Building knowledge-based AI applications requiring external document access
- Implementing question-answering systems over large document collections
- Creating AI assistants with access to company knowledge bases
- Building semantic search capabilities for document repositories
- Implementing chat systems that reference specific information sources
- Creating AI applications requiring source attribution
- Building domain-specific AI systems with curated knowledge
- Implementing hybrid search combining vector similarity with traditional search
- Creating AI applications requiring real-time document updates
- Building multi-modal RAG systems with text, images, and other content types

## Overview

Implement complete Retrieval-Augmented Generation (RAG) systems with LangChain4j. RAG enhances language models by providing relevant context from external knowledge sources, improving accuracy and reducing hallucinations.

## Instructions

### Initialize RAG Project

Create a new Spring Boot project with required dependencies:

**pom.xml**:
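<!-- Versions as given in the source. Spring Boot starter artifacts may be
     published with a -beta suffix; confirm the exact version on Maven Central. -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-spring-boot-starter</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>1.8.0</version>
</dependency>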


### Setup Document Ingestion

Configure document loading and processing:

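import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RAGConfiguration {

    // Embedding model used for both ingestion and query embedding.
    @Bean
    public EmbeddingModel embeddingModel() {
        return OpenAiEmbeddingModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("text-embedding-3-small")
                .build();
    }

    // In-memory store: contents are lost on restart, so use it for prototyping only.
    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        return new InMemoryEmbeddingStore<>();
    }
}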


Create document ingestion service:

@Service
@RequiredArgsConstructor
public class DocumentIngestionService {

    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> embeddingStore;

    public void ingestDocument(String filePath, Map<String, Object> metadata) {
        // Load the file and attach caller-supplied metadata (copied onto every segment).
        Document document = FileSystemDocumentLoader.loadDocument(filePath);
        document.metadata().putAll(metadata);

        // Split into segments of at most 500 tokens with up to 50 tokens of overlap.
        DocumentSplitter splitter = DocumentSplitters.recursive(
                500, 50, new OpenAiTokenCountEstimator("text-embedding-3-small")
        );

        // Embed all segments in one batch call, then store vectors with their text.
        List<TextSegment> segments = splitter.split(document);
        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
        embeddingStore.addAll(embeddings, segments);
    }
}


### Configure Content Retrieval

Setup content retrieval with filtering:

@Configuration
public class ContentRetrieverConfiguration {

    @Bean
    public ContentRetriever contentRetriever(
            EmbeddingStore<TextSegment> embeddingStore,
            EmbeddingModel embeddingModel) {

        return EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(5)   // return at most 5 segments per query
                .minScore(0.7)   // drop matches below this relevance score
                .build();
    }
}
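
To make the effect of `maxResults` and `minScore` concrete, here is a minimal plain-Java sketch of threshold-and-top-k filtering. It does not call LangChain4j: `ScoreFilterSketch`, `Entry`, and `Match` are hypothetical names, and the `(cosine + 1) / 2` mapping to a 0..1 relevance score is an assumption about how matches are scored, worth confirming for the store you use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreFilterSketch {

    /** A stored text with its embedding vector (hypothetical type). */
    record Entry(String text, double[] vector) {}

    /** A search hit with its relevance score in [0, 1] (hypothetical type). */
    record Match(String text, double score) {}

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Assumed mapping from cosine similarity in [-1, 1] to relevance in [0, 1]. */
    static double relevance(double cosine) {
        return (cosine + 1) / 2;
    }

    /** Keeps matches scoring at least minScore, best first, capped at maxResults. */
    static List<Match> search(List<Entry> store, double[] query, int maxResults, double minScore) {
        List<Match> matches = new ArrayList<>();
        for (Entry entry : store) {
            double score = relevance(cosine(query, entry.vector()));
            if (score >= minScore) {
                matches.add(new Match(entry.text(), score));
            }
        }
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(maxResults, matches.size()));
    }
}
```

Under that assumed mapping, a `minScore` of 0.7 corresponds to a raw cosine similarity of 0.4, so the threshold is less strict than it may look.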


### Create RAG-Enabled AI Service

Define AI service with context retrieval:

interface KnowledgeAssistant {

    @SystemMessage("""
        You are a knowledgeable assistant with access to a comprehensive knowledge base.

        When answering questions:
        1. Use the provided context from the knowledge base
        2. If information is not in the context, clearly state this
        3. Provide accurate, helpful responses
        4. When possible, reference specific sources
        5. If the context is insufficient, ask for clarification
        """)
    String answerQuestion(String question);
}

@Service
public class KnowledgeService {

    private final KnowledgeAssistant assistant;

    // Explicit constructor: the assistant is built here rather than injected,
    // so Lombok's @RequiredArgsConstructor must not be added to this class.
    public KnowledgeService(ChatModel chatModel, ContentRetriever contentRetriever) {
        this.assistant = AiServices.builder(KnowledgeAssistant.class)
                .chatModel(chatModel)
                .contentRetriever(contentRetriever)
                .build();
    }

    public String answerQuestion(String question) {
        return assistant.answerQuestion(question);
    }
}


## Examples

### Basic Document Processing

public class BasicRAGExample {
    public static void main(String[] args) {
        var embeddingStore = new InMemoryEmbeddingStore<TextSegment>();

        var embeddingModel = OpenAiEmbeddingModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("text-embedding-3-small")
                .build();

        // No splitter is configured, so the document is embedded as a single segment.
        var ingestor = EmbeddingStoreIngestor.builder()
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();

        ingestor.ingest(Document.from("Spring Boot is a framework for building Java applications with minimal configuration."));

        var retriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .build();

        // Query the store and print the text of each retrieved segment.
        retriever.retrieve(Query.from("What is Spring Boot?"))
                .forEach(content -> System.out.println(content.textSegment().text()));
    }
}


### Multi-Domain Assistant

interface MultiDomainAssistant {

    @SystemMessage("""
        You are an expert assistant with access to multiple knowledge domains:
        - Technical documentation
        - Company policies
        - Product information
        - Customer support guides

        Tailor your response based on the type of question and available context.
        Always indicate which domain the information comes from.
        """)
    // With more than one parameter, each one needs an annotation. @MemoryId also
    // requires a chatMemoryProvider on the AiServices builder.
    String answerQuestion(@MemoryId String userId, @UserMessage String question);
}


### Hierarchical RAG

// The helpers searchSummaries, searchChunksInDocument and generateResponseWithChunks
// are not shown in the source. Two stores of the same type also need distinct
// bean qualifiers so that Spring can inject them unambiguously.
@Service
@RequiredArgsConstructor
public class HierarchicalRAGService {

    private final EmbeddingStore<TextSegment> chunkStore;
    private final EmbeddingStore<TextSegment> summaryStore;
    private final EmbeddingModel embeddingModel;

    public String performHierarchicalRetrieval(String query) {
        // Stage 1: find the documents whose summaries match the query.
        List<EmbeddingMatch<TextSegment>> summaryMatches = searchSummaries(query);
        List<TextSegment> relevantChunks = new ArrayList<>();

        // Stage 2: search chunks only within each matched document.
        for (EmbeddingMatch<TextSegment> summaryMatch : summaryMatches) {
            String documentId = summaryMatch.embedded().metadata().getString("documentId");
            List<EmbeddingMatch<TextSegment>> chunkMatches = searchChunksInDocument(query, documentId);
            chunkMatches.stream()
                    .map(EmbeddingMatch::embedded)
                    .forEach(relevantChunks::add);
        }

        return generateResponseWithChunks(query, relevantChunks);
    }
}
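
The two-stage idea can be shown without any library. The sketch below is a plain-Java illustration, not the skill's implementation: `HierarchicalRetrievalSketch` and `Item` are hypothetical, and the `relevance` function stands in for an embedding similarity score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class HierarchicalRetrievalSketch {

    /** A summary or chunk tagged with the document it belongs to (hypothetical type). */
    record Item(String documentId, String text) {}

    /**
     * Stage 1 ranks summaries and keeps the top documents.
     * Stage 2 ranks chunks, but only within those documents.
     */
    static List<String> retrieve(List<Item> summaries, List<Item> chunks,
                                 ToDoubleFunction<String> relevance,
                                 int topDocuments, int chunksPerDocument) {
        Comparator<Item> byRelevanceDesc =
                Comparator.comparingDouble((Item i) -> relevance.applyAsDouble(i.text())).reversed();

        List<String> documentIds = summaries.stream()
                .sorted(byRelevanceDesc)
                .limit(topDocuments)
                .map(Item::documentId)
                .toList();

        List<String> result = new ArrayList<>();
        for (String id : documentIds) {
            chunks.stream()
                    .filter(chunk -> chunk.documentId().equals(id))
                    .sorted(byRelevanceDesc)
                    .limit(chunksPerDocument)
                    .map(Item::text)
                    .forEach(result::add);
        }
        return result;
    }
}
```

Restricting stage 2 to the selected documents means a chunk that matches the query on wording alone, but belongs to an unrelated document, is never considered.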


## Best Practices

### Document Segmentation

- Use recursive splitting with 500-1000 token chunks for most applications
- Maintain 20-50 token overlap between chunks for context preservation
- Consider document structure (headings, paragraphs) when splitting
- Use token-aware splitters for optimal embedding generation
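
As a rough illustration of chunk size and overlap, the following plain-Java sketch slides a fixed window over the text. It is simpler than recursive splitting, which prefers paragraph and sentence boundaries, and it approximates tokens by whitespace-separated words, which is an assumption; `OverlapChunker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapChunker {

    /**
     * Splits text into windows of at most maxTokens words. Each window after
     * the first starts with the last `overlap` words of the previous window.
     */
    static List<String> split(String text, int maxTokens, int overlap) {
        if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
            throw new IllegalArgumentException("require 0 <= overlap < maxTokens");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        int step = maxTokens - overlap; // how far the window advances each time
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + maxTokens, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
            if (end == words.size()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

For example, `OverlapChunker.split("a b c d e f g", 4, 1)` yields the two chunks `a b c d` and `d e f g`, with `d` shared between them.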

### Metadata Strategy

- Include rich metadata for filtering and attribution:
  - User and tenant identifiers for multi-tenancy
  - Document type and category classification
  - Creation and modification timestamps
  - Version and author information
  - Confidentiality and access level tags
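
A minimal plain-Java sketch of how such metadata can restrict retrieval to what a caller may see. It is an illustration only: `MetadataFilterSketch`, `Segment`, and the `tenantId` and `type` keys are hypothetical. LangChain4j offers its own metadata filter abstraction, whose exact builder calls should be taken from its documentation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class MetadataFilterSketch {

    /** A text segment with string metadata (hypothetical type). */
    record Segment(String text, Map<String, String> metadata) {}

    /** Matches segments whose metadata contains exactly this key/value pair. */
    static Predicate<Segment> hasValue(String key, String value) {
        return segment -> value.equals(segment.metadata().get(key));
    }

    /** Applies the filter before any similarity search is run on the result. */
    static List<Segment> filter(List<Segment> segments, Predicate<Segment> filter) {
        return segments.stream().filter(filter).toList();
    }
}
```

Predicates compose with `and` and `or`, for example `hasValue("tenantId", "acme").and(hasValue("type", "policy"))`. Segments that lack the key are excluded, which is the safer default for multi-tenancy.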

### Query Processing

- Implement query preprocessing and cleaning
- Consider query expansion for better recall
- Apply dynamic filtering based on user context
- Use re-ranking for improved result quality
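
Query cleaning and expansion can be sketched in plain Java as follows. This is one simple approach under stated assumptions, not a prescribed one: `QueryPreprocessor` is a hypothetical name, and the synonym map is supplied by the caller rather than taken from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class QueryPreprocessor {

    /** Replaces control characters with spaces and collapses runs of whitespace. */
    static String clean(String raw) {
        return raw.replaceAll("\\p{Cntrl}", " ").strip().replaceAll("\\s+", " ");
    }

    /**
     * Returns the query followed by one variant per known synonym, each with a
     * single word swapped. Lookup is case-insensitive on the original word.
     */
    static List<String> expand(String query, Map<String, List<String>> synonyms) {
        List<String> variants = new ArrayList<>();
        variants.add(query);
        String[] words = query.split(" ");
        for (int i = 0; i < words.length; i++) {
            String key = words[i].toLowerCase(Locale.ROOT);
            for (String alternative : synonyms.getOrDefault(key, List.of())) {
                String[] copy = words.clone();
                copy[i] = alternative;
                variants.add(String.join(" ", copy));
            }
        }
        return variants;
    }
}
```

Each variant can be embedded and searched separately, with the result lists merged afterwards. Expansion raises recall at the cost of one extra embedding call per variant.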

### Performance Optimization

- Cache embeddings for repeated queries
- Use batch embedding generation for bulk operations
- Implement pagination for large result sets
- Consider asynchronous processing for long operations
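
Caching query embeddings can be sketched as a small LRU wrapper around whatever produces the vector. The example below is plain Java and makes assumptions: `CachingEmbedder` is a hypothetical name, and the `Function<String, float[]>` delegate stands in for a real embedding call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class CachingEmbedder {

    private final Function<String, float[]> delegate;
    private final Map<String, float[]> cache;

    CachingEmbedder(Function<String, float[]> delegate, int maxEntries) {
        this.delegate = delegate;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached vector for the trimmed query, computing it on a miss. */
    synchronized float[] embed(String query) {
        String key = query.strip();
        float[] vector = cache.get(key);
        if (vector == null) {
            vector = delegate.apply(key);
            cache.put(key, vector);
        }
        return vector;
    }
}
```

The cache is bounded so that memory use stays predictable. Holding the lock during the delegate call keeps the sketch simple but serializes concurrent misses, which a production version would likely want to avoid.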

## Common Patterns

### Simple RAG Pipeline

@RequiredArgsConstructor
@Service
public class SimpleRAGPipeline {

    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> embeddingStore;
    private final ChatModel chatModel;

    public String answerQuestion(String question) {
        // 1. Embed the question and fetch the three closest segments.
        Embedding queryEmbedding = embeddingModel.embed(question).content();
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(3)
                .build();

        List<TextSegment> segments = embeddingStore.search(request).matches().stream()
                .map(EmbeddingMatch::embedded)
                .collect(Collectors.toList());

        // 2. Join the segment texts into a single context block.
        String context = segments.stream()
                .map(TextSegment::text)
                .collect(Collectors.joining("\n\n"));

        // 3. Ask the model; ChatModel exposes chat(String) in LangChain4j 1.x.
        return chatModel.chat(context + "\n\nQuestion: " + question + "\nAnswer:");
    }
}


### Hybrid Search (Vector + Keyword)

// FullTextSearchEngine and the helpers performVectorSearch, performKeywordSearch
// and combineResults are application-defined and not shown in the source.
@Service
@RequiredArgsConstructor
public class HybridSearchService {

    private final EmbeddingStore<TextSegment> vectorStore;
    private final FullTextSearchEngine keywordEngine;
    private final EmbeddingModel embeddingModel;

    public List<Content> hybridSearch(String query, int maxResults) {
        // Vector search
        List<Content> vectorResults = performVectorSearch(query, maxResults);

        // Keyword search
        List<Content> keywordResults = performKeywordSearch(query, maxResults);

        // Combine and re-rank using the Reciprocal Rank Fusion (RRF) algorithm
        return combineResults(vectorResults, keywordResults, maxResults);
    }
}
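
The RRF step named in the comment can be sketched in plain Java. This is a generic illustration of Reciprocal Rank Fusion, not the body of `combineResults`: `ReciprocalRankFusion` is a hypothetical name, results are identified by string ids, and the constant `k = 60` used in the usage note is a commonly cited default rather than something the source specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReciprocalRankFusion {

    /**
     * Fuses several ranked lists. An item at 1-based rank r in a list
     * contributes 1 / (k + r); contributions are summed across lists.
     */
    static List<String> fuse(List<List<String>> rankings, int k, int maxResults) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }

        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Highest fused score first; ties are broken by id for a stable order.
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });

        List<String> fused = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            if (fused.size() == maxResults) {
                break;
            }
            fused.add(entry.getKey());
        }
        return fused;
    }
}
```

With vector results `[a, b, c]`, keyword results `[b, d]`, and `k = 60`, the fused top three are `[b, a, d]`: `b` wins because it appears in both lists. RRF uses ranks only, so the two engines' raw scores never need to be put on a common scale.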


## Troubleshooting

### Common Issues

**Poor Retrieval Results**
- Check document chunk size and overlap settings
- Verify embedding model compatibility
- Ensure metadata filters are not too restrictive
- Consider adding re-ranking step

**Slow Performance**
- Use cached embeddings for frequent queries
- Optimize database indexing for vector stores
- Implement pagination for large datasets
- Consider async processing for bulk operations

**High Memory Usage**
- Use disk-based embedding stores for large datasets
- Implement proper pagination and filtering
- Clean up unused embeddings periodically
- Monitor and optimize chunk sizes

## References

- [API Reference](references/references.md) - Complete API documentation and interfaces
- [Examples](references/examples.md) - Production-ready examples and patterns
- [Official LangChain4j Documentation](https://docs.langchain4j.dev/)

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/langchain4j-rag-implementation-patterns/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-rag-implementation-patterns/SKILL.md
  • Cursor: ~/.cursor/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-rag-implementation-patterns/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-rag-implementation-patterns/SKILL.md



Source & attribution

This skill is categorized under AI Tools & Agents and is published by Giuseppe Trisciuoglio, maintained in giuseppe-trisciuoglio/developer-kit.
