langchain4j-testing-strategies

LangChain4J, AI testing, LLM testing, unit testing, integration testing, RAG, Testcontainers, Java AI
282 stars, MIT license, updated 2026-06-15

Install this skill

npx skills add giuseppe-trisciuoglio/developer-kit

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

LangChain4J testing strategies provide a structured framework for validating AI-integrated Java applications. This approach balances rapid feedback through unit tests with high-fidelity validation using Testcontainers for real-world interactions. Developers can isolate business logic by mocking chat and embedding models, ensuring that service configurations and AI pipelines function reliably without incurring API costs or latency during builds. The strategy emphasizes a testing pyramid, prioritizing fast execution while maintaining oversight of complex flows like RAG systems, tool invocation, and streaming response patterns. By configuring specific test dependencies and maintaining strict isolation between test cases, teams ensure that prompt changes, dependency updates, and internal logic refinements do not break existing LLM-powered workflows. This methodology specifically addresses the unique non-deterministic nature of AI outputs within standard Java enterprise testing environments.

When to Use This Skill

  • Verifying that an AiService correctly maps a natural language query to a specific Java tool function
  • Ensuring RAG retrieval logic correctly filters and embeds context from local document stores
  • Validating exception handling when an LLM provider returns malformed JSON or times out
  • Regression testing prompt changes to ensure output structure remains consistent for downstream processing

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • “how to mock chat models in LangChain4J”
  • “setup Testcontainers for Ollama in Java”
  • “write unit tests for LangChain4J AiServices”
  • “best practices for testing RAG pipelines in Java”
  • “how to verify tool execution in LangChain4J tests”

Pro Tips

  • Combine mock models for rapid unit testing of business logic with Testcontainers for realistic, isolated integration testing of your full AI stack.
  • Prioritize testing guardrails and critical tool execution paths, as these are often where subtle failures can lead to significant issues.
  • Implement snapshot testing for LLM outputs in key flows to detect unexpected changes or regressions after model updates or prompt modifications.
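
The snapshot tip can be sketched in plain Java. This is only an illustration: `SnapshotNormalizer` is a hypothetical helper (not part of LangChain4J), and it assumes that whitespace and letter case are the cosmetic differences you want a snapshot comparison to ignore.

```java
import java.util.Locale;

// Hypothetical helper, not part of LangChain4J: normalizes LLM output before snapshot comparison.
final class SnapshotNormalizer {

    // Trim, collapse whitespace runs, and lower-case so cosmetic changes do not break the snapshot.
    static String normalize(String text) {
        return text.strip().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // True when the current output matches the stored snapshot after normalization.
    static boolean matchesSnapshot(String snapshot, String current) {
        return normalize(snapshot).equals(normalize(current));
    }
}
```

In a real suite the snapshot string would typically be read from a file under version control, so a changed snapshot shows up in code review.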

What this skill does

  • Isolation of LLM logic via Mockito-based model stubs
  • Containerized integration testing for local model execution using Ollama or similar engines
  • Validation of Retrieval-Augmented Generation (RAG) pipelines and retrieval accuracy
  • Assertion testing for tool-calling and function invocation logic
  • Performance measurement for streaming responses and latency thresholds

When not to use it

  • Testing raw model accuracy or training data quality of a commercial foundation model
  • Performance benchmarking of third-party cloud-based LLM APIs that do not support test keys
  • Evaluating non-Java AI workflows that do not integrate with the LangChain4J library

Example workflow

  1. Configure project dependencies to include langchain4j-test and Testcontainers
  2. Implement unit tests using Mockito to verify basic AiService routing and logic
  3. Create containerized integration tests using Testcontainers for local model interactions
  4. Define test data scenarios including edge cases for empty inputs or malformed prompts
  5. Execute the test suite during CI/CD to monitor for regressions in AI pipeline behavior

Prerequisites

  • LangChain4J core library installation
  • Docker installed for Testcontainers usage
  • JUnit 5 testing framework
  • Knowledge of Mockito library

Pitfalls & limitations

  • Over-relying on real LLM calls in integration tests leads to slow builds and high API expenses
  • Failing to reset state in global stores between tests results in non-deterministic flakiness
  • Underestimating the latency of spinning up containers, which can inflate feedback loops if not managed properly

FAQ

Should I use real API keys in my integration tests?
No. Use Testcontainers for local models or mock services to keep tests isolated, deterministic, and free of security risks.
How do I test streaming responses?
You should use specialized test handlers provided by the library to verify that chunks are emitted and aggregated correctly by your service.
Why do my tests fail when I change the prompt?
LLM outputs are sensitive to prompts; verify your assertions target the functional outcome or specific data format rather than exact string matches.
How can I speed up tests that use containers?
Use container reuse features in Testcontainers to prevent restarting the service for every individual test method.
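
The point about prompt-sensitive assertions can be illustrated with a format check instead of an exact string match. This is a minimal sketch under stated assumptions: the `ResponseChecks` class and the `ORD-12345` order ID format are hypothetical and only stand in for whatever structured value your downstream code depends on.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: asserts on the data a response carries, not on its exact wording.
final class ResponseChecks {

    // Assumed format for illustration only: "ORD-" followed by exactly five digits.
    private static final Pattern ORDER_ID = Pattern.compile("\\bORD-\\d{5}\\b");

    // Returns the first order ID found in the response, or empty when none is present.
    static Optional<String> extractOrderId(String response) {
        Matcher matcher = ORDER_ID.matcher(response);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }
}
```

A test written this way keeps passing when a prompt change turns "Your order ORD-12345 has shipped." into "Sure! The ID is ORD-12345, anything else?", and still fails when the ID disappears.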

How it compares

Unlike manual ad-hoc testing, this framework provides a formal, repeatable path that treats AI behavior as testable code, reducing the ambiguity typically associated with generative outputs.

Source & trust

Full skill instructions (original source: giuseppe-trisciuoglio/developer-kit)

# LangChain4J Testing Strategies

## When to Use This Skill

Use this skill when:
- Building AI-powered applications with LangChain4J
- Writing unit tests for AI services and guardrails
- Setting up integration tests with real LLM models
- Creating mock-based tests for faster test execution
- Using Testcontainers for isolated testing environments
- Testing RAG (Retrieval-Augmented Generation) systems
- Validating tool execution and function calling
- Testing streaming responses and async operations
- Setting up end-to-end tests for AI workflows
- Implementing performance and load testing

## Instructions

To test LangChain4J applications effectively, follow these key strategies:

### 1. Start with Unit Testing

Use mock models for fast, isolated testing of business logic. See references/unit-testing.md for detailed examples.

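```java
// Example: mock ChatModel for unit tests (assumes LangChain4J 1.x and Mockito static imports)
// Assistant is your own AI service interface, e.g. interface Assistant { String chat(String message); }
ChatModel mockModel = mock(ChatModel.class);
// AiServices calls chat(ChatRequest), so stub that overload
when(mockModel.chat(any(ChatRequest.class)))
    .thenReturn(ChatResponse.builder()
        .aiMessage(AiMessage.from("Mocked response"))
        .build());

Assistant service = AiServices.builder(Assistant.class)
    .chatModel(mockModel)
    .build();
```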


### 2. Configure Testing Dependencies

Set up the Maven/Gradle dependencies needed for testing. See references/testing-dependencies.md for the complete configuration.

**Key dependencies**:
- langchain4j-test - Testing utilities and guardrail assertions
- testcontainers - Integration testing with containerized services
- mockito - Mock external dependencies
- assertj - Better assertions

### 3. Implement Integration Tests

Test with real services using Testcontainers. See references/integration-testing.md for container setup examples.

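```java
@Testcontainers
class OllamaIntegrationTest {

    // Assumption: any small model works; "tinyllama" is only an example name
    static final String MODEL_NAME = "tinyllama";

    // OllamaContainer (Testcontainers Ollama module) exposes port 11434 and provides getEndpoint()
    @Container
    static OllamaContainer ollama = new OllamaContainer(
        DockerImageName.parse("ollama/ollama:latest"));

    @BeforeAll
    static void pullModel() throws Exception {
        // The base image ships without models, so pull one before the tests run
        ollama.execInContainer("ollama", "pull", MODEL_NAME);
    }

    @Test
    void shouldGenerateResponse() {
        ChatModel model = OllamaChatModel.builder()
            .baseUrl(ollama.getEndpoint())
            .modelName(MODEL_NAME)
            .build();
        String response = model.chat("Test query");
        assertNotNull(response);
    }
}
```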


### 4. Test Advanced Features

For streaming responses, memory management, and complex workflows, refer to references/advanced-testing.md.
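
One way to test a streaming flow is to collect the emitted chunks and await the aggregated text. The sketch below is plain Java under stated assumptions: `ChunkHandler` is a hypothetical stand-in that only mirrors the general shape of a streaming callback, and `CollectingHandler` is an illustrative name. Adapt it to the handler interface of the LangChain4J version you use.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a streaming callback; the library's own handler interface differs in detail.
interface ChunkHandler {
    void onChunk(String chunk);
    void onComplete();
    void onError(Throwable error);
}

// Collects chunks and exposes the aggregated text through a future that tests can await.
final class CollectingHandler implements ChunkHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final CompletableFuture<String> result = new CompletableFuture<>();

    @Override
    public synchronized void onChunk(String chunk) {
        buffer.append(chunk);
    }

    @Override
    public synchronized void onComplete() {
        result.complete(buffer.toString());
    }

    @Override
    public void onError(Throwable error) {
        result.completeExceptionally(error);
    }

    // Blocks until the stream finishes or the timeout elapses; failures surface as an unchecked exception.
    String await(long timeoutMillis) {
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("stream did not complete", e);
        }
    }
}
```

A test then feeds the handler to the streaming call, waits on `await(...)`, and asserts on the aggregated text, which avoids sleeping for an arbitrary time.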

### 5. Apply Testing Workflows

Follow testing pyramid patterns and best practices from references/workflow-patterns.md.

- **70% Unit Tests**: Fast, isolated business logic testing
- **20% Integration Tests**: Real service interactions
- **10% End-to-End Tests**: Complete user workflows

## Examples

### Basic Unit Test

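```java
@Test
void shouldProcessQueryWithMock() {
    ChatModel mockModel = mock(ChatModel.class);
    // AiServices calls chat(ChatRequest), so stub that overload (assumes LangChain4J 1.x)
    when(mockModel.chat(any(ChatRequest.class)))
        .thenReturn(ChatResponse.builder()
            .aiMessage(AiMessage.from("Test response"))
            .build());

    // Assistant is your own interface: interface Assistant { String chat(String message); }
    Assistant service = AiServices.builder(Assistant.class)
        .chatModel(mockModel)
        .build();

    String result = service.chat("What is Java?");
    assertEquals("Test response", result);
}
```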


### Integration Test with Testcontainers

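```java
@Testcontainers
class RAGIntegrationTest {

    // Assumption: model names are examples only; both must be pulled into the container first
    static final String CHAT_MODEL = "tinyllama";
    static final String EMBEDDING_MODEL = "all-minilm";

    @Container
    static OllamaContainer ollama = new OllamaContainer(
        DockerImageName.parse("ollama/ollama:latest"));

    @BeforeAll
    static void pullModels() throws Exception {
        ollama.execInContainer("ollama", "pull", CHAT_MODEL);
        ollama.execInContainer("ollama", "pull", EMBEDDING_MODEL);
    }

    @Test
    void shouldCompleteRAGWorkflow() {
        // Setup models and stores
        var chatModel = OllamaChatModel.builder()
            .baseUrl(ollama.getEndpoint())
            .modelName(CHAT_MODEL)
            .build();

        var embeddingModel = OllamaEmbeddingModel.builder()
            .baseUrl(ollama.getEndpoint())
            .modelName(EMBEDDING_MODEL)
            .build();

        var store = new InMemoryEmbeddingStore<TextSegment>();
        // Seed the store so the retriever has something to return
        var segment = TextSegment.from("Spring Boot is a framework for building Java applications.");
        store.add(embeddingModel.embed(segment).content(), segment);

        var retriever = EmbeddingStoreContentRetriever.builder()
            .embeddingStore(store)
            .embeddingModel(embeddingModel)
            .build();

        // Test complete workflow
        var assistant = AiServices.builder(RagAssistant.class)
            .chatModel(chatModel)
            .contentRetriever(retriever)
            .build();

        String response = assistant.chat("What is Spring Boot?");
        assertNotNull(response);
        assertTrue(response.contains("Spring"));
    }
}
```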


## Best Practices

### Test Isolation
- Each test must be independent
- Use @BeforeEach and @AfterEach for setup/teardown
- Avoid sharing state between tests

### Mock External Dependencies
- Never call real APIs in unit tests
- Use mocks for ChatModel, EmbeddingModel, and external services
- Test error handling scenarios
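
Error-handling scenarios can also be exercised with a hand-written fake instead of a mocking library. The following is a sketch, not the skill's own code: `FallbackChatService` is a hypothetical wrapper, and the model is represented as a plain `Function<String, String>` so the example stays self-contained.

```java
import java.util.function.Function;

// Hypothetical service wrapper: the model is any String -> String function, so tests can pass a fake.
final class FallbackChatService {
    static final String FALLBACK = "Sorry, the assistant is unavailable right now.";

    private final Function<String, String> model;

    FallbackChatService(Function<String, String> model) {
        this.model = model;
    }

    // Returns the model's answer, or a fixed fallback when the model throws or returns blank text.
    String ask(String question) {
        try {
            String answer = model.apply(question);
            return (answer == null || answer.isBlank()) ? FALLBACK : answer;
        } catch (RuntimeException e) {
            return FALLBACK;
        }
    }
}
```

Passing a lambda that throws simulates a provider timeout, and a lambda that returns blank text simulates an empty completion, without any network call.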

### Performance Considerations
- Unit tests should run in < 50ms
- Integration tests should use container reuse
- Include timeout assertions for slow operations
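
In JUnit 5, `@Timeout` or `assertTimeoutPreemptively` usually covers the timeout assertions mentioned above. The underlying idea can be sketched with the standard library alone. `TimeoutCheck` is a hypothetical helper, and the `Supplier<String>` stands in for a model or service call.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: checks that a slow operation finishes within a time budget.
final class TimeoutCheck {

    // Runs the call on another thread and reports whether it finished within the limit.
    static boolean completesWithin(Duration limit, Supplier<String> call) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(call);
        try {
            future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Mark the future as cancelled; the underlying task may still run to completion.
            future.cancel(true);
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("call failed before the timeout", e);
        }
    }
}
```

The limit should be generous enough to absorb CI noise, since a too-tight budget turns a performance check into a flaky test.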

### Quality Assertions
- Test both success and error scenarios
- Validate response coherence and relevance
- Include edge case testing (empty inputs, large payloads)

## Reference Documentation

For comprehensive testing guides and API references, see the included reference documents:

- **[Testing Dependencies](references/testing-dependencies.md)** - Maven/Gradle configuration and setup
- **[Unit Testing](references/unit-testing.md)** - Mock models, guardrails, and individual components
- **[Integration Testing](references/integration-testing.md)** - Testcontainers and real service testing
- **[Advanced Testing](references/advanced-testing.md)** - Streaming, memory, and error handling
- **[Workflow Patterns](references/workflow-patterns.md)** - Test pyramid and best practices

## Common Patterns

### Mock Strategy
```java
// For fast unit tests of code that calls chat(String) directly (assumes LangChain4J 1.x)
ChatModel mockModel = mock(ChatModel.class);
when(mockModel.chat(anyString())).thenReturn("Mocked");

// For specific responses
when(mockModel.chat(eq("Hello"))).thenReturn("Hi");
when(mockModel.chat(contains("Java"))).thenReturn("Java response");
```


### Test Configuration
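```java
// Use test-specific properties (Spring Boot test; property name assumes the LangChain4J Ollama starter)
@SpringBootTest
@TestPropertySource(properties = {
    "langchain4j.ollama.chat-model.base-url=http://localhost:11434"
})
class TestConfig {
    // Test with isolated configuration
}
```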


### Assertion Helpers
```java
// Custom assertions for AI responses (AssertJ); expectedKeywords is a List<String>
assertThat(response).isNotNull().isNotEmpty();
assertThat(response).contains(expectedKeywords);
assertThat(response).doesNotContain("error");
```


## Performance Requirements

- **Unit Tests**: < 50ms per test
- **Integration Tests**: Use container reuse for faster startup
- **Timeout Tests**: Include @Timeout for external service calls
- **Memory Management**: Test conversation window limits and cleanup

## Security Considerations

- Never use real API keys in tests
- Mock external API calls completely
- Test prompt injection detection
- Validate output sanitization

## Testing Pyramid Implementation

```
70% Unit Tests
├─ Business logic validation
├─ Guardrail testing
├─ Mock tool execution
└─ Edge case handling

20% Integration Tests
├─ Testcontainers with Ollama
├─ Vector store testing
├─ RAG workflow validation
└─ Performance benchmarking

10% End-to-End Tests
├─ Complete user journeys
├─ Real model interactions
└─ Performance under load
```


## Related Skills

- spring-boot-test-patterns
- unit-test-service-layer
- unit-test-boundary-conditions


How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/langchain4j-testing-strategies/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-testing-strategies/SKILL.md
  • Cursor: ~/.cursor/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-testing-strategies/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-testing-strategies/SKILL.md


How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section. This specific unit helps the agent avoid testing & quality assurance issues, leading to cleaner, more efficient code.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under Testing & Quality Assurance and is published by Giuseppe Trisciuoglio, maintained in giuseppe-trisciuoglio/developer-kit.
