langchain4j-testing-strategies

Name: langchain4j-testing-strategies
Author: Giuseppe Trisciuoglio

Tags: LangChain4J, AI testing, LLM testing, unit testing, integration testing, RAG, Testcontainers, Java AI

⭐ 282 stars · 📄 MIT · 🕒 Updated 2026-06-15

Install this skill

npx skills add giuseppe-trisciuoglio/developer-kit

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

LangChain4J testing strategies provide a structured framework for validating AI-integrated Java applications. This approach balances rapid feedback through unit tests with high-fidelity validation using Testcontainers for real-world interactions. Developers can isolate business logic by mocking chat and embedding models, ensuring that service configurations and AI pipelines function reliably without incurring API costs or latency during builds. The strategy emphasizes a testing pyramid, prioritizing fast execution while maintaining oversight of complex flows like RAG systems, tool invocation, and streaming response patterns. By configuring specific test dependencies and maintaining strict isolation between test cases, teams ensure that prompt changes, dependency updates, and internal logic refinements do not break existing LLM-powered workflows. This methodology specifically addresses the unique non-deterministic nature of AI outputs within standard Java enterprise testing environments.

When to Use This Skill

•Verifying that an AiService correctly maps a natural language query to a specific Java tool function
•Ensuring RAG retrieval logic correctly filters and embeds context from local document stores
•Validating exception handling when an LLM provider returns malformed JSON or times out
•Regression testing prompt changes to ensure output structure remains consistent for downstream processing

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

“how to mock chat models in LangChain4J”
“setup Testcontainers for Ollama in Java”
“write unit tests for LangChain4J AiServices”
“best practices for testing RAG pipelines in Java”
“how to verify tool execution in LangChain4J tests”

Pro Tips

💡Combine mock models for rapid unit testing of business logic with Testcontainers for realistic, isolated integration testing of your full AI stack.
💡Prioritize testing guardrails and critical tool execution paths, as these are often where subtle failures can lead to significant issues.
💡Implement snapshot testing for LLM outputs in key flows to detect unexpected changes or regressions after model updates or prompt modifications.
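
The snapshot tip above can be sketched in plain Java. This is a minimal illustration, not part of LangChain4J: `SnapshotCheck`, `normalize`, and `matchesSnapshot` are hypothetical names, and the sketch assumes the output under test is deterministic (for example, produced by a mocked model or a fixed-seed local model).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: compares normalized LLM output against a stored snapshot file. */
final class SnapshotCheck {

    /** Trims and collapses whitespace so cosmetic differences do not fail the check. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    /**
     * Returns true when the output matches the snapshot. If no snapshot exists yet,
     * the current output is recorded and the check passes.
     */
    static boolean matchesSnapshot(Path snapshotFile, String actualOutput) {
        String normalized = normalize(actualOutput);
        try {
            if (Files.notExists(snapshotFile)) {
                Files.write(snapshotFile, normalized.getBytes(StandardCharsets.UTF_8));
                return true;
            }
            String expected = new String(Files.readAllBytes(snapshotFile), StandardCharsets.UTF_8);
            return expected.equals(normalized);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A failing check after a prompt or model change is a signal to review the diff and, if the new output is acceptable, delete the snapshot file so it is recorded again.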

What this skill does

•Isolation of LLM logic via Mockito-based model stubs
•Containerized integration testing for local model execution using Ollama or similar engines
•Validation of Retrieval-Augmented Generation (RAG) pipelines and retrieval accuracy
•Assertion testing for tool-calling and function invocation logic
•Performance measurement for streaming responses and latency thresholds

When not to use it

✕Testing raw model accuracy or training data quality of a commercial foundation model
✕Performance benchmarking of third-party cloud-based LLM APIs that do not support test keys
✕Evaluating non-Java AI workflows that do not integrate with the LangChain4J library

Example workflow

1. Configure project dependencies to include langchain4j-test and Testcontainers
2. Implement unit tests using Mockito to verify basic AiService routing and logic
3. Create containerized integration tests using Testcontainers for local model interactions
4. Define test data scenarios including edge cases for empty inputs or malformed prompts
5. Execute the test suite during CI/CD to monitor for regressions in AI pipeline behavior

Prerequisites

–LangChain4J core library installation
–Docker installed for Testcontainers usage
–JUnit 5 testing framework
–Knowledge of Mockito library

Pitfalls & limitations

!Over-relying on real LLM calls in integration tests leads to slow builds and high API expenses
!Failing to reset state in global stores between tests results in non-deterministic flakiness
!Underestimating the latency of spinning up containers, which can inflate feedback loops if not managed properly

FAQ

Should I use real API keys in my integration tests?

No. Use Testcontainers for local models or mock services to keep tests isolated, deterministic, and free of security risks.

How do I test streaming responses?

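Pass the streaming model a test handler (in LangChain4j 1.x, an implementation of StreamingChatResponseHandler) that records each partial response and signals completion. Then assert that chunks were emitted and that their concatenation matches the final response your service returns.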
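
A collecting handler of this kind can be sketched without the library. The `ChunkHandler` and `StreamingSource` interfaces below are hypothetical stand-ins for the streaming callback and model; a real test would adapt the library's own handler type to the same idea.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

final class StreamingCollectorSketch {

    /** Hypothetical callback shape; stands in for the library's streaming handler. */
    interface ChunkHandler {
        void onChunk(String chunk);
        void onComplete();
        void onError(Throwable error);
    }

    /** Hypothetical streaming source; a real test would wrap the streaming model here. */
    interface StreamingSource {
        void stream(String prompt, ChunkHandler handler);
    }

    /** Records every chunk and completes a future with the joined text. */
    static final class CollectingHandler implements ChunkHandler {
        final List<String> chunks = new CopyOnWriteArrayList<>();
        final CompletableFuture<String> result = new CompletableFuture<>();

        @Override public void onChunk(String chunk) { chunks.add(chunk); }
        @Override public void onComplete() { result.complete(String.join("", chunks)); }
        @Override public void onError(Throwable error) { result.completeExceptionally(error); }
    }

    /** Streams the prompt and waits at most five seconds for the aggregated response. */
    static String collect(StreamingSource source, String prompt) {
        CollectingHandler handler = new CollectingHandler();
        source.stream(prompt, handler);
        return handler.result.orTimeout(5, TimeUnit.SECONDS).join();
    }
}
```

The bounded wait matters: a stream that never completes should fail the test quickly instead of hanging the build.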

Why do my tests fail when I change the prompt?

LLM outputs are sensitive to prompts; verify your assertions target the functional outcome or specific data format rather than exact string matches.
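
One way to assert on format rather than wording is to extract the field your downstream code depends on. The sketch below assumes a hypothetical contract in which the model returns a JSON fragment with a `sentiment` field; the class and method names are illustrative only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StructuralAssertionSketch {

    // Assumed output contract: a "sentiment" field with one of three allowed values.
    private static final Pattern SENTIMENT =
            Pattern.compile("\"sentiment\"\\s*:\\s*\"(POSITIVE|NEGATIVE|NEUTRAL)\"");

    /** Returns the sentiment value if the output honors the contract, otherwise empty. */
    static Optional<String> extractSentiment(String llmOutput) {
        Matcher matcher = SENTIMENT.matcher(llmOutput);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

A test can then assert that `extractSentiment(output)` is present and has the expected value, which keeps passing when the model rephrases the surrounding text.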

How can I speed up tests that use containers?

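Declare the container as a static @Container field so it starts once per test class instead of once per test method. To keep it running across test runs, start it manually with reuse enabled (withReuse(true) on the container plus testcontainers.reuse.enable=true in ~/.testcontainers.properties).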

How it compares

Unlike manual ad-hoc testing, this framework provides a formal, repeatable path that treats AI behavior as testable code, reducing the ambiguity typically associated with generative outputs.

📄 Full skill instructions — original source: giuseppe-trisciuoglio/developer-kit

# LangChain4J Testing Strategies

## When to Use This Skill

Use this skill when:
- Building AI-powered applications with LangChain4J
- Writing unit tests for AI services and guardrails
- Setting up integration tests with real, locally hosted LLMs
- Creating mock-based tests for faster test execution
- Using Testcontainers for isolated testing environments
- Testing RAG (Retrieval-Augmented Generation) systems
- Validating tool execution and function calling
- Testing streaming responses and async operations
- Setting up end-to-end tests for AI workflows
- Implementing performance and load testing

## Instructions

To test LangChain4J applications effectively, follow these key strategies:

### 1. Start with Unit Testing

Use mock models for fast, isolated testing of business logic. See references/unit-testing.md for detailed examples.

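// Example: Mock ChatModel for unit tests
// Assumes LangChain4j 1.x, where AiServices calls ChatModel.chat(ChatRequest)
ChatModel mockModel = mock(ChatModel.class);
when(mockModel.chat(any(ChatRequest.class)))
    .thenReturn(ChatResponse.builder()
        .aiMessage(AiMessage.from("Mocked response"))
        .build());

// AiService here is your own interface, e.g. one declaring String chat(String message)
var service = AiServices.builder(AiService.class)
        .chatModel(mockModel)
        .build();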

### 2. Configure Testing Dependencies

Set up the Maven/Gradle test dependencies. See references/testing-dependencies.md for the complete configuration.

**Key dependencies**:
- langchain4j-test - Testing utilities and guardrail assertions
- testcontainers - Integration testing with containerized services
- mockito - Mock external dependencies
- assertj - Better assertions

### 3. Implement Integration Tests

Test with real services using Testcontainers. See references/integration-testing.md for container setup examples.

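@Testcontainers
class OllamaIntegrationTest {
    // Model name is an assumption; use any small model suitable for CI
    static final String MODEL = "tinyllama";

    // OllamaContainer (org.testcontainers:ollama) exposes port 11434 and provides getEndpoint()
    @Container
    static OllamaContainer ollama = new OllamaContainer("ollama/ollama:latest");

    @BeforeAll
    static void pullModel() throws Exception {
        // The base image ships without models, so pull one before the tests run
        ollama.execInContainer("ollama", "pull", MODEL);
    }

    @Test
    void shouldGenerateResponse() {
        ChatModel model = OllamaChatModel.builder()
                .baseUrl(ollama.getEndpoint())
                .modelName(MODEL)
                .build();
        String response = model.chat("Test query");
        assertNotNull(response);
    }
}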

### 4. Test Advanced Features

For streaming responses, memory management, and complex workflows, refer to references/advanced-testing.md.

### 5. Apply Testing Workflows

Follow testing pyramid patterns and best practices from references/workflow-patterns.md.

- **70% Unit Tests**: Fast, isolated business logic testing
- **20% Integration Tests**: Real service interactions
- **10% End-to-End Tests**: Complete user workflows

## Examples

### Basic Unit Test

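import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.service.AiServices;
import org.junit.jupiter.api.Test;

@Test
void shouldProcessQueryWithMock() {
    // AiServices sends a ChatRequest to the model (LangChain4j 1.x), so stub chat(ChatRequest)
    ChatModel mockModel = mock(ChatModel.class);
    when(mockModel.chat(any(ChatRequest.class)))
        .thenReturn(ChatResponse.builder().aiMessage(AiMessage.from("Test response")).build());

    var service = AiServices.builder(AiService.class)
            .chatModel(mockModel)
            .build();

    String result = service.chat("What is Java?");
    assertEquals("Test response", result);
}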

### Integration Test with Testcontainers

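@Testcontainers
class RAGIntegrationTest {
    // Model names are assumptions; both must be pulled into the container first
    static final String CHAT_MODEL = "tinyllama";
    static final String EMBEDDING_MODEL = "all-minilm";

    @Container
    static OllamaContainer ollama = new OllamaContainer("ollama/ollama:latest");

    @BeforeAll
    static void pullModels() throws Exception {
        ollama.execInContainer("ollama", "pull", CHAT_MODEL);
        ollama.execInContainer("ollama", "pull", EMBEDDING_MODEL);
    }

    @Test
    void shouldCompleteRAGWorkflow() {
        // Setup models and stores
        var chatModel = OllamaChatModel.builder()
                .baseUrl(ollama.getEndpoint())
                .modelName(CHAT_MODEL)
                .build();

        var embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl(ollama.getEndpoint())
                .modelName(EMBEDDING_MODEL)
                .build();

        var store = new InMemoryEmbeddingStore<TextSegment>();

        // Ingest one document so the retriever has context to return
        var segment = TextSegment.from("Spring Boot is a Java framework for building stand-alone applications.");
        store.add(embeddingModel.embed(segment).content(), segment);

        // The retriever needs only the store and the embedding model
        var retriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .build();

        // Test complete workflow
        var assistant = AiServices.builder(RagAssistant.class)
                .chatModel(chatModel)
                .contentRetriever(retriever)
                .build();

        String response = assistant.chat("What is Spring Boot?");
        assertNotNull(response);
        assertTrue(response.contains("Spring"));
    }
}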

## Best Practices

### Test Isolation
- Each test must be independent
- Use @BeforeEach and @AfterEach for setup/teardown
- Avoid sharing state between tests

### Mock External Dependencies
- Never call real APIs in unit tests
- Use mocks for ChatModel, EmbeddingModel, and external services
- Test error handling scenarios
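
An error-handling scenario can be exercised with a hand-written fake instead of a mocking library. The sketch below is illustrative only: `TextModel`, `SupportService`, and the fallback message are hypothetical names standing in for your model abstraction and service.

```java
final class FallbackSketch {

    static final String FALLBACK = "Sorry, the assistant is unavailable right now.";

    /** Hypothetical stand-in for the model dependency that the service calls. */
    interface TextModel {
        String chat(String prompt);
    }

    /** Hypothetical service under test: returns a fallback when the model call fails. */
    static final class SupportService {
        private final TextModel model;

        SupportService(TextModel model) {
            this.model = model;
        }

        String answer(String question) {
            try {
                return model.chat(question);
            } catch (RuntimeException e) {
                // A timeout or malformed provider response surfaces here in this sketch
                return FALLBACK;
            }
        }
    }
}
```

In a test, a lambda that throws plays the failing provider, and the assertion checks that `answer` returns the fallback instead of propagating the exception.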

### Performance Considerations
- Unit tests should run in < 50ms
- Integration tests should use container reuse
- Include timeout assertions for slow operations
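
In JUnit 5, `@Timeout` or `assertTimeoutPreemptively` usually covers the timeout case. The same idea can be sketched with the standard library alone; `TimeoutSketch`, `callWithTimeout`, and `slow` are hypothetical helper names.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class TimeoutSketch {

    /** Runs the call asynchronously and fails if no result arrives within the limit. */
    static <T> T callWithTimeout(Supplier<T> call, Duration limit) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(call);
        try {
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; it does not interrupt the running task
            future.cancel(true);
            throw new IllegalStateException("Call exceeded " + limit.toMillis() + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Call failed", e.getCause());
        }
    }

    /** Test double that simulates a slow model call. */
    static Supplier<String> slow(long millis, String value) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }
}
```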

### Quality Assertions
- Test both success and error scenarios
- Validate response coherence and relevance
- Include edge case testing (empty inputs, large payloads)
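
Edge cases are easier to test when input validation lives in one small function in front of the model call. The following is a sketch under assumptions: `PromptGuardSketch` is a hypothetical helper and the 4,000-character limit is an arbitrary example value.

```java
final class PromptGuardSketch {

    // Assumed limit for illustration; pick a value that matches your model's context size
    static final int MAX_PROMPT_CHARS = 4_000;

    /** Rejects empty and oversized prompts before any model call is made. */
    static String requireUsablePrompt(String prompt) {
        if (prompt == null || prompt.trim().isEmpty()) {
            throw new IllegalArgumentException("Prompt must not be empty");
        }
        if (prompt.length() > MAX_PROMPT_CHARS) {
            throw new IllegalArgumentException("Prompt exceeds " + MAX_PROMPT_CHARS + " characters");
        }
        return prompt.trim();
    }
}
```

Unit tests for the empty and oversized cases then need no model at all, which keeps them well inside the unit-test time budget.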

## Reference Documentation

For comprehensive testing guides and API references, see the included reference documents:

- **[Testing Dependencies](references/testing-dependencies.md)** - Maven/Gradle configuration and setup
- **[Unit Testing](references/unit-testing.md)** - Mock models, guardrails, and individual components
- **[Integration Testing](references/integration-testing.md)** - Testcontainers and real service testing
- **[Advanced Testing](references/advanced-testing.md)** - Streaming, memory, and error handling
- **[Workflow Patterns](references/workflow-patterns.md)** - Test pyramid and best practices

## Common Patterns

### Mock Strategy

// For fast unit tests that call the model directly
// (LangChain4j 1.x: chat(String) returns String; AiServices use chat(ChatRequest) instead)
ChatModel mockModel = mock(ChatModel.class);
when(mockModel.chat(anyString())).thenReturn("Mocked");

// For specific responses
when(mockModel.chat(eq("Hello"))).thenReturn("Hi");
when(mockModel.chat(contains("Java"))).thenReturn("Java response");

### Test Configuration

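// Override connection properties for the test context
// (property key assumes the LangChain4j Ollama Spring Boot starter)
@SpringBootTest
@TestPropertySource(properties = {
    "langchain4j.ollama.chat-model.base-url=http://localhost:11434"
})
class TestConfig {
    // Test with isolated configuration
}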

### Assertion Helpers

// AssertJ assertions for AI responses (response is a String, expectedKeywords a List<String>)
assertThat(response).isNotNull().isNotEmpty();
assertThat(response).contains(expectedKeywords);
assertThat(response).doesNotContain("error");

## Performance Requirements

- **Unit Tests**: < 50ms per test
- **Integration Tests**: Use container reuse for faster startup
- **Timeout Tests**: Include @Timeout for external service calls
- **Memory Management**: Test conversation window limits and cleanup
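
Window limits and cleanup can be tested against any bounded memory. The class below is a hypothetical stand-in that mimics a message-window memory (LangChain4J ships its own, such as MessageWindowChatMemory); it only illustrates the two behaviors a test should pin down: eviction of the oldest messages and a clean reset.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sliding-window memory used to illustrate window-limit tests. */
final class WindowMemorySketch {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowMemorySketch(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones beyond the window size. */
    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    /** Returns a copy of the retained messages, oldest first. */
    List<String> messages() {
        return new ArrayList<>(messages);
    }

    /** Drops all retained messages, e.g. between tests. */
    void clear() {
        messages.clear();
    }
}
```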

## Security Considerations

- Never use real API keys in tests
- Mock external API calls completely
- Test prompt injection detection
- Validate output sanitization
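
The last two points can be illustrated with deliberately naive checks. This is a sketch, not a robust defense: the phrase list, the `sk-` key pattern, and all names in `SecurityChecksSketch` are assumptions chosen for the example, and real guardrails need broader coverage.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class SecurityChecksSketch {

    // Illustrative phrases only; real injection attempts vary widely
    private static final List<String> SUSPICIOUS_PHRASES = Arrays.asList(
            "ignore previous instructions",
            "ignore all previous instructions",
            "reveal your system prompt");

    // Assumed secret format for the example: "sk-" followed by 16+ alphanumerics
    private static final Pattern SECRET = Pattern.compile("sk-[A-Za-z0-9]{16,}");

    /** Flags input that contains one of the known suspicious phrases. */
    static boolean looksLikeInjection(String userInput) {
        String lowered = userInput.toLowerCase(Locale.ROOT);
        return SUSPICIOUS_PHRASES.stream().anyMatch(lowered::contains);
    }

    /** Replaces anything matching the assumed secret format before output is shown. */
    static String redactSecrets(String modelOutput) {
        return SECRET.matcher(modelOutput).replaceAll("[REDACTED]");
    }
}
```

Tests for such checks should include both positive and negative cases, so that tightening a rule does not start rejecting ordinary questions.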

## Testing Pyramid Implementation

70% Unit Tests
  ├─ Business logic validation
  ├─ Guardrail testing
  ├─ Mock tool execution
  └─ Edge case handling

20% Integration Tests
  ├─ Testcontainers with Ollama
  ├─ Vector store testing
  ├─ RAG workflow validation
  └─ Performance benchmarking

10% End-to-End Tests
  ├─ Complete user journeys
  ├─ Real model interactions
  └─ Performance under load

## Related Skills

- spring-boot-test-patterns
- unit-test-service-layer
- unit-test-boundary-conditions


By Giuseppe Trisciuoglio

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

1. Download the skill file.
2. In your project, create the directory: .agent/skills/langchain4j-testing-strategies/
3. Save the file as SKILL.md
4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

Claude Code: ~/.claude/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-testing-strategies/SKILL.md
Cursor: ~/.cursor/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-testing-strategies/SKILL.md
Antigravity: ~/.gemini/antigravity/skills/giuseppe-trisciuoglio/developer-kit/langchain4j-testing-strategies/SKILL.md

