Back to AI Tools & Agents

hybrid-search-implementation

hybrid searchvector searchkeyword searchRAGinformation retrievalsearch engineAIsemantic search
⭐ 36.8kπŸ“„ MITπŸ•’ 2026-06-16Source β†—

Install this skill

npx skills add wshobson/agents

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

Hybrid search blends vector-based semantic retrieval with keyword-centric full-text matching to address the inherent weaknesses of each individual approach. While vector models effectively capture conceptual meaning, they often struggle with specific identifiers like serial numbers, product codes, or niche jargon. Conversely, keyword search misses context but excels at exact string matches. This implementation provides programmatic strategies to bridge these methods, specifically using Reciprocal Rank Fusion (RRF) and weighted linear interpolation. By normalizing disparate score distributions and merging ranked results, this skill improves retrieval precision in systems where users alternate between abstract intent and precise entity queries. It provides ready-to-run logic for standardizing retrieval pipelines, ensuring that query results remain relevant across both broad topical questions and granular technical lookups.

When to Use This Skill

  • β€’Technical documentation sites requiring both conceptual tutorials and specific error code lookup
  • β€’E-commerce product search that must handle brand names alongside descriptive user queries
  • β€’Legal or medical databases requiring high-precision keyword matching on legislation and statute identifiers
  • β€’Knowledge base systems where queries contain a mix of natural language questions and exact terminology

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • β€œImplement hybrid search for my RAG system
  • β€œShow me how to combine vector and keyword search results
  • β€œCreate a Reciprocal Rank Fusion function in Python
  • β€œConfigure Postgres for hybrid search with vector and FTS
  • β€œBalance BM25 and vector similarity scores

Pro Tips

  • πŸ’‘Experiment with different fusion methods (RRF, Linear, Cross-encoder) and evaluate their impact on your specific dataset's recall and precision.
  • πŸ’‘Consider pre-filtering results from one search type before applying the second, especially for very large indexes, to improve efficiency.
  • πŸ’‘Regularly update your keyword index and vector embeddings to maintain relevance with evolving content and user queries.

What this skill does

  • β€’Execute Reciprocal Rank Fusion (RRF) to combine multiple ranked result lists
  • β€’Perform linear score normalization to bridge vector and BM25 search outputs
  • β€’Implement PostgreSQL-native hybrid search with pgvector and GIN indexes
  • β€’Calculate weighted fusion scores based on tunable alpha parameters
  • β€’Support complex filtering combined with semantic and lexical search

When not to use it

  • βœ•Applications with small datasets where a simple keyword or vector search is sufficient
  • βœ•Systems where latency budgets are too tight to allow for dual search execution and post-processing

Example workflow

  1. Generate embeddings for the query using an encoder model
  2. Perform parallel search operations: vector search for semantics and FTS for exact keywords
  3. Normalize the resulting score sets to a standard range between zero and one
  4. Apply RRF or linear weighting to unify the two sets of rankings
  5. Sort the final candidate list by the fused score
  6. Return the top-N results to the user interface

Prerequisites

  • –A vector database or search engine like PostgreSQL with pgvector
  • –An existing text embedding model
  • –A full-text search capability (e.g., BM25 or Postgres GIN indexes)

Pitfalls & limitations

  • !Complexity of tuning hyperparameters like the RRF constant 'k' or linear 'alpha' weights
  • !Computational overhead of running two search operations for every user query
  • !Difficulty in normalizing scores when different search engines report vastly different distributions

FAQ

Why is score normalization necessary?
Vector cosine similarity and keyword BM25 scores operate on different scales. Normalization maps both sets to a common range, preventing one search method from disproportionately dominating the final result.
Which fusion method is better: RRF or linear combination?
RRF is generally preferred when you lack ground truth data or want a robust, parameter-free approach. Linear combination is better when you have enough data to empirically tune the influence of each search type.
Does this require two separate databases?
No, you can perform both operations within a single PostgreSQL database by utilizing pgvector for embeddings and full-text search columns for keyword matching.

How it compares

Unlike manual filtering or a basic LLM prompt, this skill automates the mathematical merging of search ranks to remove the noise of semantic drift and lexical misses simultaneously.

Source & trust

⭐ 37k starsπŸ“„ MITπŸ•’ Updated 2026-06-16
πŸ“„ Full skill instructions β€” original source: wshobson/agents
# Hybrid Search Implementation

Patterns for combining vector similarity and keyword-based search.

## When to Use This Skill

- Building RAG systems with improved recall
- Combining semantic understanding with exact matching
- Handling queries with specific terms (names, codes)
- Improving search for domain-specific vocabulary
- When pure vector search misses keyword matches

## Core Concepts

### 1. Hybrid Search Architecture

Query β†’ ┬─► Vector Search ──► Candidates ─┐
β”‚ β”‚
└─► Keyword Search ─► Candidates ─┴─► Fusion ─► Results


### 2. Fusion Methods

| Method | Description | Best For |
| ----------------- | ------------------------ | --------------- |
| **RRF** | Reciprocal Rank Fusion | General purpose |
| **Linear** | Weighted sum of scores | Tunable balance |
| **Cross-encoder** | Rerank with neural model | Highest quality |
| **Cascade** | Filter then rerank | Efficiency |

## Templates

### Template 1: Reciprocal Rank Fusion

from typing import List, Dict, Tuple
from collections import defaultdict

def reciprocal_rank_fusion(
result_lists: List[List[Tuple[str, float]]],
k: int = 60,
weights: List[float] = None
) -> List[Tuple[str, float]]:
"""
Combine multiple ranked lists using RRF.

Args:
result_lists: List of (doc_id, score) tuples per search method
k: RRF constant (higher = more weight to lower ranks)
weights: Optional weights per result list

Returns:
Fused ranking as (doc_id, score) tuples
"""
if weights is None:
weights = [1.0] * len(result_lists)

scores = defaultdict(float)

for result_list, weight in zip(result_lists, weights):
for rank, (doc_id, _) in enumerate(result_list):
# RRF formula: 1 / (k + rank)
scores[doc_id] += weight * (1.0 / (k + rank + 1))

# Sort by fused score
return sorted(scores.items(), key=lambda x: x[1], reverse=True)


def linear_combination(
vector_results: List[Tuple[str, float]],
keyword_results: List[Tuple[str, float]],
alpha: float = 0.5
) -> List[Tuple[str, float]]:
"""
Combine results with linear interpolation.

Args:
vector_results: (doc_id, similarity_score) from vector search
keyword_results: (doc_id, bm25_score) from keyword search
alpha: Weight for vector search (1-alpha for keyword)
"""
# Normalize scores to [0, 1]
def normalize(results):
if not results:
return {}
scores = [s for _, s in results]
min_s, max_s = min(scores), max(scores)
range_s = max_s - min_s if max_s != min_s else 1
return {doc_id: (score - min_s) / range_s for doc_id, score in results}

vector_scores = normalize(vector_results)
keyword_scores = normalize(keyword_results)

# Combine
all_docs = set(vector_scores.keys()) | set(keyword_scores.keys())
combined = {}

for doc_id in all_docs:
v_score = vector_scores.get(doc_id, 0)
k_score = keyword_scores.get(doc_id, 0)
combined[doc_id] = alpha * v_score + (1 - alpha) * k_score

return sorted(combined.items(), key=lambda x: x[1], reverse=True)


### Template 2: PostgreSQL Hybrid Search

import asyncpg
from typing import List, Dict, Optional
import numpy as np

class PostgresHybridSearch:
"""Hybrid search with pgvector and full-text search."""

def __init__(self, pool: asyncpg.Pool):
self.pool = pool

async def setup_schema(self):
"""Create tables and indexes."""
async with self.pool.acquire() as conn:
await conn.execute("""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
id TEXT PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB DEFAULT '{}',
ts_content tsvector GENERATED ALWAYS AS (
to_tsvector('english', content)
) STORED
);

-- Vector index (HNSW)
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops);

-- Full-text index (GIN)
CREATE INDEX IF NOT EXISTS documents_fts_idx
ON documents USING gin (ts_content);
""")

async def hybrid_search(
self,
query: str,
query_embedding: List[float],
limit: int = 10,
vector_weight: float = 0.5,
filter_metadata: Optional[Dict] = None
) -> List[Dict]:
"""
Perform hybrid search combining vector and full-text.

Uses RRF fusion for combining results.
"""
async with self.pool.acquire() as conn:
# Build filter clause
where_clause = "1=1"
params = [query_embedding, query, limit * 3]

if filter_metadata:
for key, value in filter_metadata.items():
params.append(value)
where_clause += f" AND metadata->>'{key}' = ${len(params)}"

results = await conn.fetch(f"""
WITH vector_search AS (
SELECT
id,
content,
metadata,
ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) as vector_rank,
1 - (embedding <=> $1::vector) as vector_score
FROM documents
WHERE {where_clause}
ORDER BY embedding <=> $1::vector
LIMIT $3
),
keyword_search AS (
SELECT
id,
content,
metadata,
ROW_NUMBER() OVER (ORDER BY ts_rank(ts_content, websearch_to_tsquery('english', $2)) DESC) as keyword_rank,
ts_rank(ts_content, websearch_to_tsquery('english', $2)) as keyword_score
FROM documents
WHERE ts_content @@ websearch_to_tsquery('english', $2)
AND {where_clause}
ORDER BY ts_rank(ts_content, websearch_to_tsquery('english', $2)) DESC
LIMIT $3
)
SELECT
COALESCE(v.id, k.id) as id,
COALESCE(v.content, k.content) as content,
COALESCE(v.metadata, k.metadata) as metadata,
v.vector_score,
k.keyword_score,
-- RRF fusion
COALESCE(1.0 / (60 + v.vector_rank), 0) * $4::float +
COALESCE(1.0 / (60 + k.keyword_rank), 0) * (1 - $4::float) as rrf_score
FROM vector_search v
FULL OUTER JOIN keyword_search k ON v.id = k.id
ORDER BY rrf_score DESC
LIMIT $3 / 3
""", *params, vector_weight)

return [dict(row) for row in results]

async def search_with_rerank(
self,
query: str,
query_embedding: List[float],
limit: int = 10,
rerank_candidates: int = 50
) -> List[Dict]:
"""Hybrid search with cross-encoder reranking."""
from sentence_transformers import CrossEncoder

# Get candidates
candidates = await self.hybrid_search(
query, query_embedding, limit=rerank_candidates
)

if not candidates:
return []

# Rerank with cross-encoder
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

pairs = [(query, c["content"]) for c in candidates]
scores = model.predict(pairs)

for candidate, score in zip(candidates, scores):
candidate["rerank_score"] = float(score)

# Sort by rerank score and return top results
reranked = sorted(candidates, key=lambda x: x["rerank_score"], reverse=True)
return reranked[:limit]


### Template 3: Elasticsearch Hybrid Search

from elasticsearch import Elasticsearch
from typing import List, Dict, Optional

class ElasticsearchHybridSearch:
"""Hybrid search with Elasticsearch and dense vectors."""

def __init__(
self,
es_client: Elasticsearch,
index_name: str = "documents"
):
self.es = es_client
self.index_name = index_name

def create_index(self, vector_dims: int = 1536):
"""Create index with dense vector and text fields."""
mapping = {
"mappings": {
"properties": {
"content": {
"type": "text",
"analyzer": "english"
},
"embedding": {
"type": "dense_vector",
"dims": vector_dims,
"index": True,
"similarity": "cosine"
},
"metadata": {
"type": "object",
"enabled": True
}
}
}
}
self.es.indices.create(index=self.index_name, body=mapping, ignore=400)

def hybrid_search(
self,
query: str,
query_embedding: List[float],
limit: int = 10,
boost_vector: float = 1.0,
boost_text: float = 1.0,
filter: Optional[Dict] = None
) -> List[Dict]:
"""
Hybrid search using Elasticsearch's built-in capabilities.
"""
# Build the hybrid query
search_body = {
"size": limit,
"query": {
"bool": {
"should": [
# Vector search (kNN)
{
"script_score": {
"query": {"match_all": {}},
"script": {
"source": f"cosineSimilarity(params.query_vector, 'embedding') * {boost_vector} + 1.0",
"params": {"query_vector": query_embedding}
}
}
},
# Text search (BM25)
{
"match": {
"content": {
"query": query,
"boost": boost_text
}
}
}
],
"minimum_should_match": 1
}
}
}

# Add filter if provided
if filter:
search_body["query"]["bool"]["filter"] = filter

response = self.es.search(index=self.index_name, body=search_body)

return [
{
"id": hit["_id"],
"content": hit["_source"]["content"],
"metadata": hit["_source"].get("metadata", {}),
"score": hit["_score"]
}
for hit in response["hits"]["hits"]
]

def hybrid_search_rrf(
self,
query: str,
query_embedding: List[float],
limit: int = 10,
window_size: int = 100
) -> List[Dict]:
"""
Hybrid search using Elasticsearch 8.x RRF.
"""
search_body = {
"size": limit,
"sub_searches": [
{
"query": {
"match": {
"content": query
}
}
},
{
"query": {
"knn": {
"field": "embedding",
"query_vector": query_embedding,
"k": window_size,
"num_candidates": window_size * 2
}
}
}
],
"rank": {
"rrf": {
"window_size": window_size,
"rank_constant": 60
}
}
}

response = self.es.search(index=self.index_name, body=search_body)

return [
{
"id": hit["_id"],
"content": hit["_source"]["content"],
"score": hit["_score"]
}
for hit in response["hits"]["hits"]
]


### Template 4: Custom Hybrid RAG Pipeline

from typing import List, Dict, Optional, Callable
from dataclasses import dataclass

@dataclass
class SearchResult:
id: str
content: str
score: float
source: str # "vector", "keyword", "hybrid"
metadata: Dict = None


class HybridRAGPipeline:
"""Complete hybrid search pipeline for RAG."""

def __init__(
self,
vector_store,
keyword_store,
embedder,
reranker=None,
fusion_method: str = "rrf",
vector_weight: float = 0.5
):
self.vector_store = vector_store
self.keyword_store = keyword_store
self.embedder = embedder
self.reranker = reranker
self.fusion_method = fusion_method
self.vector_weight = vector_weight

async def search(
self,
query: str,
top_k: int = 10,
filter: Optional[Dict] = None,
use_rerank: bool = True
) -> List[SearchResult]:
"""Execute hybrid search pipeline."""

# Step 1: Get query embedding
query_embedding = self.embedder.embed(query)

# Step 2: Execute parallel searches
vector_results, keyword_results = await asyncio.gather(
self._vector_search(query_embedding, top_k * 3, filter),
self._keyword_search(query, top_k * 3, filter)
)

# Step 3: Fuse results
if self.fusion_method == "rrf":
fused = self._rrf_fusion(vector_results, keyword_results)
else:
fused = self._linear_fusion(vector_results, keyword_results)

# Step 4: Rerank if enabled
if use_rerank and self.reranker:
fused = await self._rerank(query, fused[:top_k * 2])

return fused[:top_k]

async def _vector_search(
self,
embedding: List[float],
limit: int,
filter: Dict
) -> List[SearchResult]:
results = await self.vector_store.search(embedding, limit, filter)
return [
SearchResult(
id=r["id"],
content=r["content"],
score=r["score"],
source="vector",
metadata=r.get("metadata")
)
for r in results
]

async def _keyword_search(
self,
query: str,
limit: int,
filter: Dict
) -> List[SearchResult]:
results = await self.keyword_store.search(query, limit, filter)
return [
SearchResult(
id=r["id"],
content=r["content"],
score=r["score"],
source="keyword",
metadata=r.get("metadata")
)
for r in results
]

def _rrf_fusion(
self,
vector_results: List[SearchResult],
keyword_results: List[SearchResult]
) -> List[SearchResult]:
"""Fuse with RRF."""
k = 60
scores = {}
content_map = {}

for rank, result in enumerate(vector_results):
scores[result.id] = scores.get(result.id, 0) + 1 / (k + rank + 1)
content_map[result.id] = result

for rank, result in enumerate(keyword_results):
scores[result.id] = scores.get(result.id, 0) + 1 / (k + rank + 1)
if result.id not in content_map:
content_map[result.id] = result

sorted_ids = sorted(scores.keys(), key=lambda x: scores[x], reverse=True)

return [
SearchResult(
id=doc_id,
content=content_map[doc_id].content,
score=scores[doc_id],
source="hybrid",
metadata=content_map[doc_id].metadata
)
for doc_id in sorted_ids
]

async def _rerank(
self,
query: str,
results: List[SearchResult]
) -> List[SearchResult]:
"""Rerank with cross-encoder."""
if not results:
return results

pairs = [(query, r.content) for r in results]
scores = self.reranker.predict(pairs)

for result, score in zip(results, scores):
result.score = float(score)

return sorted(results, key=lambda x: x.score, reverse=True)


## Best Practices

### Do's

- **Tune weights empirically** - Test on your data
- **Use RRF for simplicity** - Works well without tuning
- **Add reranking** - Significant quality improvement
- **Log both scores** - Helps with debugging
- **A/B test** - Measure real user impact

### Don'ts

- **Don't assume one size fits all** - Different queries need different weights
- **Don't skip keyword search** - Handles exact matches better
- **Don't over-fetch** - Balance recall vs latency
- **Don't ignore edge cases** - Empty results, single word queries

## Resources

- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)
- [Vespa Hybrid Search](https://blog.vespa.ai/improving-text-ranking-with-few-shot-prompting/)
- [Cohere Rerank](https://docs.cohere.com/docs/reranking)

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/hybrid-search-implementation/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/wshobson/agents/hybrid-search-implementation/SKILL.md
  • Cursor: ~/.cursor/skills/wshobson/agents/hybrid-search-implementation/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/wshobson/agents/hybrid-search-implementation/SKILL.md

πŸš€ Install with CLI:
npx skills add wshobson/agents

Read the Master Guide: Mastering Agent Skills β†’

Recommended Rules

View more rules β†’

Recommended Workflows

View more workflows β†’

Recommended MCP Servers

View more MCP servers β†’

Take It Further

Maximize your productivity with these powerful resources

πŸ“‹

Define Your Standards

Set up coding standards to ensure this workflow produces consistent, high-quality results.

Browse Rules Library
πŸ“–

Master Workflows

Learn how to create custom workflows, use Turbo Mode, and build your automation library.

Complete Guide

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section. This specific unit helps the agent avoid ai tools & agents issues, leading to cleaner, more efficient code.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under AI Tools & Agents and is published by W. Shobson, maintained in wshobson/agents.

← Browse All Agent Skills
Sponsored AI assistant. Recommendations may be paid.