Back to AI Tools & Agents

similarity-search-patterns

vector searchsemantic searchnearest neighborRAGembeddingAIdatabaseretrieval
⭐ 36.8kπŸ“„ MITπŸ•’ 2026-06-16Source β†—

Install this skill

npx skills add wshobson/agents

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

Similarity search patterns bridge the gap between unstructured data and structured retrieval by transforming information into vector embeddings. This library provides architectural guidance and implementation templates for managing high-dimensional data in production vector databases. It focuses on selecting appropriate distance metricsβ€”such as Cosine, L2, or Dot Productβ€”based on embedding models and choosing indexing structures like HNSW for performance or Flat indices for accuracy. Beyond basic CRUD operations, the framework includes patterns for hybrid retrieval, such as combining keyword search with semantic density and applying reranking pipelines to improve result precision. By structuring vector ingestion, search, and deletion into modular classes for providers like Pinecone and Qdrant, developers can maintain efficient, scalable search interfaces that handle millions of vector entries without sacrificing query latency.

When to Use This Skill

  • β€’Building RAG systems requiring high-precision context retrieval
  • β€’Developing recommendation engines that map user history to product vectors
  • β€’Scaling semantic search systems that exceed local memory constraints
  • β€’Merging metadata filtering with vector proximity searches for complex queries

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • β€œhow do I implement vector search for my RAG application
  • β€œwhich index type should I use for large-scale similarity search
  • β€œhow to integrate cross-encoder reranking in pinecone
  • β€œprovide a pattern for managing vector upserts in production
  • β€œcompare cosine and euclidean distance for embeddings

Pro Tips

  • πŸ’‘Always normalize your embeddings before using cosine similarity; it's designed for direction, not magnitude. For Euclidean distance, raw embeddings are typically fine.
  • πŸ’‘Start with Flat (Exact) indices for small datasets or during initial development, then migrate to HNSW or IVF+PQ as your vector count scales into millions for better performance.
  • πŸ’‘Carefully evaluate your choice of distance metric and index type against your specific dataset characteristics and performance requirements (recall vs. latency) using clear benchmarks.

What this skill does

  • β€’Implementation of vector store abstraction layers for Pinecone and Qdrant
  • β€’Support for multiple distance metrics including Cosine, Euclidean, and Dot Product
  • β€’Automated batch processing for large-scale upserts and deletions
  • β€’Integrated reranking pipelines using Cross-Encoders for improved search relevance
  • β€’Configurable indexing selection for balancing recall against search speed

When not to use it

  • βœ•Handling small datasets where simple string matching or regex is sufficient
  • βœ•Applications where absolute 100% recall is required on massive, high-dimensional datasets

Example workflow

  1. Initialize the vector store client with specific dimension and metric settings
  2. Upsert vectorized document data in batches to ensure network stability
  3. Execute a proximity search using a query vector to fetch candidate results
  4. Pass the candidate results to a reranker model for semantic score calculation
  5. Sort and filter the final output based on the adjusted rerank scores

Prerequisites

  • –Vectorized data or an embedding model (e.g., OpenAI, SentenceTransformers)
  • –API credentials for a supported vector database (Pinecone or Qdrant)
  • –Python environment with relevant database client libraries

Pitfalls & limitations

  • !Choosing an inappropriate index type leading to excessive latency or poor recall
  • !Failing to normalize vectors when using metrics sensitive to magnitude
  • !Over-fetching results for reranking which can significantly increase latency

FAQ

Why use reranking after a vector search?
Vector search identifies general semantic relevance, but Cross-Encoders provide deeper contextual analysis of the relationship between query and result, significantly boosting accuracy.
Should I use HNSW or IVF indices?
Use HNSW for better performance on large datasets when you need a balance of speed and recall, while IVF is often better for massive datasets requiring higher quantization.
Does this library support filtering?
Yes, the templates support metadata-based filtering which can be applied during the search query to constrain results by specific properties.

How it compares

Generic prompts often struggle with the state management and batching logic required for production vector databases; this library offers formal class structures to ensure consistent index management.

Source & trust

⭐ 37k starsπŸ“„ MITπŸ•’ Updated 2026-06-16
πŸ“„ Full skill instructions β€” original source: wshobson/agents
# Similarity Search Patterns

Patterns for implementing efficient similarity search in production systems.

## When to Use This Skill

- Building semantic search systems
- Implementing RAG retrieval
- Creating recommendation engines
- Optimizing search latency
- Scaling to millions of vectors
- Combining semantic and keyword search

## Core Concepts

### 1. Distance Metrics

| Metric | Formula | Best For |
| ------------------ | ------------------ | --------------------- | --- | -------------- |
| **Cosine** | 1 - (AΒ·B)/(β€–Aβ€–β€–Bβ€–) | Normalized embeddings |
| **Euclidean (L2)** | √Σ(a-b)² | Raw embeddings |
| **Dot Product** | AΒ·B | Magnitude matters |
| **Manhattan (L1)** | Ξ£ | a-b | | Sparse vectors |

### 2. Index Types

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Index Types β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Flat β”‚ HNSW β”‚ IVF+PQ β”‚
β”‚ (Exact) β”‚ (Graph-based) β”‚ (Quantized) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ O(n) search β”‚ O(log n) β”‚ O(√n) β”‚
β”‚ 100% recall β”‚ ~95-99% β”‚ ~90-95% β”‚
β”‚ Small data β”‚ Medium-Large β”‚ Very Large β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜


## Templates

### Template 1: Pinecone Implementation

from pinecone import Pinecone, ServerlessSpec
from typing import List, Dict, Optional
import hashlib

class PineconeVectorStore:
def __init__(
self,
api_key: str,
index_name: str,
dimension: int = 1536,
metric: str = "cosine"
):
self.pc = Pinecone(api_key=api_key)

# Create index if not exists
if index_name not in self.pc.list_indexes().names():
self.pc.create_index(
name=index_name,
dimension=dimension,
metric=metric,
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

self.index = self.pc.Index(index_name)

def upsert(
self,
vectors: List[Dict],
namespace: str = ""
) -> int:
"""
Upsert vectors.
vectors: [{"id": str, "values": List[float], "metadata": dict}]
"""
# Batch upsert
batch_size = 100
total = 0

for i in range(0, len(vectors), batch_size):
batch = vectors[i:i + batch_size]
self.index.upsert(vectors=batch, namespace=namespace)
total += len(batch)

return total

def search(
self,
query_vector: List[float],
top_k: int = 10,
namespace: str = "",
filter: Optional[Dict] = None,
include_metadata: bool = True
) -> List[Dict]:
"""Search for similar vectors."""
results = self.index.query(
vector=query_vector,
top_k=top_k,
namespace=namespace,
filter=filter,
include_metadata=include_metadata
)

return [
{
"id": match.id,
"score": match.score,
"metadata": match.metadata
}
for match in results.matches
]

def search_with_rerank(
self,
query: str,
query_vector: List[float],
top_k: int = 10,
rerank_top_n: int = 50,
namespace: str = ""
) -> List[Dict]:
"""Search and rerank results."""
# Over-fetch for reranking
initial_results = self.search(
query_vector,
top_k=rerank_top_n,
namespace=namespace
)

# Rerank with cross-encoder or LLM
reranked = self._rerank(query, initial_results)

return reranked[:top_k]

def _rerank(self, query: str, results: List[Dict]) -> List[Dict]:
"""Rerank results using cross-encoder."""
from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

pairs = [(query, r["metadata"]["text"]) for r in results]
scores = model.predict(pairs)

for result, score in zip(results, scores):
result["rerank_score"] = float(score)

return sorted(results, key=lambda x: x["rerank_score"], reverse=True)

def delete(self, ids: List[str], namespace: str = ""):
"""Delete vectors by ID."""
self.index.delete(ids=ids, namespace=namespace)

def delete_by_filter(self, filter: Dict, namespace: str = ""):
"""Delete vectors matching filter."""
self.index.delete(filter=filter, namespace=namespace)


### Template 2: Qdrant Implementation

from qdrant_client import QdrantClient
from qdrant_client.http import models
from typing import List, Dict, Optional

class QdrantVectorStore:
def __init__(
self,
url: str = "localhost",
port: int = 6333,
collection_name: str = "documents",
vector_size: int = 1536
):
self.client = QdrantClient(url=url, port=port)
self.collection_name = collection_name

# Create collection if not exists
collections = self.client.get_collections().collections
if collection_name not in [c.name for c in collections]:
self.client.create_collection(
collection_name=collection_name,
vectors_config=models.VectorParams(
size=vector_size,
distance=models.Distance.COSINE
),
# Optional: enable quantization for memory efficiency
quantization_config=models.ScalarQuantization(
scalar=models.ScalarQuantizationConfig(
type=models.ScalarType.INT8,
quantile=0.99,
always_ram=True
)
)
)

def upsert(self, points: List[Dict]) -> int:
"""
Upsert points.
points: [{"id": str/int, "vector": List[float], "payload": dict}]
"""
qdrant_points = [
models.PointStruct(
id=p["id"],
vector=p["vector"],
payload=p.get("payload", {})
)
for p in points
]

self.client.upsert(
collection_name=self.collection_name,
points=qdrant_points
)
return len(points)

def search(
self,
query_vector: List[float],
limit: int = 10,
filter: Optional[models.Filter] = None,
score_threshold: Optional[float] = None
) -> List[Dict]:
"""Search for similar vectors."""
results = self.client.search(
collection_name=self.collection_name,
query_vector=query_vector,
limit=limit,
query_filter=filter,
score_threshold=score_threshold
)

return [
{
"id": r.id,
"score": r.score,
"payload": r.payload
}
for r in results
]

def search_with_filter(
self,
query_vector: List[float],
must_conditions: List[Dict] = None,
should_conditions: List[Dict] = None,
must_not_conditions: List[Dict] = None,
limit: int = 10
) -> List[Dict]:
"""Search with complex filters."""
conditions = []

if must_conditions:
conditions.extend([
models.FieldCondition(
key=c["key"],
match=models.MatchValue(value=c["value"])
)
for c in must_conditions
])

filter = models.Filter(must=conditions) if conditions else None

return self.search(query_vector, limit=limit, filter=filter)

def search_with_sparse(
self,
dense_vector: List[float],
sparse_vector: Dict[int, float],
limit: int = 10,
dense_weight: float = 0.7
) -> List[Dict]:
"""Hybrid search with dense and sparse vectors."""
# Requires collection with named vectors
results = self.client.search(
collection_name=self.collection_name,
query_vector=models.NamedVector(
name="dense",
vector=dense_vector
),
limit=limit
)
return [{"id": r.id, "score": r.score, "payload": r.payload} for r in results]


### Template 3: pgvector with PostgreSQL

import asyncpg
from typing import List, Dict, Optional
import numpy as np

class PgVectorStore:
def __init__(self, connection_string: str):
self.connection_string = connection_string

async def init(self):
"""Initialize connection pool and extension."""
self.pool = await asyncpg.create_pool(self.connection_string)

async with self.pool.acquire() as conn:
# Enable extension
await conn.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Create table
await conn.execute("""
CREATE TABLE IF NOT EXISTS documents (
id TEXT PRIMARY KEY,
content TEXT,
metadata JSONB,
embedding vector(1536)
)
""")

# Create index (HNSW for better performance)
await conn.execute("""
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64)
""")

async def upsert(self, documents: List[Dict]):
"""Upsert documents with embeddings."""
async with self.pool.acquire() as conn:
await conn.executemany(
"""
INSERT INTO documents (id, content, metadata, embedding)
VALUES ($1, $2, $3, $4)
ON CONFLICT (id) DO UPDATE SET
content = EXCLUDED.content,
metadata = EXCLUDED.metadata,
embedding = EXCLUDED.embedding
""",
[
(
doc["id"],
doc["content"],
doc.get("metadata", {}),
np.array(doc["embedding"]).tolist()
)
for doc in documents
]
)

async def search(
self,
query_embedding: List[float],
limit: int = 10,
filter_metadata: Optional[Dict] = None
) -> List[Dict]:
"""Search for similar documents."""
query = """
SELECT id, content, metadata,
1 - (embedding <=> $1::vector) as similarity
FROM documents
"""

params = [query_embedding]

if filter_metadata:
conditions = []
for key, value in filter_metadata.items():
params.append(value)
conditions.append(f"metadata->>'{key}' = ${len(params)}")
query += " WHERE " + " AND ".join(conditions)

query += f" ORDER BY embedding <=> $1::vector LIMIT ${len(params) + 1}"
params.append(limit)

async with self.pool.acquire() as conn:
rows = await conn.fetch(query, *params)

return [
{
"id": row["id"],
"content": row["content"],
"metadata": row["metadata"],
"score": row["similarity"]
}
for row in rows
]

async def hybrid_search(
self,
query_embedding: List[float],
query_text: str,
limit: int = 10,
vector_weight: float = 0.5
) -> List[Dict]:
"""Hybrid search combining vector and full-text."""
async with self.pool.acquire() as conn:
rows = await conn.fetch(
"""
WITH vector_results AS (
SELECT id, content, metadata,
1 - (embedding <=> $1::vector) as vector_score
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT $3 * 2
),
text_results AS (
SELECT id, content, metadata,
ts_rank(to_tsvector('english', content),
plainto_tsquery('english', $2)) as text_score
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
LIMIT $3 * 2
)
SELECT
COALESCE(v.id, t.id) as id,
COALESCE(v.content, t.content) as content,
COALESCE(v.metadata, t.metadata) as metadata,
COALESCE(v.vector_score, 0) * $4 +
COALESCE(t.text_score, 0) * (1 - $4) as combined_score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
ORDER BY combined_score DESC
LIMIT $3
""",
query_embedding, query_text, limit, vector_weight
)

return [dict(row) for row in rows]


### Template 4: Weaviate Implementation

import weaviate
from weaviate.util import generate_uuid5
from typing import List, Dict, Optional

class WeaviateVectorStore:
def __init__(
self,
url: str = "http://localhost:8080",
class_name: str = "Document"
):
self.client = weaviate.Client(url=url)
self.class_name = class_name
self._ensure_schema()

def _ensure_schema(self):
"""Create schema if not exists."""
schema = {
"class": self.class_name,
"vectorizer": "none", # We provide vectors
"properties": [
{"name": "content", "dataType": ["text"]},
{"name": "source", "dataType": ["string"]},
{"name": "chunk_id", "dataType": ["int"]}
]
}

if not self.client.schema.exists(self.class_name):
self.client.schema.create_class(schema)

def upsert(self, documents: List[Dict]):
"""Batch upsert documents."""
with self.client.batch as batch:
batch.batch_size = 100

for doc in documents:
batch.add_data_object(
data_object={
"content": doc["content"],
"source": doc.get("source", ""),
"chunk_id": doc.get("chunk_id", 0)
},
class_name=self.class_name,
uuid=generate_uuid5(doc["id"]),
vector=doc["embedding"]
)

def search(
self,
query_vector: List[float],
limit: int = 10,
where_filter: Optional[Dict] = None
) -> List[Dict]:
"""Vector search."""
query = (
self.client.query
.get(self.class_name, ["content", "source", "chunk_id"])
.with_near_vector({"vector": query_vector})
.with_limit(limit)
.with_additional(["distance", "id"])
)

if where_filter:
query = query.with_where(where_filter)

results = query.do()

return [
{
"id": item["_additional"]["id"],
"content": item["content"],
"source": item["source"],
"score": 1 - item["_additional"]["distance"]
}
for item in results["data"]["Get"][self.class_name]
]

def hybrid_search(
self,
query: str,
query_vector: List[float],
limit: int = 10,
alpha: float = 0.5 # 0 = keyword, 1 = vector
) -> List[Dict]:
"""Hybrid search combining BM25 and vector."""
results = (
self.client.query
.get(self.class_name, ["content", "source"])
.with_hybrid(query=query, vector=query_vector, alpha=alpha)
.with_limit(limit)
.with_additional(["score"])
.do()
)

return [
{
"content": item["content"],
"source": item["source"],
"score": item["_additional"]["score"]
}
for item in results["data"]["Get"][self.class_name]
]


## Best Practices

### Do's

- **Use appropriate index** - HNSW for most cases
- **Tune parameters** - ef_search, nprobe for recall/speed
- **Implement hybrid search** - Combine with keyword search
- **Monitor recall** - Measure search quality
- **Pre-filter when possible** - Reduce search space

### Don'ts

- **Don't skip evaluation** - Measure before optimizing
- **Don't over-index** - Start with flat, scale up
- **Don't ignore latency** - P99 matters for UX
- **Don't forget costs** - Vector storage adds up

## Resources

- [Pinecone Docs](https://docs.pinecone.io/)
- [Qdrant Docs](https://qdrant.tech/documentation/)
- [pgvector](https://github.com/pgvector/pgvector)
- [Weaviate Docs](https://weaviate.io/developers/weaviate)

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/similarity-search-patterns/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/wshobson/agents/similarity-search-patterns/SKILL.md
  • Cursor: ~/.cursor/skills/wshobson/agents/similarity-search-patterns/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/wshobson/agents/similarity-search-patterns/SKILL.md

πŸš€ Install with CLI:
npx skills add wshobson/agents

Read the Master Guide: Mastering Agent Skills β†’

Related Skill Units

Recommended Rules

View more rules β†’

Recommended Workflows

View more workflows β†’

Recommended MCP Servers

View more MCP servers β†’

Take It Further

Maximize your productivity with these powerful resources

πŸ“‹

Define Your Standards

Set up coding standards to ensure this workflow produces consistent, high-quality results.

Browse Rules Library
πŸ“–

Master Workflows

Learn how to create custom workflows, use Turbo Mode, and build your automation library.

Complete Guide

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best used in the "Rules for AI" section. This specific unit helps the agent avoid ai tools & agents issues, leading to cleaner, more efficient code.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under AI Tools & Agents and is published by W. Shobson, maintained in wshobson/agents.

← Browse All Agent Skills
Sponsored AI assistant. Recommendations may be paid.