distributed-tracing

Name: distributed-tracing
Author: W. Shobson

Tags: distributed tracing, Jaeger, Tempo, microservices, observability, debugging, performance monitoring, cloud-native

⭐ 36.8k · 📄 MIT · 🕒 2026-06-16

Install this skill

npx skills add wshobson/agents

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

Distributed tracing provides end-to-end visibility into a request's lifecycle across microservice architectures. With applications instrumented to OpenTelemetry standards, this skill helps visualize each request as it passes through services, gateways, and databases. Each request produces a trace composed of spans, and each span records the metadata, duration, and error state of a single operation. Because the whole execution path is laid out as one sequence, latency spikes and failures can be attributed to a specific service rather than guessed at. With a tracing backend such as Jaeger or Tempo, operators can map service dependencies and find bottlenecks that local logs or metrics alone do not reveal.

When to Use This Skill

• Isolating services responsible for slow API response times
• Debugging error propagation during cascading service failures
• Mapping complex inter-service communication patterns
• Verifying that database queries align with expected request flows

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

“trace the request path for this endpoint”
“find why this microservice is timing out”
“show me the dependency graph for our services”
“instrument this flask app for distributed tracing”
“how is this request failing across services”

Pro Tips

💡 Propagate trace context (for example, the W3C traceparent HTTP header) consistently across all services. Services that are not instrumented should still forward the headers unchanged; otherwise the trace breaks at that hop.
💡 Add meaningful tags and structured logs to spans so traces are easier to filter and debug.
💡 Review traces regularly to spot recurring bottlenecks and unexpected service dependencies before they become incidents.

What this skill does

• Visualize request paths across multiple microservices
• Measure duration of individual operations within a request
• Propagate trace context metadata through service calls
• Identify specific nodes causing latency or errors
• Map service dependency topology

When not to use it

✕ Monolithic applications with simple execution paths
✕ Low-traffic services where overhead outweighs diagnostic value
✕ Environments where security policies prohibit external metadata propagation

Example workflow

1. Deploy a Jaeger collector into the observability namespace
2. Configure OpenTelemetry providers in the target microservice
3. Instrument specific methods and network calls with spans
4. Generate traffic to trigger service-to-service communication
5. Access the Jaeger dashboard to review the generated traces

Prerequisites

– A microservices architecture
– OpenTelemetry SDKs for your language
– A tracing backend such as Jaeger or Tempo

Pitfalls & limitations

! High sampling rates can consume significant storage and network bandwidth
! Incomplete instrumentation in one service will result in fragmented trace views
! Adding too much metadata to spans may degrade service performance

FAQ

What is the difference between a trace and a span?

A trace represents the entire end-to-end request journey, while a span represents a single operation or unit of work within that journey.

Do I need to change my application code to use tracing?

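Usually, yes. At minimum you initialize a tracer provider and exporter at startup. Instrumentation libraries (such as the Flask and Express instrumentations used in this skill) can then create spans for incoming requests automatically, while spans around business logic are added by hand.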

Can I use distributed tracing for database queries?

Yes, by wrapping database calls in spans, you can track query latency and identify inefficient database operations within the request path.

How it compares

Unlike manual logging, which produces disjointed events, distributed tracing creates a unified, queryable timeline that links related actions across servers and processes.


📄 Full skill instructions — original source: wshobson/agents

# Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

## Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

## When to Use

- Debug latency issues
- Understand service dependencies
- Identify bottlenecks
- Trace error propagation
- Analyze request paths

## Distributed Tracing Concepts

### Trace Structure

```
Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]
```
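
One way to read the diagram: a span's duration includes the time spent in its children, so the time a service spends on its own work is roughly its duration minus its children's durations. The sketch below works that out in plain Python for the durations shown above, assuming child spans run one after another. The `Span` class and `self_time_ms` helper are illustrative names, not part of any tracing library.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Illustrative span record: a name, a duration, and child spans."""
    name: str
    duration_ms: int
    children: list["Span"] = field(default_factory=list)


def self_time_ms(span: Span) -> dict[str, int]:
    """Return {span name: time not accounted for by its direct children}."""
    result = {span.name: span.duration_ms - sum(c.duration_ms for c in span.children)}
    for child in span.children:
        result.update(self_time_ms(child))
    return result


# The trace from the diagram above
trace_tree = Span("frontend", 100, [
    Span("api-gateway", 80, [
        Span("auth-service", 10),
        Span("user-service", 60, [Span("database", 40)]),
    ]),
])

for name, ms in self_time_ms(trace_tree).items():
    print(f"{name}: {ms}ms")
```

In this example the database span has the largest self time (40ms of the 100ms request), which makes it the first place to look.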

### Key Components

- **Trace** - End-to-end request journey
- **Span** - Single operation within a trace
- **Context** - Metadata propagated between services
- **Tags** - Key-value pairs for filtering (called attributes in OpenTelemetry)
- **Logs** - Timestamped events within a span (called span events in OpenTelemetry)

## Jaeger Setup

### Kubernetes Deployment

```bash
# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF
```

### Docker Compose

```yaml
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686" # UI
      - "14268:14268" # Collector
      - "14250:14250" # gRPC
      - "9411:9411" # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
```

**Reference:** See references/jaeger-setup.md

## Application Instrumentation

### OpenTelemetry (Recommended)

#### Python (Flask)

```python
from opentelemetry import trace
# Note: the opentelemetry-exporter-jaeger package is deprecated upstream;
# newer setups typically export OTLP to Jaeger instead.
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask

# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Instrument Flask (creates a span for every incoming request)
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)


def query_database():
    # Placeholder for the real database call
    return []


def fetch_users_from_db():
    # Child span around the database call
    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        return query_database()


@app.route('/api/users')
def get_users():
    with tracer.start_as_current_span("get_users") as span:
        # Business logic
        users = fetch_users_from_db()
        # Record the actual result size rather than a hard-coded value
        span.set_attribute("user.count", len(users))
        return {"users": users}
```

#### Node.js (Express)

```javascript
// Assumes the 1.x OpenTelemetry JS SDK (Resource class, addSpanProcessor)
const { trace } = require("@opentelemetry/api");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { Resource } = require("@opentelemetry/resources");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer
const provider = new NodeTracerProvider({
  resource: new Resource({ "service.name": "my-service" }),
});

const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries (must run before express is required)
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();

app.get("/api/users", async (req, res) => {
  const tracer = trace.getTracer("my-service");
  const span = tracer.startSpan("get_users");

  try {
    // fetchUsers is a placeholder for the application's own data access
    const users = await fetchUsers();
    span.setAttributes({ "user.count": users.length });
    res.json({ users });
  } finally {
    // Always end the span, even if fetchUsers throws
    span.end();
  }
});
```

#### Go

```go
package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

// User and fetchUsersFromDB are assumed to be defined elsewhere in the package.

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("my-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

func getUsers(ctx context.Context) ([]User, error) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(ctx, "get_users")
    defer span.End()

    span.SetAttributes(attribute.String("user.filter", "active"))

    users, err := fetchUsersFromDB(ctx)
    if err != nil {
        // RecordError adds an exception event; SetStatus marks the span as failed
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    span.SetAttributes(attribute.Int("user.count", len(users)))
    return users, nil
}
```

**Reference:** See references/instrumentation.md

## Context Propagation

### HTTP Headers

```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
```
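
The `traceparent` value has four dash-separated hex fields: version, trace ID (32 characters), parent span ID (16 characters), and trace flags, where the lowest flag bit marks the trace as sampled. The following is a minimal sketch of a parser for the version `00` layout, using only the standard library. The `TraceParent` tuple and `parse_traceparent` function are illustrative names; in practice the OpenTelemetry propagators handle this for you.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a traceparent header value; return None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of trace-flags is the "sampled" flag
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(parsed.trace_id, parsed.parent_id, parsed.sampled)
```

A service that receives this header would reuse the trace ID for its own spans and use the parent ID as the parent of its first span.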

### Propagation in HTTP Requests

#### Python

```python
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Injects the current trace context (e.g. traceparent) into the dict

response = requests.get('http://downstream-service/api', headers=headers)
```

#### Node.js

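```javascript
const axios = require("axios");
const { context, propagation } = require("@opentelemetry/api");

const headers = {};
// Injects the active trace context (e.g. traceparent) into the headers object
propagation.inject(context.active(), headers);

axios.get("http://downstream-service/api", { headers });
```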

## Tempo Setup (Grafana)

### Kubernetes Deployment

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200

    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_http:
            grpc:
        otlp:
          protocols:
            http:
            grpc:

    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com

    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  replicas: 1
  # apps/v1 Deployments require a selector matching the pod template labels;
  # the "app: tempo" label is an assumed name
  selector:
    matchLabels:
      app: tempo
  template:
    metadata:
      labels:
        app: tempo
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest
          args:
            - -config.file=/etc/tempo/tempo.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config
```

**Reference:** See assets/jaeger-config.yaml.template

## Sampling Strategies

### Probabilistic Sampling

```yaml
# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01
```

### Rate Limiting Sampling

```yaml
# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
```
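
A rate-limiting sampler caps how many traces are kept per second regardless of traffic volume. The sketch below shows the idea as a token bucket in plain Python; it is an illustration under stated assumptions, not Jaeger's implementation. The `RateLimitingSampler` class and its injectable `clock` parameter are names made up for this example.

```python
import time


class RateLimitingSampler:
    """Token-bucket sketch: allow at most `max_per_second` sampled traces."""

    def __init__(self, max_per_second: float, clock=time.monotonic):
        self.rate = max_per_second
        self.capacity = max(max_per_second, 1.0)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def should_sample(self) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Drive the sampler with a fake clock so the behavior is repeatable
fake_now = [0.0]
sampler = RateLimitingSampler(100, clock=lambda: fake_now[0])

burst = sum(sampler.should_sample() for _ in range(250))  # 250 requests at t=0
fake_now[0] += 0.5                                        # half a second later
later = sum(sampler.should_sample() for _ in range(250))
print(burst, later)
```

With a limit of 100 per second, the first burst keeps 100 traces and the burst half a second later keeps 50, because only half the bucket has refilled.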

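### Parent-Based Ratio Sampling

```python
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Honor the parent's sampling decision; for root spans, sample 1% of traces
# deterministically based on the trace ID. This is ratio sampling, not
# Jaeger's adaptive sampling.
sampler = ParentBased(root=TraceIdRatioBased(0.01))
```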
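
Sampling on the trace ID means every service reaches the same decision for the same trace without coordinating. The sketch below illustrates that idea in plain Python, in the spirit of `TraceIdRatioBased` wrapped in `ParentBased`; the SDK's exact arithmetic may differ, and the `should_sample` and `effective_decision` helpers are illustrative names.

```python
import random

TRACE_ID_BITS = 64  # compare on the low 64 bits of the 128-bit trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << TRACE_ID_BITS))
    return (trace_id & ((1 << TRACE_ID_BITS) - 1)) < bound


def effective_decision(parent_sampled, trace_id: int, ratio: float) -> bool:
    """Parent-based rule: follow the parent's decision, else use the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return should_sample(trace_id, ratio)


# Check the kept fraction over many random trace IDs (seeded for repeatability)
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(should_sample(t, 0.01) for t in trace_ids)
print(f"kept {kept / len(trace_ids):.2%} of traces")
```

Because a child span follows its parent's decision, a trace is either recorded in full or not at all, which avoids fragmented traces.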

## Trace Analysis

### Finding Slow Requests

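**Jaeger search (UI form fields):**

- Service: `my-service`
- Min Duration: `1s`

### Finding Errors

**Jaeger search (UI form fields):**

- Service: `my-service`
- Tags: `error=true`

Tag filters are exact matches in logfmt form, so a range such as `>= 500` cannot be expressed. Search for a specific value instead, for example `http.status_code=500`.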

### Service Dependency Graph

Jaeger automatically generates service dependency graphs showing:

- Service relationships
- Request rates
- Error rates
- Average latencies
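
Conceptually, a dependency graph falls out of the parent-child links between spans: whenever a span's parent belongs to a different service, that is a call from one service to another. The sketch below counts those edges in plain Python. The span tuples and the `dependency_edges` function are illustrative, and this is not how Jaeger computes its graph internally.

```python
from collections import Counter

# Illustrative span records: (span_id, parent_span_id, service)
spans = [
    ("a", None, "frontend"),
    ("b", "a", "api-gateway"),
    ("c", "b", "auth-service"),
    ("d", "b", "user-service"),
    ("e", "d", "database"),
]


def dependency_edges(spans):
    """Count parent-service -> child-service calls across span records."""
    service_of = {span_id: service for span_id, _, service in spans}
    edges = Counter()
    for _, parent_id, service in spans:
        parent_service = service_of.get(parent_id)
        # Skip root spans and spans whose parent is in the same service
        if parent_service is not None and parent_service != service:
            edges[(parent_service, service)] += 1
    return edges


for (caller, callee), count in dependency_edges(spans).items():
    print(f"{caller} -> {callee}: {count} call(s)")
```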

## Best Practices

1. **Sample appropriately** (1-10% in production)
2. **Add meaningful tags** (user_id, request_id)
3. **Propagate context** across all service boundaries
4. **Log exceptions** in spans
5. **Use consistent naming** for operations
6. **Monitor tracing overhead** (<1% CPU impact)
7. **Set up alerts** for trace errors
8. **Implement distributed context** (baggage)
9. **Use span events** for important milestones
10. **Document instrumentation** standards

## Integration with Logging

### Correlated Logs

```python
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, '032x')}
    )
```
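
To avoid passing `extra` on every call, the trace ID can be attached once through a `logging.Filter`. The sketch below uses only the standard library; the `current_trace_id` context variable is a hypothetical stand-in for reading the active span, and in a real service the filter would call `trace.get_current_span()` instead.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active span's trace ID (an int, as in OpenTelemetry)
current_trace_id = contextvars.ContextVar("current_trace_id", default=0)


class TraceIdFilter(logging.Filter):
    """Attach a 32-character hex trace_id to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = format(current_trace_id.get(), "032x")
        return True


stream = io.StringIO()  # captured for illustration; use stderr or a file in practice
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("correlated-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

current_trace_id.set(0x0AF7651916CD43DD8448EB211C80319C)
logger.info("Processing request")
print(stream.getvalue(), end="")
```

Each log line then carries the same 32-character ID that the tracing backend shows, so a log search and a trace lookup can start from either side.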

## Troubleshooting

**No traces appearing:**

- Check collector endpoint
- Verify network connectivity
- Check sampling configuration
- Review application logs
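
For the connectivity check, a quick TCP probe from inside the application's network can rule out DNS or firewall problems. This is a minimal sketch using only the standard library; `can_connect` is an illustrative helper. It only applies to TCP ports such as the collector's HTTP port, not to the UDP agent ports.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

For example, `can_connect("jaeger", 14268)` probes the collector endpoint used in the instrumentation examples, assuming the service is reachable under the hostname `jaeger`.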

**High latency overhead:**

- Reduce sampling rate
- Use batch span processor
- Check exporter configuration

## Reference Files

- references/jaeger-setup.md - Jaeger installation
- references/instrumentation.md - Instrumentation patterns
- assets/jaeger-config.yaml.template - Jaeger configuration

## Related Skills

- prometheus-configuration - For metrics
- grafana-dashboards - For visualization
- slo-implementation - For latency SLOs


How to Use This Skill Unit

Option A: Project-Specific (Recommended)

1. Click "Download" above
2. In your project, create the directory: .agent/skills/distributed-tracing/
3. Save the file as SKILL.md
4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

Claude Code: ~/.claude/skills/wshobson/agents/distributed-tracing/SKILL.md
Cursor: ~/.cursor/skills/wshobson/agents/distributed-tracing/SKILL.md
Antigravity: ~/.gemini/antigravity/skills/wshobson/agents/distributed-tracing/SKILL.md

🚀 Install with CLI:
npx skills add wshobson/agents
