workflow-orchestration-patterns
Install this skill
npx skills add wshobson/agentsWorks across Claude Code, Cursor, Codex, Copilot & Antigravity
Workflow orchestration patterns provide a structured framework for managing distributed system state, specifically addressing how to handle long-running operations and partial failures. By distinguishing between orchestration logic—which must remain deterministic—and external interactions, developers can build fault-tolerant systems that survive infrastructure restarts without losing progress. This approach utilizes event histories to track state transitions, enabling automated recovery and complex coordination. Rather than managing manual error handling or ad-hoc timeouts, these patterns impose a clear separation of concerns. Workflows act as the control plane for business logic, while activities serve as the execution layer for side-effect-heavy tasks like API calls or database updates. Mastering these patterns allows for the creation of reliable multi-step processes, such as distributed transactions and persistent entity management, which otherwise fail in traditional stateless server environments.
When to Use This Skill
- •Multi-step e-commerce order processing and inventory management
- •Long-running CI/CD pipeline automation
- •Account lifecycle and status tracking
- •Human-in-the-loop approval workflows with timeouts
How to Invoke This Skill
Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:
- “How do I implement a saga for distributed transactions?
- “Show me how to orchestrate parallel tasks in Temporal.
- “Help me refactor this long-running process into a workflow.
- “What are the determinism rules for writing workflow logic?
- “How do I handle human approval steps in an automated flow?
Pro Tips
- 💡Always clearly delineate between Workflow Code (deterministic, orchestrating logic) and Activity Code (non-deterministic, I/O operations). This is key for Temporal's reliability.
- 💡When designing sagas, map out all potential failure points and define explicit compensation activities for each step to ensure data consistency across services.
- 💡Leverage Temporal's Query and Signal APIs to interact with running workflows, enabling real-time status checks and external event injection without violating determinism.
What this skill does
- •Automatic state persistence across system failures
- •Implementation of distributed transactions using the Saga pattern
- •Concurrent task execution through fan-out/fan-in models
- •Asynchronous waiting for external human or system signals
- •Deterministic orchestration of complex business logic
When not to use it
- ✕Simple CRUD operations or basic REST API request/response cycles
- ✕High-throughput batch data processing pipelines
- ✕Real-time event streaming requiring sub-millisecond latency
Example workflow
- Initialize inventory reservation step
- Execute payment processing activity
- Register compensation logic for rollbacks
- Await signal from external fulfillment system
- Commit transaction or trigger compensation sequence
Prerequisites
- –A distributed task queue or orchestration engine like Temporal
- –Understanding of state machines and event sourcing
- –Familiarity with idempotent design principles
Pitfalls & limitations
- !Writing non-deterministic code like random numbers inside workflow logic
- !Overloading a single workflow instead of using child workflows
- !Forgetting to make external activities idempotent
- !Running heavy computation directly in the workflow rather than an activity
FAQ
How it compares
While manual scripts rely on volatile database flags and complex retry loops, workflow orchestration provides a native, runtime-level guarantee of state, automatically handling replays and failures.
📄 Full skill instructions — original source: wshobson/agents
Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.
## When to Use Workflow Orchestration
### Ideal Use Cases (Source: docs.temporal.io)
- **Multi-step processes** spanning machines/services/databases
- **Distributed transactions** requiring all-or-nothing semantics
- **Long-running workflows** (hours to years) with automatic state persistence
- **Failure recovery** that must resume from last successful step
- **Business processes**: bookings, orders, campaigns, approvals
- **Entity lifecycle management**: inventory tracking, account management, cart workflows
- **Infrastructure automation**: CI/CD pipelines, provisioning, deployments
- **Human-in-the-loop** systems requiring timeouts and escalations
### When NOT to Use
- Simple CRUD operations (use direct API calls)
- Pure data processing pipelines (use Airflow, batch processing)
- Stateless request/response (use standard APIs)
- Real-time streaming (use Kafka, event processors)
## Critical Design Decision: Workflows vs Activities
**The Fundamental Rule** (Source: temporal.io/blog/workflow-engine-principles):
- **Workflows** = Orchestration logic and decision-making
- **Activities** = External interactions (APIs, databases, network calls)
### Workflows (Orchestration)
**Characteristics:**
- Contain business logic and coordination
- **MUST be deterministic** (same inputs → same outputs)
- **Cannot** perform direct external calls
- State automatically preserved across failures
- Can run for years despite infrastructure failures
**Example workflow tasks:**
- Decide which steps to execute
- Handle compensation logic
- Manage timeouts and retries
- Coordinate child workflows
### Activities (External Interactions)
**Characteristics:**
- Handle all external system interactions
- Can be non-deterministic (API calls, DB writes)
- Include built-in timeouts and retry logic
- **Must be idempotent** (calling N times = calling once)
- Short-lived (seconds to minutes typically)
**Example activity tasks:**
- Call payment gateway API
- Write to database
- Send emails or notifications
- Query external services
### Design Decision Framework
Does it touch external systems? → Activity
Is it orchestration/decision logic? → Workflow## Core Workflow Patterns
### 1. Saga Pattern with Compensation
**Purpose**: Implement distributed transactions with rollback capability
**Pattern** (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):
For each step:
1. Register compensation BEFORE executing
2. Execute the step (via activity)
3. On failure, run all compensations in reverse order (LIFO)**Example: Payment Workflow**
1. Reserve inventory (compensation: release inventory)
2. Charge payment (compensation: refund payment)
3. Fulfill order (compensation: cancel fulfillment)
**Critical Requirements:**
- Compensations must be idempotent
- Register compensation BEFORE executing step
- Run compensations in reverse order
- Handle partial failures gracefully
### 2. Entity Workflows (Actor Model)
**Purpose**: Long-lived workflow representing single entity instance
**Pattern** (Source: docs.temporal.io/evaluate/use-cases-design-patterns):
- One workflow execution = one entity (cart, account, inventory item)
- Workflow persists for entity lifetime
- Receives signals for state changes
- Supports queries for current state
**Example Use Cases:**
- Shopping cart (add items, checkout, expiration)
- Bank account (deposits, withdrawals, balance checks)
- Product inventory (stock updates, reservations)
**Benefits:**
- Encapsulates entity behavior
- Guarantees consistency per entity
- Natural event sourcing
### 3. Fan-Out/Fan-In (Parallel Execution)
**Purpose**: Execute multiple tasks in parallel, aggregate results
**Pattern:**
- Spawn child workflows or parallel activities
- Wait for all to complete
- Aggregate results
- Handle partial failures
**Scaling Rule** (Source: temporal.io/blog/workflow-engine-principles):
- Don't scale individual workflows
- For 1M tasks: spawn 1K child workflows × 1K tasks each
- Keep each workflow bounded
### 4. Async Callback Pattern
**Purpose**: Wait for external event or human approval
**Pattern:**
- Workflow sends request and waits for signal
- External system processes asynchronously
- Sends signal to resume workflow
- Workflow continues with response
**Use Cases:**
- Human approval workflows
- Webhook callbacks
- Long-running external processes
## State Management and Determinism
### Automatic State Preservation
**How Temporal Works** (Source: docs.temporal.io/workflows):
- Complete program state preserved automatically
- Event History records every command and event
- Seamless recovery from crashes
- Applications restore pre-failure state
### Determinism Constraints
**Workflows Execute as State Machines**:
- Replay behavior must be consistent
- Same inputs → identical outputs every time
**Prohibited in Workflows** (Source: docs.temporal.io/workflows):
- ❌ Threading, locks, synchronization primitives
- ❌ Random number generation (
random())- ❌ Global state or static variables
- ❌ System time (
datetime.now())- ❌ Direct file I/O or network calls
- ❌ Non-deterministic libraries
**Allowed in Workflows**:
- ✅
workflow.now() (deterministic time)- ✅
workflow.random() (deterministic random)- ✅ Pure functions and calculations
- ✅ Calling activities (non-deterministic operations)
### Versioning Strategies
**Challenge**: Changing workflow code while old executions still running
**Solutions**:
1. **Versioning API**: Use
workflow.get_version() for safe changes2. **New Workflow Type**: Create new workflow, route new executions to it
3. **Backward Compatibility**: Ensure old events replay correctly
## Resilience and Error Handling
### Retry Policies
**Default Behavior**: Temporal retries activities forever
**Configure Retry**:
- Initial retry interval
- Backoff coefficient (exponential backoff)
- Maximum interval (cap retry delay)
- Maximum attempts (eventually fail)
**Non-Retryable Errors**:
- Invalid input (validation failures)
- Business rule violations
- Permanent failures (resource not found)
### Idempotency Requirements
**Why Critical** (Source: docs.temporal.io/activities):
- Activities may execute multiple times
- Network failures trigger retries
- Duplicate execution must be safe
**Implementation Strategies**:
- Idempotency keys (deduplication)
- Check-then-act with unique constraints
- Upsert operations instead of insert
- Track processed request IDs
### Activity Heartbeats
**Purpose**: Detect stalled long-running activities
**Pattern**:
- Activity sends periodic heartbeat
- Includes progress information
- Timeout if no heartbeat received
- Enables progress-based retry
## Best Practices
### Workflow Design
1. **Keep workflows focused** - Single responsibility per workflow
2. **Small workflows** - Use child workflows for scalability
3. **Clear boundaries** - Workflow orchestrates, activities execute
4. **Test locally** - Use time-skipping test environment
### Activity Design
1. **Idempotent operations** - Safe to retry
2. **Short-lived** - Seconds to minutes, not hours
3. **Timeout configuration** - Always set timeouts
4. **Heartbeat for long tasks** - Report progress
5. **Error handling** - Distinguish retryable vs non-retryable
### Common Pitfalls
**Workflow Violations**:
- Using
datetime.now() instead of workflow.now()- Threading or async operations in workflow code
- Calling external APIs directly from workflow
- Non-deterministic logic in workflows
**Activity Mistakes**:
- Non-idempotent operations (can't handle retries)
- Missing timeouts (activities run forever)
- No error classification (retry validation errors)
- Ignoring payload limits (2MB per argument)
### Operational Considerations
**Monitoring**:
- Workflow execution duration
- Activity failure rates
- Retry attempts and backoff
- Pending workflow counts
**Scalability**:
- Horizontal scaling with workers
- Task queue partitioning
- Child workflow decomposition
- Activity batching when appropriate
## Additional Resources
**Official Documentation**:
- Temporal Core Concepts: docs.temporal.io/workflows
- Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns
- Best Practices: docs.temporal.io/develop/best-practices
- Saga Pattern: temporal.io/blog/saga-pattern-made-easy
**Key Principles**:
1. Workflows = orchestration, Activities = external calls
2. Determinism is non-negotiable for workflows
3. Idempotency is critical for activities
4. State preservation is automatic
5. Design for failure and recovery
How to Use This Skill Unit
Option A: Project-Specific (Recommended)
- Click "Download" above
- In your project, create the directory:
.agent/skills/workflow-orchestration-patterns/ - Save the file as
SKILL.md - The agent will automatically discover the skill based on its description.
Option B: Global Installation (All Agents)
Save the file to these locations to make it available across all projects:
- Claude Code:
~/.claude/skills/wshobson/agents/workflow-orchestration-patterns/SKILL.md - Cursor:
~/.cursor/skills/wshobson/agents/workflow-orchestration-patterns/SKILL.md - Antigravity:
~/.gemini/antigravity/skills/wshobson/agents/workflow-orchestration-patterns/SKILL.md
🚀 Install with CLI:npx skills add wshobson/agents
