name: long-running-process-orchestration description: "Use when designing or reviewing multi-step Apex orchestration that spans multiple transactions, checkpoints state across async boundaries, or recovers from partial failure in a long-running workflow. Trigger keywords: 'multi-step Queueable chain', 'cross-transaction state', 'platform event state machine', 'Finalizer retry', 'orchestrate across transactions', 'workflow progress tracking'. NOT for single-job Queueable design (use apex-queueable-patterns), NOT for Flow Orchestration (use flow/orchestration-flows), NOT for bulk record processing where Batch Apex is the primary tool (use batch-apex-patterns)." category: apex salesforce-version: "Spring '25+" well-architected-pillars:
- Reliability
- Scalability
- Operational Excellence tags:
- queueable
- finalizer
- platform-events
- async-apex
- orchestration
- state-machine
- cross-transaction inputs:
- "Description of the multi-step process and its steps, dependencies, and failure modes"
- "Whether steps require callouts, DML, or external triggers to advance"
- "Operational requirements: retry behavior, progress visibility, partial failure handling"
- "Expected data volume per step (determines whether chaining or Batch is more appropriate)" outputs:
- "Orchestration pattern recommendation (Queueable chain vs Platform Event state machine vs hybrid)"
- "Apex class scaffold with state-passing, Finalizer, and progress tracking"
- "Review findings for existing multi-step async implementations"
- "Decision guidance on checkpointing, retry depth, and step boundaries" triggers:
- how do I chain Queueable jobs across multiple transactions
- Apex process needs to run across multiple transactions with state checkpointing
- platform event state machine for long-running workflow
- Finalizer interface retry pattern for Queueable failure recovery
- track progress of a multi-step async Apex orchestration dependencies:
- apex/apex-queueable-patterns
- apex/platform-events-apex version: 1.0.0 author: Pranav Nagrecha updated: 2026-04-06
Long-Running Process Orchestration
Use this skill when a business process must span multiple Apex transactions, cannot complete within a single execution context, and requires state preservation, progress tracking, and recovery from partial failure. The skill covers Queueable chaining with bounded depth, Platform Event state machines, Finalizer-based recovery, and Continuation for async UI callouts.
Before Starting
Gather this context before working on anything in this domain:
- How many discrete steps does the process have, and are they sequential, parallel, or conditional?
- Can each step complete within a single Apex transaction (60 000 ms CPU, 100 SOQL, 150 DML)? If any step may exceed those limits, it needs its own Queueable or Batch boundary.
- Which steps involve callouts? Callout steps require
Database.AllowsCalloutson their Queueable class. - Where does progress need to be visible — operations dashboard, record field, email notification? This shapes the checkpoint mechanism.
- What is the correct recovery behavior when step N fails? Should the whole process abort, retry step N, or skip to a compensating action?
- Is the process already partially implemented in Flow Orchestration? If so, the Apex role may be limited to a single-step callout or heavy data operation delegated from Flow.
Core Concepts
Why Single Transactions Are Insufficient
Salesforce imposes hard transaction limits per execution context: 60 000 ms CPU, 100 SOQL queries, 150 DML statements, 12 MB heap, and a 10-second total callout timeout. A process that must send 20 batched callouts, process thousands of records in sequential dependency steps, or wait for external system responses cannot fit in one transaction. Multi-step orchestration is the platform-endorsed pattern for these cases: each step runs in its own transaction with a full allocation of governor limits, and state advances through controlled async handoffs.
Queueable Chaining As A Step Sequencer
The most direct Apex orchestration mechanism is chaining Queueable jobs. Each Queueable's execute() method does its work and enqueues the next job in the sequence, passing forward a state object through the constructor. Key constraints that shape this pattern:
- Only one child Queueable may be enqueued per
execute()call (Apex Developer Guide — Queueable Apex). AsyncOptions.MaximumQueueableStackDepthcaps the chain length. The platform enforces a maximum of 5 levels in async contexts; in synchronous test contexts the default is 1. EverySystem.enqueueJob()call in the chain must re-passAsyncOptions— it does not propagate automatically.System.AsyncInfo.getCurrentQueueableStackDepth()returns the current chain depth so logic can decide whether to continue or hand off.- Fan-out (one step spawning multiple parallel children) is not supported through chaining alone. Use Platform Events or Batch Apex for fan-out.
The Finalizer Interface For Cross-Step Recovery
The System.Finalizer interface runs in a guaranteed separate transaction after its parent Queueable completes — regardless of whether the parent succeeded or threw an uncaught exception. This is the only safe mechanism for:
- Advancing the orchestration state on failure (compensating transaction)
- Retrying a failed step up to a bounded count
- Recording a dead-letter entry when retries are exhausted
- Triggering downstream notification or monitoring
The FinalizerContext provides getResult() (SUCCESS or UNHANDLED_EXCEPTION), getException(), and getJobId(). A Finalizer may enqueue one additional Queueable. System.attachFinalizer() must be called early in the parent's execute() before any code that could throw (Apex Developer Guide — Transaction Finalizers).
Platform Event State Machine
When a process needs to cross system boundaries, tolerate external latency, or decouple step producers from step consumers, a Platform Event state machine is more flexible than direct Queueable chaining. The pattern:
- Each step completion publishes a
StepCompleted__e(or domain-specific) Platform Event with the next-step identifier and carry-forward state as payload fields. - An Apex
after inserttrigger on that event type reads the next-step field and dispatches a Queueable for the next step. - Progress and replay position are tracked on a Custom Object record (the "process instance"), which also serves as the operational dashboard.
Platform Events decouple publish time from subscribe time, survive transaction rollbacks (events published in a rolled-back DML transaction are still delivered — this is a feature, not a bug), and allow external systems to participate as steps in the process (Platform Events Developer Guide).
Progress Tracking And Observability
A long-running process is operationally incomplete without a way to observe where each instance is and why it stopped. Checkpoint patterns:
- Custom Object record per process instance:
Status__c,CurrentStep__c,StepStartedAt__c,ErrorMessage__c,RetryCount__c. Updated at the start and end of each step. - Platform Event broadcast for real-time dashboards or monitoring triggers.
- AsyncApexJob records are queryable by job ID for built-in Queueable status.
Step start/end timestamps enable latency monitoring per step, which surfaces performance regressions before they cause timeouts.
Continuation For Long-Running LWC Callouts
When the long-running operation is a callout initiated from Lightning UI, the Continuation class provides an async callout mechanism that does not block the user session. The controller initiates the callout by returning a Continuation object; Salesforce processes the callout asynchronously (up to 120 seconds) and invokes a callback method with the response. This is distinct from Queueable: Continuation is UI-initiated, synchronous from the user's perspective, and limited to callout contexts in LWC/Aura/Visualforce controllers (Apex Developer Guide — Continuation).
Common Patterns
Bounded Queueable Chain With State Object
When to use: A sequential process of 3–10 steps where each step is a distinct DML or callout operation, steps are ordered and have no parallelism, and the total chain depth fits within platform limits.
How it works:
- Define a serializable
ProcessStateinner class or standalone class carrying:currentStep(integer or enum),recordId,retryCount, and any payload fields needed by subsequent steps. - Each Queueable receives a
ProcessStateinstance, executes its step, updates the Custom Object checkpoint record, then evaluates whether to chain:
public class StepOneQueueable implements Queueable {
private final ProcessState state;
public StepOneQueueable(ProcessState state) {
this.state = state;
}
public void execute(QueueableContext ctx) {
System.attachFinalizer(new StepFinalizer(state));
// Do step work
doStepOneWork(state.recordId);
// Update checkpoint
updateCheckpoint(state.recordId, 'StepOne', 'Complete');
// Chain to next step
state.currentStep = 2;
AsyncOptions opts = new AsyncOptions();
opts.MaximumQueueableStackDepth = 5;
System.enqueueJob(new StepTwoQueueable(state), opts);
}
}
- The Finalizer handles failure — it increments
retryCounton the state object and re-enqueues the same step if retries remain, or marks the process instance as failed.
Why not the alternative: Placing all steps in a single Queueable with a switch statement runs the risk of hitting governor limits mid-step with no safe boundary. Separate Queueables give each step its own full transaction allocation.
Platform Event State Machine With Process Instance Record
When to use: The process involves external system responses between steps, steps may run minutes or hours apart, or you need strict decoupling between step producers and consumers.
How it works:
- Create a Process Instance Custom Object (
OrchestrationRun__c) with fields:Status__c(picklist: Pending/Running/Complete/Failed),CurrentStep__c(text),RetryCount__c,LastError__c. - When launching the process, insert the record with
Status__c = 'Running'andCurrentStep__c = 'StepOne', then publish aStepAdvance__eplatform event withRunId__c = record.IdandStep__c = 'StepOne'. - The platform event trigger reads
Step__cand enqueues the appropriate Queueable:
trigger StepAdvanceTrigger on StepAdvance__e (after insert) {
for (StepAdvance__e evt : Trigger.new) {
if (evt.Step__c == 'StepOne') {
System.enqueueJob(new StepOneWorker(evt.RunId__c));
} else if (evt.Step__c == 'StepTwo') {
System.enqueueJob(new StepTwoWorker(evt.RunId__c));
}
// Additional steps...
}
}
- At the end of each worker, it updates
OrchestrationRun__cand publishes the nextStepAdvance__e. - On failure, the Finalizer updates
Status__c = 'Failed'andLastError__c.
Why not the alternative: Queueable chaining alone cannot resume after an external system delay of hours without consuming a Queueable slot waiting for the response. Platform Events decouple the step trigger from the response and survive org restarts.
Hybrid: Queueable Chain Within Steps, Platform Events Between Phases
When to use: The orchestration has distinct phases (e.g., Validation, Processing, Notification) where each phase consists of sequential sub-steps, and phases are separated by external events or significant time gaps.
How it works: Use a Queueable chain for the sub-steps within a phase; use a Platform Event to signal phase completion and trigger the next phase. The Custom Object record tracks both phase and sub-step. This hybrid keeps chain depth low per phase while maintaining decoupled phase transitions.
Decision Guidance
| Situation | Recommended Approach | Reason |
|---|---|---|
| 3–7 sequential steps, no external waits, completes in minutes | Bounded Queueable chain | Simpler, no event infrastructure needed |
| Steps separated by external system responses or human actions | Platform Event state machine | Decouples step timing from Apex transaction lifecycle |
| Fan-out required (one step spawns N parallel workers) | Platform Events or Batch | Queueable allows only one child per execution |
| Step failure should auto-retry up to N times | Finalizer-based retry | Only mechanism that handles all failure modes including system-level exceptions |
| Process is UI-initiated and involves a long callout | Continuation | Designed for async LWC/Aura callouts up to 120s |
| Step volume exceeds Queueable chain depth limits | Batch Apex for heavy step, Queueable to orchestrate transitions | Batch handles large data volume; Queueable manages step transitions |
| Operations team needs live process visibility | Custom Object checkpoint + Platform Event broadcast | Enables query-based dashboards and external monitoring |
| Process has 10+ steps or irregular branching | Platform Event state machine with dispatch table | Cleaner than deeply nested Queueable chain logic |
Recommended Workflow
Step-by-step instructions for an AI agent or practitioner activating this skill:
- Gather context — identify all steps, their sequencing rules, governor-limit risks per step, callout needs, and failure recovery expectations. Confirm whether the process is purely Apex or has external participants.
- Choose the primary pattern — use the Decision Guidance table above. For simple sequential processes, start with Queueable chaining; for externally-triggered or time-separated steps, use the Platform Event state machine.
- Design the state model — define the Custom Object (or state class) that will carry process instance data across transactions. Include step identifier, retry count, timestamps, and error message fields.
- Implement step boundaries — each step should be a separate Queueable class. Attach a Finalizer in every Queueable that has side effects. Pass state only through constructor fields — not static variables.
- Implement the checkpoint mechanism — update the Custom Object record at the start and end of each step. Publish a progress Platform Event if real-time dashboards are required.
- Validate — run the checker script, verify governor limit exposure per step in test scenarios, confirm Finalizer coverage, and ensure chain depth respects
MaximumQueueableStackDepth. - Review against checklist — confirm all items in the Review Checklist are satisfied before marking implementation complete.
Review Checklist
- Each step is implemented as a separate Queueable with its own transaction boundary.
-
System.attachFinalizer()is called at the start of every Queueableexecute()method with side effects. -
AsyncOptions.MaximumQueueableStackDepthis set on everySystem.enqueueJob()call in a chain. -
System.AsyncInfo.getCurrentQueueableStackDepth()is used to guard against exceeding the cap. - State is passed through serializable constructor fields — no static variables cross transaction boundaries.
- A Custom Object or equivalent checkpoint record tracks the current step, retry count, and error state.
- Finalizer retry logic increments a counter and aborts after a configurable maximum.
- Platform Event triggers are thin and delegate heavy work to Queueable workers.
- Callout Queueables implement
Database.AllowsCallouts. - Test classes use
Test.startTest()/Test.stopTest()boundaries and verify the Custom Object checkpoint state.
Salesforce-Specific Gotchas
MaximumQueueableStackDepthdoes not propagate — setting the option on the firstenqueueJobcall does not carry forward to child jobs. Every subsequentenqueueJobin the chain must re-set it explicitly, or the depth guard is silently absent on all jobs beyond the first.- Platform Events published in a rolled-back DML transaction are still delivered — this is by design. A Queueable that fails mid-transaction but already published a step-advance event will advance the state machine while leaving the DML changes rolled back, causing state inconsistency. Publish step-advance events only after confirming the DML in that step succeeded.
- Finalizer failures are silent unless monitored — the Finalizer runs in a separate transaction with its own governor limits. If the Finalizer itself hits a limit or throws, it fails without surfacing to the original job's error context. Log from the Finalizer to a Custom Object or named credential endpoint so these failures are observable.
System.AsyncInfo.getCurrentQueueableStackDepth()returns 0 in synchronous test context — test methods that call Queueables synchronously viaTest.stopTest()will always see a depth of 0, so depth-guard branches that skip enqueuing will never be exercised in unit tests. Add an explicit integration test or set a lowMaximumQueueableStackDepthin test context to cover boundary logic.- Continuation is LWC/Aura/VF only — attempting to create a
Continuationfrom a trigger, Queueable, Batch class, or@AuraEnabledmethod that is not a controller action throws a runtime exception. The async callout pattern for non-UI contexts is always Queueable withDatabase.AllowsCallouts.
Output Artifacts
| Artifact | Description |
|---|---|
| Orchestration pattern recommendation | Decision on Queueable chain vs Platform Event state machine vs hybrid, with rationale |
| State model design | Custom Object schema and state class fields required for this process |
| Step scaffold | Queueable class skeletons with Finalizer, state passing, and checkpoint update |
| Platform Event trigger dispatch table | Thin trigger routing step-advance events to the correct Queueable worker |
| Review findings | Assessment of an existing multi-step async implementation against the review checklist |
Related Skills
apex/apex-queueable-patterns— use for single-job Queueable design, Finalizer API details, andAsyncOptionsreference.apex/platform-events-apex— use for Platform Event schema design, publish result handling, and CDC vs Platform Event selection.apex/batch-apex-patterns— use when individual steps involve large-volume record processing that exceeds Queueable limits.apex/async-apex— use when the question is whether Queueable, Batch, Future, or Scheduled is the right async mechanism.apex/exception-handling— use when the broader error handling framework and logging strategy are the primary concern.apex/governor-limits— use when steps are hitting specific limits that require redesign.flow/orchestration-flows— use when the orchestration can be entirely managed in Flow Orchestration without Apex.