Managing Complexity in Dynamic Workflow Planning
Complexity grounded in simplicity, for humans, LLMs, agents, and observability systems — and how structural invariants reduce runtime surprises
The Distinction: Complexity vs. Complication
Dynamic workflow planning is inherently complex. A system where an LLM generates workflow topology at runtime, where handlers can be composed dynamically from grammar primitives, where agents create investigation task chains, where planning can recur across multiple phases — this introduces combinatorial possibilities that static workflows do not have.
Complexity is not the concern. Complication is.
Complexity arises from the genuine richness of a problem space. A workflow that adapts to its inputs, that routes through different processing paths based on data characteristics, that plans in phases informed by intermediate results — this is complex because the problem is complex. The system’s behavior reflects the problem’s structure.
Complication arises from incidental difficulty — difficulty that exists because of how the system was built rather than because of what it does. Configuration that requires understanding internal implementation details. Failure modes that are opaque. Observability that buries the signal in noise. Abstractions that leak. Mental models that don’t match system behavior.
The goal of this document is to ensure that dynamic workflow planning introduces complexity proportional to the problems it solves while ruthlessly eliminating complication. The same step — with its state machine, its idempotency, its lifecycle — remains the atomic unit. The same execution guarantees hold. The same observability contracts apply. What’s new is the topology, not the mechanics.
Four Audiences, Four Models
Dynamic workflow planning has four distinct audiences, each with different mental models and different needs. Complexity management must serve all of them.
1. Human Operators
Operators need to understand what the system is doing, why it’s doing it, and what to do when something goes wrong. For dynamic workflows, this means:
- What was planned and why. Every planning step’s reasoning, the fragment it generated, and the validation result must be inspectable.
- What is executing now. The current state of all active steps, including dynamically created ones (whether referencing common patterns or dynamically composed), must be visible through the same interfaces used for static workflows.
- What went wrong. Failures in dynamic workflows must be diagnosable with the same tools used for static workflows, with additional context about the planning decisions that led to the failed step.
- What is the agent doing. For agent-created task chains, the parent_correlation_id lineage must be traceable, showing the progression from investigation through design to execution.
The complication trap for operators: If dynamically planned workflows appear fundamentally different from static workflows in the observability UI — if they require a different mental model, different tooling, different diagnostic procedures — then we have introduced complication. The operator should see steps, dependencies, states, and results. The fact that some steps were created by a planner rather than a template, or composed dynamically rather than referencing a common pattern, should be visible but not disruptive.
2. LLMs as Planners
The LLM planner needs to understand what capabilities are available, what the problem context is, and what constraints apply. For effective planning, this means:
- Clear capability descriptions. The action grammar primitives, common patterns, and composition rules must be described in terms the LLM can reason about — not implementation details, but semantic capabilities, input/output contracts, and composition patterns. Because capability schemas are derived from grammar composition types (not hand-authored), they are always accurate.
- Bounded context. The information provided to the planner must be sufficient for good decisions but not so voluminous that it degrades reasoning quality. This is a context window management problem with direct impact on planning quality.
- Structured feedback. When a plan is invalid, the validation diagnostics must be actionable by the LLM in a retry attempt. “Handler ‘foo’ not found” is useful. “Validation failed” is not.
The complication trap for LLMs: If the capability schema is too granular (every parameter of every handler), the LLM drowns in detail. If it’s too abstract (“handlers exist”), the LLM can’t plan concretely. If the planning prompt requires understanding of Tasker internals (queue namespaces, transaction boundaries, PGMQ message formats), we’ve leaked implementation into the planning layer. The LLM should plan in terms of what needs to happen, not how Tasker works.
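As a sketch of the "structured feedback" point above, validation diagnostics could be grouped into machine-readable categories the planner can act on in a retry. The type and variant names below are illustrative assumptions, not Tasker's actual validation API.

```rust
// Sketch of structured validation diagnostics the planner can act on in a retry.
// Type, variant, and field names are illustrative, not the actual Tasker API.
use serde::Serialize;

#[derive(Serialize)]
#[serde(tag = "kind", rename_all = "snake_case")]
enum PlanDiagnostic {
    /// The fragment references a handler that is not in the capability schema.
    UnknownHandler { step: String, handler: String, did_you_mean: Vec<String> },
    /// A step's input mapping does not satisfy the handler's input contract.
    ContractMismatch { step: String, expected: String, found: String },
    /// The fragment exceeds a resource bound declared in the template.
    BudgetExceeded { limit: String, allowed: u32, requested: u32 },
}

#[derive(Serialize)]
struct ValidationResult {
    accepted: bool,
    diagnostics: Vec<PlanDiagnostic>, // empty when the fragment is accepted
}
```

Serialized this way, "Handler 'foo' not found" becomes an UnknownHandler diagnostic with suggested alternatives, which gives the retry attempt something concrete to correct.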
3. Agents as Clients
Agents need to understand what Tasker can do for them, how to structure their investigation and workflow creation, and how to monitor progress and handle failures. For effective agent integration, this means:
- Discoverable capabilities. The MCP server exposes what templates exist, what action grammar primitives are available, what schemas look like, and what resource limits apply. The agent should be able to explore the system’s capabilities without prior knowledge.
- Structured results. Task completion delivers results in a form the agent can reason about — typed data contracts mean the agent knows the shape of what it’s getting back.
- Clear failure modes. When a task or step fails, the agent receives structured diagnostic information sufficient to decide on recovery (retry, revise, escalate).
- Lineage tracking. The agent’s chain of tasks (linked by parent_correlation_id) is traceable, enabling the agent — and human operators overseeing the agent — to understand the full investigation arc.
The complication trap for agents: If the agent needs to understand Tasker’s internal architecture to use it effectively, the abstraction has leaked. The agent should think in terms of “I need to investigate these three things in parallel and then combine the results” — not in terms of PGMQ queues, step state machines, or worker namespaces. The MCP server and task API should present the right level of abstraction.
4. Observability Systems
Observability infrastructure needs to ingest, correlate, and present the telemetry from dynamic workflows without special-casing. For coherent observability, this means:
- Consistent telemetry shape. Dynamically created steps emit the same metrics, logs, and traces as statically defined steps. Grammar-composed handler steps emit the same telemetry as application handler steps. No parallel telemetry pipeline for “dynamic” or “composed” steps.
- Planning provenance. Additional metadata connects each step to the planning decision that created it, enabling drill-down from “what happened” to “why this was planned.”
- Composition provenance. Steps executed by grammar-composed handlers include the composition specification in their metadata, enabling drill-down from “what failed” to “what composition was attempted.”
- Task lineage. For agent-created task chains, parent_correlation_id enables correlation across tasks, showing the full arc from investigation to execution.
- Aggregation across dynamic topologies. When a task template produces workflows with different shapes (because the planner chose different paths), observability must support comparison and aggregation across these variations.
The complication trap for observability: If dynamic workflows generate telemetry that existing dashboards and alerts can’t consume, operators fall back to log grep. If planning provenance is stored in a different system than step telemetry, correlation requires manual effort. If each unique workflow topology gets its own metric namespace, aggregation becomes impossible.
Design Principles for Complexity Management
Principle 1: The Step Remains the Atom
Every capability in dynamic workflow planning is expressed through steps. A planning step is a step. A grammar-composed handler step is a step. A convergence step is a step. Each has the same lifecycle, the same state machine, the same observability contract.
The action grammar layer adds compositional depth within a step — a handler may be composed from Acquire → Transform → Validate primitives — but from the orchestration layer’s perspective, it is still a single step with a single lifecycle. Whether the handler referenced a common pattern or was composed dynamically is an implementation detail of the handler, not a new structural concept in the workflow.
Implication: No new top-level concepts. No “planning phase” object separate from steps. No “fragment execution” lifecycle separate from step execution. No “grammar composition” lifecycle visible to the orchestrator. No “agent task” type separate from regular tasks. The DAG is the DAG, whether its topology was determined by a template, a planner, or an agent.
What this means practically:
- The workflow visualization shows steps and edges, regardless of how they were created
- Step-level alerts (timeout, failure, retry) work identically for all grammar-composed steps
- The DLQ system processes planned steps the same way as static steps
- Performance metrics (step latency, throughput) aggregate across planned and static steps regardless of composition origin
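To make "compositional depth within a step" concrete, here is a minimal sketch in which a grammar-composed handler satisfies the same single-step contract as any application handler. The trait and type names (StepHandler, ComposedHandler) are assumptions for illustration, not the real Tasker interfaces.

```rust
// Illustrative only: the trait and type names are assumptions, not Tasker's real API.
// The point is that a composed handler presents one execute() call, one lifecycle,
// and one result to the orchestrator, exactly like a hand-written handler.
trait StepHandler {
    fn execute(&self, input: serde_json::Value) -> Result<serde_json::Value, String>;
}

/// A handler assembled from grammar primitives (e.g. Acquire -> Transform -> Validate).
struct ComposedHandler {
    primitives: Vec<Box<dyn StepHandler>>, // each primitive also satisfies the step contract
}

impl StepHandler for ComposedHandler {
    fn execute(&self, input: serde_json::Value) -> Result<serde_json::Value, String> {
        // The internal pipeline is invisible at the workflow level; the orchestrator
        // only ever sees the single step succeed or fail.
        self.primitives
            .iter()
            .try_fold(input, |acc, primitive| primitive.execute(acc))
    }
}
```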
Principle 2: Provenance is Metadata, Not Structure
The fact that a step was created by a planning step rather than a template, or composed dynamically rather than referencing a common pattern, is important context. But it should be captured as metadata on the step, not as a structural difference in how the step exists in the system.
Implication: Add provenance fields to the step record, not a parallel provenance system.
Proposed provenance metadata (stored in workflow_steps or related JSONB):
| Field | Type | Description |
|---|---|---|
| created_by | enum | template / decision_point / planning_step / batch_spawn |
| planning_step_uuid | uuid? | The planning step that created this step (if applicable) |
| fragment_id | string? | Identifier of the workflow fragment this step belongs to |
| planning_phase | integer? | Which planning phase (1, 2, 3…) this step was created in |
| planning_reasoning | text? | The planner’s reasoning for including this step |
| handler_type | enum | application / grammar |
| composition_source | string? | Whether a grammar handler referenced a common pattern name or was composed dynamically (metadata, not a type distinction) |
| composition_spec | jsonb? | The grammar composition specification (if applicable) |
This metadata enriches observability without changing the step’s identity. Dashboards can filter by created_by to show only planned steps, or by handler_type to distinguish developer-authored from grammar-composed steps. Traces can include planning_step_uuid for drill-down. But the default view — the one operators see every day — shows steps as steps.
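For illustration, the table above could be carried as a single serializable record alongside the step. Field names mirror the table; the enum variants and the choice to carry UUIDs as strings are assumptions, not the actual storage schema.

```rust
// Sketch of the provenance fields from the table above as one serializable record.
// Field names mirror the table; variant names and encodings are assumptions.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum CreatedBy { Template, DecisionPoint, PlanningStep, BatchSpawn }

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum HandlerType { Application, Grammar }

#[derive(Serialize, Deserialize)]
struct StepProvenance {
    created_by: CreatedBy,
    planning_step_uuid: Option<String>, // UUID carried as a string for simplicity
    fragment_id: Option<String>,
    planning_phase: Option<u32>,
    planning_reasoning: Option<String>,
    handler_type: HandlerType,
    composition_source: Option<String>, // common pattern name, or a dynamic-composition marker (encoding assumed)
    composition_spec: Option<serde_json::Value>,
}
```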
Principle 3: Progressive Disclosure
Not everyone needs to see everything. The default view should show what’s essential: steps, states, results, timing. Planning details, composition specifications, and agent lineage should be available on drill-down, not in the primary display.
Layer 1: Workflow Overview (same as static workflows)
- Task status, step states, dependency graph, timing
- Steps created by planners are visually annotated but not structurally different
- Dynamically composed steps are visually annotated but appear as normal steps
- Summary: “This task had 2 planning phases, created 14 steps total (12 grammar, 2 application)”
Layer 2: Planning Details (drill into a planning step)
- The planning prompt (what the LLM was asked)
- The capability schema provided (what the LLM could use)
- The generated fragment (what the LLM planned)
- The validation result (accepted/rejected, with diagnostics)
- Token usage, latency, model identifier
Layer 3: Fragment Analysis (drill into the fragment)
- The fragment’s DAG structure
- Each step’s handler configuration (common pattern reference, dynamic composition, or application callable)
- Input mappings and data flow
- Comparison with alternative fragments (if retry occurred)
Layer 4: Composition Details (drill into a grammar-composed step)
- The composition specification (which primitives, in what order, with what configuration)
- Whether the composition referenced a common pattern or was dynamically assembled
- Structural invariant validation results (contract compatibility, single-mutation boundary)
- Per-primitive execution details (timing, intermediate results if captured)
Layer 5: Execution Details (same as static workflows)
- Individual step execution: inputs, outputs, timing, retries
- Standard Tasker observability for each step
Layer 6: Agent Lineage (for agent-created task chains)
- parent_correlation_id chain visualization
- Task-level progression: research → analysis → workflow → execution
- Aggregate resource consumption across the delegation chain
- Agent decision points (implicit, inferred from task creation patterns)
Principle 4: Bounded Blast Radius
Dynamic planning introduces new classes of failure: a planning step generates a fragment that, while valid, produces poor results. A grammar composition passes structural validation but behaves unexpectedly at runtime. An agent creates a long chain of research tasks without converging on a design.
These failures are bounded by design:
- Each planning step’s fragment has resource limits (max steps, max depth)
- Task-level budgets cap total resource consumption across all phases
- Grammar compositions are validated against structural invariants before execution
- Convergence points are declared in the template (the frame), not by the planner
- Agent delegation chains have depth and budget limits
- The worst case is a task (or task chain) that consumes its budget without producing useful results — disappointing but not dangerous
Implication: Budget consumption should be a first-class metric. Operators should see: “This task has used 47 of 100 step budget, 3 of 5 planning phases, $2.30 of $5.00 cost budget.” For agent delegation chains: “This chain has 3 tasks across 2 delegation levels, consuming $12.40 of $50.00 aggregate budget.”
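A minimal sketch of the accounting behind that view, assuming hypothetical budget and usage records; the real budget hierarchy lives in the task template, and the check shown here is illustrative.

```rust
// Sketch of budget accounting that could back the "47 of 100 steps, 3 of 5 phases,
// $2.30 of $5.00" view. Field names and the check itself are illustrative assumptions.
struct Budget {
    max_steps: u32,
    max_planning_phases: u32,
    max_cost_usd: f64,
}

struct BudgetUsage {
    steps_created: u32,
    planning_phases: u32,
    cost_usd: f64,
}

impl BudgetUsage {
    /// Returns the reason a proposed fragment would exceed the budget, if any.
    fn check_fragment(&self, budget: &Budget, fragment_steps: u32, fragment_cost: f64) -> Option<String> {
        if self.steps_created + fragment_steps > budget.max_steps {
            return Some(format!("step budget exceeded: {} + {} > {}",
                self.steps_created, fragment_steps, budget.max_steps));
        }
        if self.planning_phases + 1 > budget.max_planning_phases {
            return Some(format!("planning phase budget exceeded: {} of {}",
                self.planning_phases, budget.max_planning_phases));
        }
        if self.cost_usd + fragment_cost > budget.max_cost_usd {
            return Some(format!("cost budget exceeded: ${:.2} + ${:.2} > ${:.2}",
                self.cost_usd, fragment_cost, budget.max_cost_usd));
        }
        None
    }
}
```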
Principle 5: The Template is the Safety Contract
Even with dynamic planning and agent integration, the task template is the safety contract between the workflow author and the system. The template declares:
- What happens before planning (static steps)
- Where planning occurs (planning steps with constraints)
- What happens after planning (convergence and finalization)
- What the resource bounds are
- Whether dynamic composition beyond common patterns is allowed
The planner fills in the middle. It cannot modify the frame. It cannot bypass convergence. It cannot exceed its bounds. The template author retains control of the workflow’s structure; they delegate the topology of specific segments.
For agent-created tasks, the template still provides the safety contract. The agent selects which template to use (or constructs a task with planning steps), but the template’s constraints apply regardless of who submitted the task.
Implication: Template review is the primary code review artifact for dynamic workflows. If the template’s constraints are correct, the system’s behavior is bounded regardless of what the planner or agent does.
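As a rough sketch, the frame a template declares around a planning segment might reduce to a record like the one below. Field names are assumptions; the point is that bounds and convergence live in the reviewed template, not in anything the planner emits.

```rust
// Sketch of the frame a template declares around a planning segment.
// Field names are assumptions, not the actual template schema.
struct PlanningSegment {
    /// Steps that always run before planning (the static lead-in).
    preamble_steps: Vec<String>,
    /// Resource bounds the planner cannot exceed.
    max_steps: u32,
    max_depth: u32,
    max_planning_phases: u32,
    /// Whether dynamic composition beyond common patterns is permitted.
    allow_dynamic_composition: bool,
    /// The convergence step every planned fragment must feed into.
    convergence_step: String,
}
```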
Observability Architecture for Dynamic Workflows
Telemetry Extensions
Standard telemetry (emitted by all steps, unchanged):
- Step lifecycle events (created, enqueued, claimed, executing, completed/failed)
- Step execution metrics (latency, retry count, handler name)
- Task lifecycle events (created, in_progress, completed/failed)
- Queue metrics (depth, claim rate, processing time)
Planning telemetry (emitted by planning steps, in addition to standard):
- LLM call metrics (model, token count input/output, latency, cost)
- Fragment generation metrics (steps planned, depth, handler distribution by type)
- Validation metrics (pass/fail, rejection reasons, retry count)
- Budget consumption (steps used / remaining, phases used / remaining, cost used / remaining)
Composition telemetry (emitted by grammar-composed steps, in addition to standard):
- Composition specification (primitives, configuration)
- Whether the composition referenced a common pattern or was dynamically assembled
- Structural invariant validation result (pass, with any warnings)
- Per-primitive timing (if captured — useful for identifying bottleneck primitives)
Provenance telemetry (emitted when dynamic steps are created):
- Step creation source (planning_step_uuid, fragment_id)
- Step’s position in fragment DAG (depth, breadth)
- Planning phase identifier
- Handler type (grammar, application)
Agent lineage telemetry (emitted for tasks with parent_correlation_id):
- Delegation depth (how many levels deep in the chain)
- Aggregate step count and cost across the chain
- Task creation timing (latency between parent task completion and child task creation)
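As an illustration, the planning telemetry above might be carried as one structured event per planning step. The event shape and field names are assumptions; real attribute names would follow whatever conventions the existing step telemetry already uses.

```rust
// Illustrative shape of a per-planning-step telemetry event. Field names are
// assumptions aligned with the categories listed above, not an existing schema.
use serde::Serialize;

#[derive(Serialize)]
struct PlanningTelemetry {
    task_uuid: String,
    planning_step_uuid: String,
    // LLM call metrics
    model: String,
    input_tokens: u64,
    output_tokens: u64,
    latency_ms: u64,
    cost_usd: f64,
    // Fragment generation metrics
    steps_planned: u32,
    fragment_depth: u32,
    // Validation metrics
    validation_passed: bool,
    rejection_reasons: Vec<String>,
    retry_count: u32,
    // Budget consumption
    budget_steps_remaining: u32,
    budget_phases_remaining: u32,
    budget_cost_remaining_usd: f64,
}
```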
Correlation Strategy
All telemetry for a dynamic workflow is correlated through existing mechanisms:
- Task UUID: Groups all steps (planned and static) in a single workflow
- Trace ID: Spans the entire task lifecycle, including planning
- Planning Step UUID: Links planned steps to their planner (via provenance metadata)
- Fragment ID: Groups steps from a single planning decision
- parent_correlation_id: Links tasks in an agent delegation chain
No new correlation mechanism is needed. The existing task → step hierarchy, extended with provenance metadata and the existing parent_correlation_id, supports all drill-down patterns.
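A sketch of what a lineage walk looks like when parent_correlation_id is just a task field: an ordinary loop over task lookups, with no dedicated lineage service. The record shape and the find callback are hypothetical.

```rust
// Sketch of walking an agent delegation chain with ordinary task lookups.
// TaskRecord and the `find` callback are hypothetical; the point is that lineage
// needs no mechanism beyond the parent_correlation_id field on each task.
struct TaskRecord {
    correlation_id: String,
    parent_correlation_id: Option<String>,
}

fn lineage(mut task: TaskRecord, find: impl Fn(&str) -> Option<TaskRecord>) -> Vec<TaskRecord> {
    let mut chain = Vec::new();
    loop {
        let parent = task.parent_correlation_id.clone();
        chain.push(task);
        match parent.and_then(|id| find(&id)) {
            Some(parent_task) => task = parent_task,
            None => break,
        }
    }
    chain // ordered from the queried task back to the root of the delegation chain
}
```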
Dashboarding Patterns
Task-Level Dashboard (extends existing):
- Task completion rate, segmented by planning phase count
- Average steps per task (static vs. dynamic, grammar vs. application)
- Budget utilization distribution (histogram)
- Planning success rate (fragments generated vs. fragments validated)
Planning-Specific Dashboard (new):
- LLM call volume and latency
- Most frequently used common patterns in planned fragments
- Most frequently dynamically composed handler patterns
- Fragment validation failure rate by rejection reason
- Cost per planning phase, averaged across tasks
- Planning depth distribution (how many phases do tasks actually use?)
Grammar Composition Dashboard (new):
- Handler usage distribution (common patterns vs. dynamic compositions)
- Handler execution success rate by handler type (grammar vs. application)
- Performance by handler (latency distribution per common pattern)
- Dynamic composition pattern frequency (candidates for named common patterns)
- Configuration pattern analysis (what configurations are most common?)
Agent Activity Dashboard (new):
- Tasks created by agents (volume, success rate)
- Delegation chain depth distribution
- Agent-to-decision latency (how long from first research task to final workflow)
- Aggregate resource consumption by agent delegation chain
- Research task convergence quality (how useful are research results to subsequent decisions?)
Alerting Patterns
| Alert | Trigger | Action |
|---|---|---|
| Planning step timeout | LLM call exceeds configured timeout | Retry or fail step based on retry policy |
| Fragment validation failure rate | > 30% of planning steps produce invalid fragments | Review capability schema, planning prompts |
| Dynamic composition runtime failure spike | Dynamically composed handlers failing at higher rate than common patterns | Review structural validation coverage, common failure compositions |
| Budget consumption anomaly | Task consuming budget > 2σ from mean | Investigate planning decisions, consider tighter bounds |
| Common pattern error spike | Handler failure rate > threshold | Investigate handler configuration patterns |
| Planning depth anomaly | Tasks consistently reaching max phases without converging | Review problem descriptions, planning prompts, or increase bounds |
| Agent delegation depth anomaly | Agent chains consistently reaching max depth | Review agent decomposition patterns, consider wider investigation templates |
LLM Context Management
The Context Window as a Design Constraint
The LLM planner’s effectiveness is directly proportional to the quality of information in its context window. Too little information and the planner makes poor decisions. Too much and the planner loses focus or hits token limits. This is a design constraint, not a runtime problem — the system must be designed to provide the right information in the right format.
Context Composition
The planning prompt is composed from these sources, in priority order:
- Problem description (from task context): What needs to be accomplished. Always included in full.
- Accumulated results (from prior phases): What has been learned. Summarized if large.
- Capability schema (from grammar primitives and common patterns): What the planner can use, including common patterns and composition rules. Potentially large; strategy required.
- Planning constraints (from template): Resource bounds, required convergence. Always included.
- Failure context (if retrying): What went wrong. Included on retry.
- Examples (from prompt engineering): Few-shot demonstrations. Carefully curated.
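A minimal sketch of assembling the prompt in that priority order, assuming hypothetical section names and a plain-text layout; the real prompt format belongs to the prompt engineering layer.

```rust
// Sketch of composing the planning prompt in the priority order listed above.
// Struct fields, section headers, and the assembly function are assumptions.
struct PlanningContext {
    problem_description: String,          // always included in full
    accumulated_results: Option<String>,  // summarized if large (see Result Summarization)
    capability_schema: String,            // compressed per the strategies below
    constraints: String,                  // resource bounds, required convergence
    failure_context: Option<String>,      // only present on retry
    examples: Vec<String>,                // curated few-shot demonstrations
}

fn compose_prompt(ctx: &PlanningContext) -> String {
    let mut sections = vec![format!("## Problem\n{}", ctx.problem_description)];
    if let Some(results) = &ctx.accumulated_results {
        sections.push(format!("## Results so far\n{}", results));
    }
    sections.push(format!("## Available capabilities\n{}", ctx.capability_schema));
    sections.push(format!("## Constraints\n{}", ctx.constraints));
    if let Some(failure) = &ctx.failure_context {
        sections.push(format!("## Previous attempt failed\n{}", failure));
    }
    for example in &ctx.examples {
        sections.push(format!("## Example\n{}", example));
    }
    sections.join("\n\n")
}
```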
Capability Schema Compression
The full capability schema for all common patterns plus the grammar’s composition rules may exceed practical context window budgets. Strategies:
Tiered description:
- Tier 1: Handler name + one-line description, primitive name + one-line description (always included)
- Tier 2: Input/output schemas (included for handlers the planner selects)
- Tier 3: Full configuration reference (included on request or for complex handlers)
- Composition rules (including single-mutation boundary): always included at Tier 1 (the rules are compact); specific primitive schemas at Tier 2
Category-based inclusion:
- Include full schemas only for primitive categories relevant to the problem type
- Data processing problems get full transform, validate, fan_out schemas
- API integration problems get full http_request, auth schemas
- Control flow gets decide, gate schemas
Empirical calibration:
- Measure planning quality as a function of schema detail
- Find the minimum schema detail that produces valid fragments > 90% of the time
- This will vary by LLM model; calibrate per supported model
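Combining the tiered description and category-based inclusion above, schema rendering might look like the following sketch. Tier contents, category names, and the cut points are assumptions; the actual thresholds come from the empirical calibration just described.

```rust
// Sketch of rendering a capability schema using tiered detail plus category-based
// inclusion. Struct fields and category names are assumptions for illustration.
struct CapabilityEntry {
    name: String,
    category: String,       // e.g. "data_processing", "api_integration", "control_flow"
    summary: String,        // Tier 1: one-line description
    io_schema: String,      // Tier 2: input/output contracts
    full_reference: String, // Tier 3: complete configuration reference (on request)
}

fn render_schema(entries: &[CapabilityEntry], relevant_categories: &[&str]) -> String {
    entries
        .iter()
        .map(|e| {
            if relevant_categories.contains(&e.category.as_str()) {
                // Categories relevant to the problem type get Tier 1 + Tier 2.
                format!("{}: {}\n{}", e.name, e.summary, e.io_schema)
            } else {
                // Everything else stays at Tier 1 so the planner knows it exists.
                format!("{}: {}", e.name, e.summary)
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```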
Result Summarization
Between planning phases, accumulated results must be compressed. The summarization strategy depends on result size:
| Result Size | Strategy | Example |
|---|---|---|
| < 1KB | Include verbatim | Status codes, counts, small payloads |
| 1KB - 10KB | Structured summary | Key fields extracted, schema preserved |
| 10KB - 100KB | LLM-generated summary | Dedicated summarization step before next planning step |
| > 100KB | Reference with metadata | Object store reference + schema + size + sample |
The summarization strategy should be configurable per planning step, with sensible defaults.
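A sketch of the size-based dispatch implied by the table, with thresholds matching the table rows; the enum and function are assumptions, not an existing API.

```rust
// Sketch of choosing a summarization strategy by result size, mirroring the table
// above. The enum and the dispatch function are illustrative assumptions.
enum SummarizationStrategy {
    Verbatim,              // < 1KB
    StructuredSummary,     // 1KB - 10KB
    LlmSummary,            // 10KB - 100KB: dedicated summarization step
    ReferenceWithMetadata, // > 100KB: object store reference + schema + size + sample
}

fn strategy_for(result_bytes: usize) -> SummarizationStrategy {
    match result_bytes {
        0..=1_023 => SummarizationStrategy::Verbatim,
        1_024..=10_239 => SummarizationStrategy::StructuredSummary,
        10_240..=102_399 => SummarizationStrategy::LlmSummary,
        _ => SummarizationStrategy::ReferenceWithMetadata,
    }
}
```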
Operator Experience Design
Mental Model
The operator’s mental model for dynamic workflows should be:
“This workflow has a frame (the template) and fill (the planned steps). The frame is static and reviewed like any template. The fill is dynamic and generated by a planner. Fill steps use grammar compositions — some reference common patterns, others are composed dynamically from primitives. I monitor everything through the same tools, with drill-down into planning decisions and composition details when I need it.”
For agent-created workflows:
“An agent created a chain of tasks — first research, then execution. Each task is a normal Tasker task. I can see the chain through the parent_correlation_id lineage. Each task has its own frame and fill.”
These are the only new concepts operators need to learn. Everything else — step states, retries, convergence, DLQ — works the same way.
Investigation Workflow
When a dynamic workflow fails, the operator’s investigation follows this path:
1. What failed? → Standard step failure view. Same as static workflows.
2. Was this step planned or static? → Provenance metadata. One field check.
3. If planned: what kind of handler? → handler_type metadata. Grammar or application.
4. If grammar: what was the composition? → composition_spec metadata. See which primitives were used and whether the step referenced a common pattern or was dynamically composed.
5. What was the planning decision? → Drill into planning step. See fragment, reasoning, validation.
6. Was the plan reasonable? → Evaluate fragment structure, handler selection, configuration.
7. If plan was bad: why? → Examine planning prompt, context, LLM response. Identify whether the issue is schema quality, context quality, or model quality.
8. If plan was good but execution failed: why? → Standard step debugging. Inputs, outputs, error messages, retry history.
9. Is this part of an agent chain? → Check parent_correlation_id. Trace lineage to understand the broader investigation arc.
Steps 1-4 add approximately 15 seconds to investigation time. Steps 5-7 are new but only needed when the failure is planning-related. Steps 8-9 are unchanged or trivial lookups.
Runbooks
Dynamic workflows should ship with runbook extensions that cover:
- “A planning step is in DLQ” — how to investigate and resolve
- “A grammar-composed step failed” — how to examine the composition and identify the failing primitive
- “A task is consuming its budget without completing” — how to diagnose and intervene
- “Fragment validation failures are spiking” — how to diagnose capability schema issues
- “An LLM provider is returning errors” — how to fail over or degrade gracefully
- “An agent delegation chain is growing without converging” — how to investigate and intervene
Avoiding Complication: Anti-Patterns
These are specific patterns that introduce complication without corresponding complexity. The system design should prevent them.
| Anti-Pattern | Why It’s Complication | Prevention |
|---|---|---|
| Different observability for dynamic vs. static steps | Operators must maintain two mental models | All steps emit identical telemetry; provenance is metadata |
| Different observability for common patterns vs. dynamic compositions | Operators must learn new debugging tools | All grammar compositions emit identical step telemetry; composition source is drill-down metadata |
| Planning logic embedded in handler configuration | Handler behavior becomes unpredictable | Handlers are deterministic; planning is a separate step type |
| Fragment schema coupled to Tasker internals | LLM must understand orchestration mechanics | Fragment schema expresses intent; materialization is the system’s job |
| Budget controls scattered across configuration | No single place to understand resource limits | Budget hierarchy in task template, visible and auditable |
| Context accumulation that silently drops information | Planning quality degrades mysteriously | Explicit summarization steps with configurable strategies |
| Capability schema that describes implementation | LLM reasons about wrong abstractions | Capability schema describes what, never how. Derived from grammar types, not hand-authored |
| Action grammar internals exposed to operators | Operators must understand Rust trait composition | Grammar compositions are opaque to the operator; they see “http_request handler” not “Acquire → Transform → Validate” |
| Runtime validation duplicating compile-time checks | Wasted cycles and confusing error messages | Primitive correctness is verified at compile time; composition correctness at assembly time; runtime validates only fragment references |
| Agent-specific task types or APIs | Agents appear as a special class of client | All tasks are identical; agents use the same API as any client |
| Agent delegation tracking in a separate system | Lineage requires cross-system correlation | parent_correlation_id is a standard task field; lineage queries use standard task queries |
| Grammar composition details exposed in workflow visualization by default | Operators see implementation details they don’t need | Composition details are available on drill-down, not in the default step view |
Summary: The Complexity Budget
Every system has a complexity budget — the amount of complexity humans can manage before the system becomes opaque. Dynamic workflow planning spends from this budget. The question is whether we get proportional value.
What we spend:
- One new step type (planning step)
- One new concept (workflow fragments)
- One new compositional layer (action grammar primitives — but invisible to operators)
- One new metadata layer (planning and composition provenance)
- One new resource dimension (planning budgets)
- One new trust distinction (developer-authored handlers vs. system-invoked grammar compositions)
- One new client pattern (agents as task-creating clients — but using existing APIs)
What we get:
- Workflows that adapt to their inputs
- Multi-phase problem solving with accumulated context
- Composition of generic capabilities without custom code, with compile-time verified primitives and assembly-time validated compositions
- Dynamic compositions that can be constructed for any problem without registering new patterns
- Agents that can structure their own investigation using Tasker’s execution guarantees
- Gradual automation of workflow design — from developer tooling (Phase 0) through agent-driven workflows
- A type system for workflow actions that makes the vocabulary extensible without sacrificing safety
What we protect:
- The step as the atomic unit (unchanged)
- Execution guarantees (unchanged)
- Observability patterns (extended, not replaced)
- Operator investigation workflows (extended, not replaced)
- Template as safety contract (strengthened, not weakened)
- API uniformity (agents use the same APIs as any client)
What we actively reduce:
- Runtime type errors in handler compositions (primitives verified at compile time, compositions validated at assembly time)
- Capability schema drift from implementation (schemas derived from types, not hand-maintained)
- Configuration-driven failure modes (grammar compositions are verified before they can be referenced)
- Agent investigation overhead (structured research workflows replace ad hoc manual investigation)
The complexity budget is balanced when the new capabilities justify the new concepts, and the existing foundations are preserved. This document’s purpose is to ensure we stay within budget.
This document applies to all phases of the generative workflow initiative and the agent integration patterns. It should be reviewed and updated as each phase is implemented and operational experience reveals new complexity management needs.