Phase 2: Planning Interface
LLM-backed planning steps and workflow fragment generation from action grammar primitives
Phase Summary
The planning interface introduces a new step handler type — the planning step — that uses an LLM to generate workflow fragments composed from action grammar primitives. The orchestration layer validates these fragments against the grammar’s structural invariants (contract compatibility, single-mutation boundary) and materializes them through the existing transactional step creation infrastructure.
This is the phase where generative planning meets deterministic execution. The contract is: given context and a capability vocabulary, produce a valid plan. The system validates the plan’s grammar compositions against structural invariants, checks the DAG structure, and executes with full transactional guarantees. The LLM reasons; the system guarantees.
Planning steps can appear in any task — including tasks created by agent clients (see Agent Orchestration). An agent that creates a research task containing a planning step gets the benefit of both its own high-level reasoning (what to investigate) and the LLM planner’s tactical composition (how to structure the investigation). The planning step doesn’t know or care that its parent task was agent-created; it generates and validates fragments the same way in all contexts.
Phase 0’s MCP server experience directly informs this phase — the same prompt engineering patterns, validation feedback loops, and structured output strategies that work for developer-time authoring apply to runtime planning.
Research Areas
1. Workflow Fragment Schema
Question: What is the structural representation of a workflow fragment that a planning step produces?
Research approach:
- Start from the existing `DecisionPointOutcome::CreateSteps { step_names }` and extend it to carry full step specifications including grammar compositions
- Evaluate what minimum information is needed to materialize a step (common pattern reference, dynamic grammar composition, or application callable, plus inputs, dependencies, configuration)
- Design for validation: the schema should make it structurally impossible to express certain invalid states
Proposed fragment structure:
```json
{
  "fragment_version": "1.0",
  "planning_context": {
    "goal": "Process and enrich customer records from CSV upload",
    "reasoning": "The dataset contains 5000 records requiring validation, API enrichment, and categorization...",
    "estimated_steps": 7,
    "estimated_depth": 3
  },
  "steps": [
    {
      "name": "validate_records",
      "handler": {
        "grammar": "validate",
        "config": {
          "schema": { "$ref": "customer_record_v2" },
          "on_invalid": "flag"
        }
      },
      "dependencies": [],
      "inputs": {
        "data": "${task.context.uploaded_records}"
      }
    },
    {
      "name": "enrich_valid",
      "handler": {
        "grammar": "http_request",
        "config": {
          "url": "https://enrichment-api.example.com/v2/enrich",
          "method": "POST",
          "body": { "records": "${steps.validate_records.result.valid_records}" }
        }
      },
      "dependencies": ["validate_records"]
    },
    {
      "name": "categorize",
      "handler": {
        "composition": {
          "primitives": [
            {
              "type": "Transform",
              "variant": "Categorize",
              "config": {
                "categories": ["premium", "standard", "review"],
                "rules": { "$ref": "categorization_rules_v1" }
              }
            },
            {
              "type": "Validate",
              "variant": "CategoryRules",
              "config": {
                "schema_ref": "categorized_record_v1"
              },
              "input_mapping": {
                "data": "$.previous.categorized_data"
              }
            }
          ],
          "mixins": ["WithObservability"]
        }
      },
      "dependencies": ["enrich_valid"]
    },
    {
      "name": "converge_results",
      "handler": {
        "grammar": "aggregate",
        "config": {
          "strategy": "merge",
          "output_key": "final_results"
        }
      },
      "dependencies": ["categorize"],
      "step_type": "deferred"
    }
  ],
  "convergence": "converge_results",
  "resource_bounds": {
    "max_downstream_steps": 15,
    "max_downstream_depth": 2
  }
}
```
Note the three ways a fragment can reference handlers: by common pattern name (`"grammar": "validate"`) for named grammar compositions, by direct composition (`"composition": {...}`) for dynamically assembled grammar compositions, or by application-specific callable (`"callable": "..."`) for developer-authored handlers. The first two are the same composition model — a common pattern is simply a named, well-tested composition specification. Both are validated at assembly time against structural invariants (contract compatibility, single-mutation boundary). Callable references are validated against the handler resolver.
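A minimal Rust sketch of these three reference forms and the split in how they are validated. Type and method names here are illustrative assumptions, not the actual Tasker types:

```rust
/// How a fragment step identifies its handler. Names are an illustrative
/// sketch of the three reference forms, not the production Tasker types.
#[derive(Debug, Clone, PartialEq)]
enum HandlerRef {
    /// Named grammar composition: `"grammar": "validate"`.
    CommonPattern(String),
    /// Dynamically assembled grammar composition: `"composition": {...}`.
    Composition {
        primitives: Vec<String>,
        mixins: Vec<String>,
    },
    /// Developer-authored application handler: `"callable": "..."`.
    Callable(String),
}

impl HandlerRef {
    /// Common patterns and dynamic compositions share one composition model
    /// and are checked against structural invariants at assembly time;
    /// callables resolve through the handler resolver instead.
    fn is_grammar_backed(&self) -> bool {
        !matches!(self, HandlerRef::Callable(_))
    }
}

fn main() {
    let pattern = HandlerRef::CommonPattern("validate".to_string());
    let composition = HandlerRef::Composition {
        primitives: vec!["Transform::Categorize".to_string()],
        mixins: vec!["WithObservability".to_string()],
    };
    let callable = HandlerRef::Callable("DataIngestionHandler".to_string());

    assert!(pattern.is_grammar_backed());
    assert!(composition.is_grammar_backed());
    assert!(!callable.is_grammar_backed());
}
```

The enum shape makes the routing question ("grammar invariants or handler resolver?") a total match rather than a string comparison, which is the kind of "structurally impossible to express invalid states" property the schema design aims for.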
Open questions:
- How should fragments reference the planning step’s own results? The planning step creates downstream steps, but those steps need to reference data that existed before planning.
- Should fragments support conditional sub-paths, or should every conditional branch require a nested planning step?
- How do we handle fragments that reference both grammar compositions and application-specific handlers in the same fragment?
- Should the LLM planner prefer common patterns when available and compose dynamically only for novel operations?
2. Fragment Validation Pipeline
Question: What validations must pass before a fragment is materialized?
Research approach:
- Enumerate all failure modes of an invalid fragment
- Design a validation pipeline that catches errors early with actionable diagnostics
- Leverage grammar structural invariants for composition validation
Proposed validation stages:
| Stage | Validates | Failure Mode |
|---|---|---|
| Schema validation | Fragment structure conforms to fragment schema | Malformed fragment — parsing error |
| Pattern reference check | Every common pattern reference exists in the registered patterns | Unknown pattern — planner hallucinated a capability |
| Composition structural validation | All grammar compositions have compatible input/output schemas across primitives and respect the single-mutation boundary | Contract mismatch or safety violation — planner composed incompatible or unsafe primitives |
| Configuration validation | Handler config matches the handler’s configuration JSON Schema | Invalid config — wrong types, missing required fields |
| DAG validation | Dependencies form a valid acyclic graph | Cycle detected, orphan steps, unreachable convergence |
| Input reference resolution | All `${step_reference}` paths resolve to steps in the fragment or existing task context | Dangling reference — step references nonexistent upstream |
| Data contract compatibility | Output contracts of upstream steps match input contracts of downstream steps | Shape mismatch — data flow is inconsistent |
| Resource bound check | Total steps, depth, fan-out factor within configured limits | Plan exceeds bounds — too large, too deep, too expensive |
| Convergence validation | Deferred steps have valid intersection semantics with fragment steps | Convergence cannot resolve — no path to terminal state |
The composition structural validation stage uses the same grammar contract metadata that the Rust compiler uses for primitive verification, applied at assembly time to LLM-generated composition specifications. Primitives are compile-time verified Rust; compositions are validated at assembly time against structural invariants (contract compatibility, single-mutation boundary). An invalid composition is rejected at planning validation, not at step execution.
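The DAG validation stage can be sketched with Kahn's algorithm over the fragment's dependency declarations, catching both cycles and references to unknown steps. The `(name, dependencies)` representation and the diagnostic strings below are illustrative assumptions, not the production validator:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Sketch of the DAG validation stage: reject fragments whose dependency
/// declarations contain unknown steps or cycles. Steps are modeled as
/// (name, dependencies) pairs for illustration.
fn validate_dag(steps: &[(&str, Vec<&str>)]) -> Result<(), String> {
    let names: HashSet<&str> = steps.iter().map(|(n, _)| *n).collect();
    let mut indegree: HashMap<&str, usize> =
        steps.iter().map(|(n, _)| (*n, 0usize)).collect();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();

    for (name, deps) in steps {
        for dep in deps {
            if !names.contains(dep) {
                // Dangling reference: planner pointed at a nonexistent step.
                return Err(format!("step '{name}' depends on unknown step '{dep}'"));
            }
            *indegree.get_mut(name).unwrap() += 1;
            dependents.entry(*dep).or_default().push(*name);
        }
    }

    // Kahn's algorithm: repeatedly remove steps with no unmet dependencies.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();
    let mut visited = 0;
    while let Some(step) = ready.pop_front() {
        visited += 1;
        for next in dependents.get(step).into_iter().flatten() {
            let d = indegree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(*next);
            }
        }
    }

    // Any step never reaching indegree 0 sits on a cycle.
    if visited != steps.len() {
        return Err("cycle detected among fragment steps".to_string());
    }
    Ok(())
}

fn main() {
    let ok = vec![
        ("validate_records", vec![]),
        ("enrich_valid", vec!["validate_records"]),
        ("categorize", vec!["enrich_valid"]),
        ("converge_results", vec!["categorize"]),
    ];
    assert!(validate_dag(&ok).is_ok());

    let cyclic = vec![("a", vec!["b"]), ("b", vec!["a"])];
    assert!(validate_dag(&cyclic).is_err());

    let dangling = vec![("a", vec!["missing"])];
    assert!(validate_dag(&dangling).is_err());
}
```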
Open questions:
- Should validation be a single pass or multi-pass? (Early termination vs. collecting all errors)
- Should the planner receive validation feedback and be able to revise? (Planning → validate → revise loop)
- What diagnostic information should be stored when a fragment fails validation? (For observability and planner improvement)
- Should there be a “simulation” mode that validates and reports without materializing?
- How many planning attempts should be allowed before failing? (One shot? Up to 3 with validation feedback?)
3. LLM Integration Adapter
Question: How should the planning step interface with LLM APIs?
Research approach:
- Build on Phase 0 MCP server experience for prompt engineering and structured output patterns
- Design an adapter pattern that supports multiple LLM providers
- Evaluate context window management strategies for grammar capability schemas
Provider abstraction: The planning step should not be coupled to a specific LLM API. An adapter interface should support at minimum Claude (Anthropic API) and OpenAI-compatible endpoints, with the ability to add providers.
Prompt construction: The planning prompt has several components, informed by MCP server experience:
- System context: “You are a workflow planner. Generate a workflow fragment using the available action grammar primitives and common patterns.”
- Capability schema: Machine-readable descriptions of available primitives, common patterns, and composition rules (derived from grammar types in Phase 1)
- Task context: The problem description, input data schema, and any accumulated results
- Planning constraints: Resource bounds, required convergence points, any domain-specific rules
- Output format: The fragment schema with examples showing common pattern references, dynamic compositions, and mixed fragments
- Validation feedback: If retrying, the validation errors from the previous attempt
Context window management: Capability schemas derived from grammar compositions can be large. Strategies (validated through Phase 0 MCP server experience):
- Tiered descriptions: primitive names + one-line descriptions always included; full type signatures included for primitives the LLM selects
- Category-based inclusion: data processing problems get full Transform/Validate schemas; API integration problems get full Acquire/Emit schemas
- Composition rules included as a concise reference — the planner needs to know the structural invariants (especially the single-mutation boundary) without seeing every possible combination
- Few-shot examples demonstrating common patterns and dynamic compositions
Structured output: Use function calling / tool use APIs where available to constrain the LLM’s output to valid fragment structures. Fall back to JSON mode with post-hoc validation where function calling isn’t supported.
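The plan, validate, revise loop can be sketched as a bounded retry where the previous attempt's diagnostics feed the next prompt. The closure-based planner below is a stand-in for the LLM call; all names and the retry shape are hypothetical:

```rust
/// Sketch of a bounded planning retry loop. `planner` stands in for the
/// LLM call and receives the prior attempt's validation errors (if any);
/// `validator` returns diagnostics, an empty list meaning the fragment
/// is valid. Names and signatures are illustrative assumptions.
fn plan_with_retries<P, V>(
    max_attempts: u32,
    mut planner: P,
    validator: V,
) -> Result<String, Vec<String>>
where
    P: FnMut(Option<&[String]>) -> String,
    V: Fn(&str) -> Vec<String>,
{
    let mut last_errors: Option<Vec<String>> = None;
    for _ in 0..max_attempts {
        // Feed prior diagnostics back into the next prompt.
        let fragment = planner(last_errors.as_deref());
        let errors = validator(&fragment);
        if errors.is_empty() {
            return Ok(fragment);
        }
        last_errors = Some(errors);
    }
    Err(last_errors.unwrap_or_default())
}

fn main() {
    // Stub planner: emits a bad fragment first, then corrects itself
    // once it sees validation feedback.
    let planner = |feedback: Option<&[String]>| {
        if feedback.is_some() {
            "valid_fragment".to_string()
        } else {
            "bad_fragment".to_string()
        }
    };
    let validator = |fragment: &str| {
        if fragment == "valid_fragment" {
            Vec::new()
        } else {
            vec!["unknown pattern reference".to_string()]
        }
    };
    assert_eq!(
        plan_with_retries(3, planner, validator),
        Ok("valid_fragment".to_string())
    );
}
```

Keeping the loop generic over both closures would let the same retry policy sit in front of any provider adapter and any subset of validation stages.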
Open questions:
- Should the planning step cache successful plans for similar problem descriptions?
- How should model selection work? (Configurable per planning step? Global default with overrides?)
- What telemetry should be emitted from the LLM call? (Token counts, latency, planning reasoning)
- Should the planner have access to the grammar’s type signatures, or only the derived capability schemas?
4. Fragment Materialization
Question: How does a validated fragment become real workflow steps?
Research approach:
- Study the existing `ResultProcessingService` path for decision point outcomes
- Determine what modifications are needed to support full step specifications with grammar compositions
- Validate transactional guarantees are preserved with richer creation payloads
The current flow for decision points is:
- Decision handler returns `DecisionPointOutcome::CreateSteps { step_names }`
- `ResultProcessingService` validates step names exist in the template
- Steps are created from template definitions in a single transaction
- Edges are created connecting the decision step to new steps
- New steps are enqueued
For planning steps, the flow extends to:
- Planning handler returns a validated workflow fragment
- Fragment materialization service creates steps from fragment specifications (not template)
- Steps are created with grammar compositions (common pattern references or dynamic compositions), configurations, and inputs in a single transaction
- Edges are created from the fragment’s dependency declarations
- New steps are enqueued for the appropriate namespace (grammar workers for grammar-composed steps and application workers for app-specific handlers)
The key difference: instead of looking up step definitions in a template, the materialization service uses the fragment’s step specifications directly. Grammar-composed steps route to grammar workers; application-specific steps route to their registered namespace.
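The routing rule can be sketched as follows, with hypothetical types standing in for the materialization service's internals:

```rust
/// Hypothetical handler forms carried by a materialized fragment step;
/// not the actual materialization service types.
enum StepHandler {
    CommonPattern(String),
    DynamicComposition(Vec<String>),
    Callable { name: String, namespace: String },
}

/// Grammar-composed steps (common patterns and dynamic compositions)
/// route to grammar workers; application callables route to their
/// registered namespace. Queue names here are illustrative.
fn enqueue_namespace(handler: &StepHandler) -> &str {
    match handler {
        StepHandler::CommonPattern(_) | StepHandler::DynamicComposition(_) => "grammar_workers",
        StepHandler::Callable { namespace, .. } => namespace,
    }
}

fn main() {
    let grammar_step = StepHandler::CommonPattern("aggregate".to_string());
    let dynamic_step = StepHandler::DynamicComposition(vec![
        "Transform::Categorize".to_string(),
        "Validate::CategoryRules".to_string(),
    ]);
    let app_step = StepHandler::Callable {
        name: "FinalizationHandler".to_string(),
        namespace: "dynamic_planning".to_string(),
    };

    assert_eq!(enqueue_namespace(&grammar_step), "grammar_workers");
    assert_eq!(enqueue_namespace(&dynamic_step), "grammar_workers");
    assert_eq!(enqueue_namespace(&app_step), "dynamic_planning");
}
```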
Open questions:
- Should fragment materialization be a separate service or an extension of `ResultProcessingService`?
- How should the planning step’s own task template relate to the materialized fragment? Is it a “meta-template” that declares the planning step and convergence, with the middle filled in dynamically?
- What happens if materialization fails after partial creation? (Transaction should handle this, but worth explicit validation)
5. The Planning Step Template Pattern
Question: What does a task template look like when it includes planning steps?
Proposed template pattern:
```yaml
name: adaptive_data_processing
namespace_name: dynamic_planning
version: 1.0.0
description: Process data with LLM-planned workflow
grammar_patterns: standard_v1

steps:
  - name: ingest_data
    type: standard
    handler:
      callable: DataIngestionHandler  # Application-specific
    dependencies: []

  - name: plan_processing
    type: planning  # New step type
    handler:
      grammar: planning_step
      config:
        model: claude-sonnet-4-5-20250929
        capability_schema: standard_v1
        max_fragment_steps: 20
        max_fragment_depth: 3
        allow_dynamic_composition: true  # Permit dynamic grammar composition beyond common patterns
        planning_prompt: |
          Given the ingested data characteristics, plan a processing
          workflow that validates, enriches, and categorizes the records.
          Use common patterns where available. Compose dynamically from
          grammar primitives for operations that don't map to existing patterns.
    context_from:
      - ingest_data
    dependencies:
      - ingest_data

  - name: finalize
    type: deferred
    handler:
      callable: FinalizationHandler  # Application-specific
    dependencies:
      - plan_processing  # Intersection semantics with planned steps
```
Key insight: The template defines the frame — what happens before planning and what happens after convergence. The middle is filled in by the planner using grammar compositions — common patterns for well-tested operations, dynamic compositions for novel operations. This preserves the template’s role as a structural contract while enabling dynamic topology. Developer-authored handlers (`DataIngestionHandler`, `FinalizationHandler`) coexist with grammar-composed planned steps in the same workflow.
The `allow_dynamic_composition` flag gives template authors control over whether the planner can compose novel handlers from primitives or must limit itself to common patterns. This is a safety lever: templates used in high-trust environments can enable dynamic composition for maximum flexibility, while templates used in more constrained environments can restrict to well-tested common patterns. In both cases, compositions are validated against the same structural invariants (contract compatibility, single-mutation boundary).
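One way the `allow_dynamic_composition` lever might be enforced during fragment validation, sketched with assumed types rather than the actual implementation:

```rust
/// Sketch of the composition policy check: when dynamic composition is
/// disabled in the template, any fragment step carrying a raw composition
/// is rejected before the rest of validation runs. The (name, is_dynamic)
/// pairs and the diagnostic wording are illustrative assumptions.
fn check_composition_policy(
    allow_dynamic: bool,
    step_handlers: &[(&str, bool)], // (step name, uses a dynamic composition?)
) -> Result<(), String> {
    if allow_dynamic {
        return Ok(());
    }
    for (name, is_dynamic) in step_handlers {
        if *is_dynamic {
            return Err(format!(
                "step '{name}' uses dynamic composition, but this template \
                 restricts the planner to common patterns"
            ));
        }
    }
    Ok(())
}

fn main() {
    let steps = [("validate_records", false), ("categorize", true)];
    // High-trust template: dynamic composition permitted.
    assert!(check_composition_policy(true, &steps).is_ok());
    // Constrained template: the dynamic "categorize" step is rejected.
    assert!(check_composition_policy(false, &steps).is_err());
    // Common-pattern-only fragments pass either way.
    assert!(check_composition_policy(false, &[("validate_records", false)]).is_ok());
}
```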
Prototyping Goals
Prototype 1: Fragment Schema and Validation
Objective: Define the fragment schema and implement the validation pipeline, including grammar composition structural invariant checking, independent of LLM integration.
Success criteria:
- Fragment schema defined with JSON Schema, supporting common pattern references, dynamic compositions, and callable handler references
- Validation pipeline rejects all identified invalid fragment patterns
- Composition validation catches incompatible primitive chains and single-mutation boundary violations
- Validation produces actionable diagnostic messages
- Valid fragments can be materialized into workflow steps in a test environment
Prototype 2: LLM-Generated Fragments
Objective: Validate that an LLM can generate valid workflow fragments from grammar capability schemas, including dynamic compositions.
Success criteria:
- Claude generates valid fragments for at least 3 distinct problem types
- Generated fragments pass the validation pipeline including structural invariant checking
- Fragments use common patterns, dynamic compositions, or both appropriately
- Planning prompt engineering (informed by Phase 0 MCP server experience) produces consistent results
Prototype 3: End-to-End Planning and Execution
Objective: Execute a complete workflow with an LLM planning step.
Success criteria:
- Task is created with a planning step in its template
- Planning step calls LLM, receives fragment, passes validation
- Fragment is materialized as workflow steps (grammar-composed and/or application-specific)
- Planned steps execute through appropriate workers
- Convergence step receives results from planned steps
- Full workflow observable through standard Tasker telemetry
- Agent-created tasks containing planning steps execute correctly with `parent_correlation_id` traceability
Validation Criteria for Phase Completion
- Workflow fragment schema defined and documented, supporting common pattern references, dynamic compositions, and application callables
- Fragment validation pipeline implemented with all stages including composition structural invariant checking
- Planning step handler type implemented in at least Rust
- LLM integration adapter supports at least one provider (recommend: Anthropic API)
- Fragment materialization extends existing step creation with full transactional guarantees
- At least 3 end-to-end workflows demonstrated with LLM planning, including at least one using dynamic composition
- Validation failure modes tested and documented with diagnostic output
- Planning step telemetry includes LLM call metrics, fragment structure, and validation results
Relationship to Other Phases
- Phase 0 informs this phase: MCP server experience with LLM integration, prompt engineering, and validation feedback transfers directly.
- Phase 1 is a prerequisite: planning steps generate fragments that reference grammar compositions — both common patterns and dynamic compositions.
- Phase 3 extends this phase: recursive planning is nested planning steps within planned fragments.
- Agent orchestration composes with this phase: agents can create tasks containing planning steps, combining agent-level reasoning with LLM planning-level composition.
This document will be updated as Phase 1 progresses and reveals design insights that inform planning interface design.