Benchmark Implementation Decision: Event-Driven + E2E Focus
Date: 2025-10-08
Decision: Focus on event propagation and E2E benchmarks; infer worker metrics from traces
Context
Original Phase 5.4 plan included 7 benchmark categories:
- ✅ API Task Creation
- 🚧 Worker Processing Cycle
- ✅ Event Propagation
- 🚧 Step Enqueueing
- 🚧 Handler Overhead
- ✅ SQL Functions
- ✅ E2E Latency
Architectural Challenge: Worker Benchmarking
Problem: Direct worker benchmarking doesn't match production reality
In a distributed system with multiple workers:
- ❌ Can't predict which worker will claim which step
- ❌ Can't control step distribution across workers
- ❌ Artificial scenarios required to direct specific steps to specific workers
- ❌ API queries would need to know which worker to query (unknowable in advance)
Example:
Task with 10 steps across 3 workers:
- Worker A might claim steps 1, 3, 7
- Worker B might claim steps 2, 5, 6, 9
- Worker C might claim steps 4, 8, 10
Which worker do you benchmark? How do you ensure consistent measurement?
Decision: Focus on Observable Metrics
✅ What We WILL Measure Directly
1. Event Propagation (tasker-shared/benches/event_propagation.rs)
Status: ✅ IMPLEMENTED
Measures: PostgreSQL LISTEN/NOTIFY round-trip latency
Approach:
```rust
use std::time::{Duration, Instant};
use sqlx::postgres::{PgListener, PgPool};

async fn measure_notify_latency(pool: &PgPool) -> Result<Duration, sqlx::Error> {
    // Set up a listener on the benchmark test channel
    let mut listener = PgListener::connect_with(pool).await?;
    listener.listen("pgmq_message_ready.benchmark_event_test").await?;

    // Send a message with notify and start the clock
    let send_time = Instant::now();
    sqlx::query("SELECT pgmq_send_with_notify(...)") // queue/message arguments elided
        .execute(pool)
        .await?;

    // Block until the listener receives the notification;
    // the elapsed time is the LISTEN/NOTIFY round-trip latency
    let _notification = listener.recv().await?;
    Ok(send_time.elapsed())
}
```
Why it works:
- Observable from outside the system
- Deterministic measurement (single listener, single sender)
- Matches production behavior (real LISTEN/NOTIFY path)
- Critical for worker responsiveness
Expected Performance: < 5-10ms p95
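For context, a minimal sketch of how this measurement might be wired into a Criterion bench, assuming the `measure_notify_latency` helper above and criterion's `async_tokio` feature; the actual harness in `tasker-shared/benches/event_propagation.rs` may differ:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_event_propagation(c: &mut Criterion) {
    let runtime = tokio::runtime::Runtime::new().expect("tokio runtime");
    let pool = runtime.block_on(async {
        sqlx::PgPool::connect(&std::env::var("DATABASE_URL").expect("DATABASE_URL set"))
            .await
            .expect("database connection")
    });

    // Each iteration performs one full send -> NOTIFY -> recv round trip,
    // so Criterion's timing of the future is itself the round-trip latency
    c.bench_function("event_propagation_round_trip", |b| {
        b.to_async(&runtime)
            .iter(|| async { measure_notify_latency(&pool).await.unwrap() });
    });
}

criterion_group!(benches, bench_event_propagation);
criterion_main!(benches);
```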
2. End-to-End Latency (tests/benches/e2e_latency.rs)
Status: ✅ IMPLEMENTED
Measures: Complete workflow execution (API → Task Complete)
Approach:
```rust
use std::time::{Duration, Instant};

// Start the clock before the API call so API processing is included in the total
let start = Instant::now();

// Create task
let response = client.create_task(request).await;
let task_uuid = response.task_uuid;

// Poll until the orchestrator reports the task fully complete
let e2e_latency = loop {
    let task = client.get_task(task_uuid).await;
    if task.execution_status == "AllComplete" {
        break start.elapsed();
    }
    tokio::time::sleep(Duration::from_millis(50)).await;
};
```
Why it works:
- Measures user experience (submit β result)
- Naturally includes ALL system overhead:
- API processing
- Database writes
- Message queue latency
- Worker claim/execute/submit (embedded in total time)
- Event propagation
- Orchestration coordination
- No need to know which workers executed which steps
- Reflects real production behavior
Expected Performance:
- Linear (3 steps): < 500ms p99
- Diamond (4 steps): < 800ms p99
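To check targets like these, per-run samples can be reduced to a nearest-rank percentile; a small hypothetical helper (not part of the project's bench harness):

```rust
use std::time::Duration;

/// Nearest-rank percentile over collected latency samples.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // Nearest-rank definition: index = ceil(p/100 * N) - 1, clamped into range
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

// e.g. assert!(percentile(&mut samples, 99.0) < Duration::from_millis(500));
```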
📊 What We WILL Infer from Traces
Worker-Level Breakdown via OpenTelemetry
Instead of direct benchmarking, use existing OpenTelemetry instrumentation:
```bash
# Query traces by correlation_id from the E2E benchmark
curl "http://localhost:16686/api/traces?service=tasker-worker&tags=correlation_id:abc-123"
```

Extracted span timings (execute_handler is the business logic):

```json
{
  "spans": [
    {"operationName": "step_claim", "duration": "15ms"},
    {"operationName": "execute_handler", "duration": "42ms"},
    {"operationName": "submit_result", "duration": "23ms"}
  ]
}
```
Advantages:
- ✅ Works across distributed workers (correlation ID links everything)
- ✅ Captures real production behavior (actual task execution)
- ✅ Breaks down by step type (different handlers have different timing)
- ✅ Shows which worker processed each step
- ✅ Already instrumented (Phase 3.3 work)
Metrics Available:
- `step_claim_duration` - Time to claim a step from the queue
- `handler_execution_duration` - Time to execute handler logic
- `result_submission_duration` - Time to submit the result back
- `ffi_overhead` - Rust vs Ruby handler comparison
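As an illustration, a hypothetical analysis script that pulls these spans from the Jaeger query API and prints per-operation timing. It assumes the reqwest and serde_json crates, a `correlation_id` tag, and Jaeger's default response shape (durations in microseconds); adjust to the deployment's actual tag names:

```rust
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let correlation_id = "xyz-789"; // taken from the benchmark output
    let tags = format!(r#"{{"correlation_id":"{correlation_id}"}}"#);

    // Jaeger's query API accepts the tags parameter as URL-encoded JSON
    let body: Value = reqwest::Client::new()
        .get("http://localhost:16686/api/traces")
        .query(&[("service", "tasker-worker"), ("tags", tags.as_str())])
        .send()
        .await?
        .json()
        .await?;

    // Walk every span in every returned trace; Jaeger reports durations in microseconds
    for trace in body["data"].as_array().into_iter().flatten() {
        for span in trace["spans"].as_array().into_iter().flatten() {
            let op = span["operationName"].as_str().unwrap_or("?");
            let micros = span["duration"].as_u64().unwrap_or(0);
            println!("{op}: {:.1}ms", micros as f64 / 1000.0);
        }
    }
    Ok(())
}
```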
🚧 Benchmarks NOT Implemented (By Design)
Worker Processing Cycle (tasker-worker/benches/worker_execution.rs)
Status: 🚧 Skeleton only (placeholder)
Why not implemented:
- Requires artificial pre-arrangement of which worker claims which step
- Doesn't match production (multiple workers competing for steps)
- Metrics available via OpenTelemetry traces instead
Alternative: Query traces for step_claim → execute_handler → submit_result span timing
Step Enqueueing (tasker-orchestration/benches/step_enqueueing.rs)
Status: 🚧 Skeleton only (placeholder)
Why not implemented:
- Difficult to trigger orchestration step discovery without full execution
- Result naturally embedded in E2E latency measurement
- Coordination overhead visible in E2E timing
Alternative: E2E benchmark includes step enqueueing naturally
Handler Overhead (tasker-worker/benches/handler_overhead.rs)
Status: 🚧 Skeleton only (placeholder)
Why not implemented:
- FFI overhead varies by handler type (can't be benchmarked in isolation)
- Real overhead visible in E2E benchmark + traces
- Rust vs Ruby comparison available via trace analysis
Alternative: Compare handler_execution_duration spans for Rust vs Ruby handlers in traces
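A sketch of what that comparison could look like once `handler_execution_duration` samples have been pulled from traces; splitting samples by handler language is an assumption about how the deployment tags spans:

```rust
use std::time::Duration;

fn mean(samples: &[Duration]) -> Duration {
    // Guard against empty input so the division below is always valid
    samples.iter().sum::<Duration>() / samples.len().max(1) as u32
}

// Compare handler_execution_duration samples grouped by handler language;
// the difference in means approximates per-call FFI overhead
fn compare_handlers(rust: &[Duration], ruby: &[Duration]) {
    let (rust_mean, ruby_mean) = (mean(rust), mean(ruby));
    println!("rust handlers: {rust_mean:?} mean over {} spans", rust.len());
    println!("ruby handlers: {ruby_mean:?} mean over {} spans", ruby.len());
    println!("estimated FFI overhead: {:?}", ruby_mean.saturating_sub(rust_mean));
}
```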
Implementation Summary
✅ Complete Benchmarks (4/7)
| Benchmark | Status | Measures | Run Command |
|---|---|---|---|
| SQL Functions | ✅ Complete | PostgreSQL function performance | `DATABASE_URL=... cargo bench -p tasker-shared --features benchmarks sql_functions` |
| Task Initialization | ✅ Complete | API task creation latency | `cargo bench -p tasker-client --features benchmarks` |
| Event Propagation | ✅ Complete | LISTEN/NOTIFY round-trip | `DATABASE_URL=... cargo bench -p tasker-shared --features benchmarks event_propagation` |
| E2E Latency | ✅ Complete | Complete workflow execution | `cargo bench --test e2e_latency` |
🚧 Placeholder Benchmarks (3/7)
| Benchmark | Status | Alternative Measurement |
|---|---|---|
| Worker Execution | 🚧 Placeholder | OpenTelemetry traces (correlation ID) |
| Step Enqueueing | 🚧 Placeholder | Embedded in E2E latency |
| Handler Overhead | 🚧 Placeholder | OpenTelemetry span comparison (Rust vs Ruby) |
Advantages of This Approach
1. Matches Production Reality
- E2E benchmark reflects actual user experience
- No artificial worker pre-arrangement required
- Measures real distributed system behavior
2. Complete Coverage
- E2E latency includes ALL components naturally
- OpenTelemetry provides worker-level breakdown
- Event propagation measures critical notification path
3. Lower Maintenance
- Fewer benchmarks to maintain
- No complex setup for worker isolation
- Traces provide flexible analysis
4. Better Insights
- Correlation IDs link entire workflow across services
- Can analyze timing for ANY task in production
- Breakdown available on-demand via trace queries
How to Use This System
Running Performance Analysis
Step 1: Run E2E benchmark
```bash
cargo bench --test e2e_latency
```
Step 2: Extract correlation_id from benchmark output
```text
Created task: abc-123-def-456 (correlation_id: xyz-789)
```
Step 3: Query traces for breakdown
```bash
# Jaeger UI or API
curl "http://localhost:16686/api/traces?tags=correlation_id:xyz-789"
```
Step 4: Analyze span timing
```json
{
  "spans": [
    {"service": "orchestration", "operation": "create_task", "duration": "18ms"},
    {"service": "orchestration", "operation": "enqueue_steps", "duration": "12ms"},
    {"service": "worker", "operation": "step_claim", "duration": "15ms"},
    {"service": "worker", "operation": "execute_handler", "duration": "42ms"},
    {"service": "worker", "operation": "submit_result", "duration": "23ms"},
    {"service": "orchestration", "operation": "process_result", "duration": "8ms"}
  ]
}
```
Total E2E: ~118ms (matches benchmark)
Worker overhead: 15ms + 23ms = 38ms (claim + submit, excluding business logic)
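The same arithmetic, as a tiny hypothetical helper over parsed `(operation, milliseconds)` pairs:

```rust
// Sum total E2E time and the worker coordination overhead (claim + submit,
// excluding business logic) from parsed spans. For the example above this
// returns (118.0, 38.0).
fn worker_overhead_ms(spans: &[(&str, f64)]) -> (f64, f64) {
    let total: f64 = spans.iter().map(|(_, ms)| *ms).sum();
    let overhead: f64 = spans
        .iter()
        .filter(|(op, _)| *op == "step_claim" || *op == "submit_result")
        .map(|(_, ms)| *ms)
        .sum();
    (total, overhead)
}
```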
Recommendations
Completion Criteria
✅ Complete with 4 working benchmarks:
- SQL Functions
- Task Initialization
- Event Propagation
- E2E Latency
📝 Document that worker-level metrics come from OpenTelemetry
For Future Enhancement
If direct worker benchmarking becomes necessary:
- Use single-worker mode Docker Compose configuration
- Pre-create tasks with known step assignments
- Query specific worker API for deterministic steps
- Document as synthetic benchmark (not matching production)
For Production Monitoring
Use OpenTelemetry for ongoing performance analysis:
- Set up trace retention (7-30 days)
- Create Grafana dashboards for span timing
- Alert on p95 latency increases
- Analyze slow workflows via correlation ID
Conclusion
Decision: Focus on event propagation and E2E latency benchmarks, use OpenTelemetry traces for worker-level breakdown.
Rationale: Matches production reality, provides complete coverage, lower maintenance, better insights.
Status: ✅ 4/4 practical benchmarks implemented and working