Keyboard shortcuts

Press ← or β†’ to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Benchmark Implementation Decision: Event-Driven + E2E Focus

Date: 2025-10-08 Decision: Focus on event propagation and E2E benchmarks; infer worker metrics from traces


Context

Original Phase 5.4 plan included 7 benchmark categories:

  1. βœ… API Task Creation
  2. 🚧 Worker Processing Cycle
  3. βœ… Event Propagation
  4. 🚧 Step Enqueueing
  5. 🚧 Handler Overhead
  6. βœ… SQL Functions
  7. βœ… E2E Latency

Architectural Challenge: Worker Benchmarking

Problem: Direct worker benchmarking doesn’t match production reality

In a distributed system with multiple workers:

  • ❌ Can’t predict which worker will claim which step
  • ❌ Can’t control step distribution across workers
  • ❌ Artificial scenarios required to direct specific steps to specific workers
  • ❌ API queries would need to know which worker to query (unknowable in advance)

Example:

Task with 10 steps across 3 workers:
- Worker A might claim steps 1, 3, 7
- Worker B might claim steps 2, 5, 6, 9
- Worker C might claim steps 4, 8, 10

Which worker do you benchmark? How do you ensure consistent measurement?

Decision: Focus on Observable Metrics

βœ… What We WILL Measure Directly

1. Event Propagation (tasker-shared/benches/event_propagation.rs)

Status: βœ… IMPLEMENTED

Measures: PostgreSQL LISTEN/NOTIFY round-trip latency

Approach:

#![allow(unused)]
fn main() {
// Setup listener on test channel
listener.listen("pgmq_message_ready.benchmark_event_test").await;

// Send message with notify
let send_time = Instant::now();
sqlx::query("SELECT pgmq_send_with_notify(...)").execute(&pool).await;

// Measure until listener receives
let received_at = listener.recv().await;
let latency = received_at.duration_since(send_time);
}

Why it works:

  • Observable from outside the system
  • Deterministic measurement (single listener, single sender)
  • Matches production behavior (real LISTEN/NOTIFY path)
  • Critical for worker responsiveness

Expected Performance: < 5-10ms p95


2. End-to-End Latency (tests/benches/e2e_latency.rs)

Status: βœ… IMPLEMENTED

Measures: Complete workflow execution (API β†’ Task Complete)

Approach:

#![allow(unused)]
fn main() {
// Create task
let response = client.create_task(request).await;
let start = Instant::now();

// Poll for completion
loop {
    let task = client.get_task(task_uuid).await;
    if task.execution_status == "AllComplete" {
        return start.elapsed();
    }
    tokio::time::sleep(Duration::from_millis(50)).await;
}
}

Why it works:

  • Measures user experience (submit β†’ result)
  • Naturally includes ALL system overhead:
    • API processing
    • Database writes
    • Message queue latency
    • Worker claim/execute/submit (embedded in total time)
    • Event propagation
    • Orchestration coordination
  • No need to know which workers executed which steps
  • Reflects real production behavior

Expected Performance:

  • Linear (3 steps): < 500ms p99
  • Diamond (4 steps): < 800ms p99

πŸ“Š What We WILL Infer from Traces

Worker-Level Breakdown via OpenTelemetry

Instead of direct benchmarking, use existing OpenTelemetry instrumentation:

# Query traces by correlation_id from E2E benchmark
curl "http://localhost:16686/api/traces?service=tasker-worker&tags=correlation_id:abc-123"

# Extract span timings:
{
  "spans": [
    {"operationName": "step_claim",       "duration": 15ms},
    {"operationName": "execute_handler",  "duration": 42ms},  // Business logic
    {"operationName": "submit_result",    "duration": 23ms}
  ]
}

Advantages:

  • βœ… Works across distributed workers (correlation ID links everything)
  • βœ… Captures real production behavior (actual task execution)
  • βœ… Breaks down by step type (different handlers have different timing)
  • βœ… Shows which worker processed each step
  • βœ… Already instrumented (Phase 3.3 work)

Metrics Available:

  • step_claim_duration - Time to claim step from queue
  • handler_execution_duration - Time to execute handler logic
  • result_submission_duration - Time to submit result back
  • ffi_overhead - Rust vs Ruby handler comparison

🚧 Benchmarks NOT Implemented (By Design)

Worker Processing Cycle (tasker-worker/benches/worker_execution.rs)

Status: 🚧 Skeleton only (placeholder)

Why not implemented:

  • Requires artificial pre-arrangement of which worker claims which step
  • Doesn’t match production (multiple workers competing for steps)
  • Metrics available via OpenTelemetry traces instead

Alternative: Query traces for step_claim β†’ execute_handler β†’ submit_result span timing


Step Enqueueing (tasker-orchestration/benches/step_enqueueing.rs)

Status: 🚧 Skeleton only (placeholder)

Why not implemented:

  • Difficult to trigger orchestration step discovery without full execution
  • Result naturally embedded in E2E latency measurement
  • Coordination overhead visible in E2E timing

Alternative: E2E benchmark includes step enqueueing naturally


Handler Overhead (tasker-worker/benches/handler_overhead.rs)

Status: 🚧 Skeleton only (placeholder)

Why not implemented:

  • FFI overhead varies by handler type (can’t benchmark in isolation)
  • Real overhead visible in E2E benchmark + traces
  • Rust vs Ruby comparison available via trace analysis

Alternative: Compare handler_execution_duration spans for Rust vs Ruby handlers in traces


Implementation Summary

βœ… Complete Benchmarks (3/7)

BenchmarkStatusMeasuresRun Command
SQL Functionsβœ… CompletePostgreSQL function performanceDATABASE_URL=... cargo bench -p tasker-shared --features benchmarks sql_functions
Task Initializationβœ… CompleteAPI task creation latencycargo bench -p tasker-client --features benchmarks
Event Propagationβœ… CompleteLISTEN/NOTIFY round-tripDATABASE_URL=... cargo bench -p tasker-shared --features benchmarks event_propagation
E2E Latencyβœ… CompleteComplete workflow executioncargo bench --test e2e_latency

🚧 Placeholder Benchmarks (3/7)

BenchmarkStatusAlternative Measurement
Worker Execution🚧 PlaceholderOpenTelemetry traces (correlation ID)
Step Enqueueing🚧 PlaceholderEmbedded in E2E latency
Handler Overhead🚧 PlaceholderOpenTelemetry span comparison (Rust vs Ruby)

Advantages of This Approach

1. Matches Production Reality

  • E2E benchmark reflects actual user experience
  • No artificial worker pre-arrangement required
  • Measures real distributed system behavior

2. Complete Coverage

  • E2E latency includes ALL components naturally
  • OpenTelemetry provides worker-level breakdown
  • Event propagation measures critical notification path

3. Lower Maintenance

  • Fewer benchmarks to maintain
  • No complex setup for worker isolation
  • Traces provide flexible analysis

4. Better Insights

  • Correlation IDs link entire workflow across services
  • Can analyze timing for ANY task in production
  • Breakdown available on-demand via trace queries

How to Use This System

Running Performance Analysis

Step 1: Run E2E benchmark

cargo bench --test e2e_latency

Step 2: Extract correlation_id from benchmark output

Created task: abc-123-def-456 (correlation_id: xyz-789)

Step 3: Query traces for breakdown

# Jaeger UI or API
curl "http://localhost:16686/api/traces?tags=correlation_id:xyz-789"

Step 4: Analyze span timing

{
  "spans": [
    {"service": "orchestration", "operation": "create_task", "duration": 18ms},
    {"service": "orchestration", "operation": "enqueue_steps", "duration": 12ms},
    {"service": "worker", "operation": "step_claim", "duration": 15ms},
    {"service": "worker", "operation": "execute_handler", "duration": 42ms},
    {"service": "worker", "operation": "submit_result", "duration": 23ms},
    {"service": "orchestration", "operation": "process_result", "duration": 8ms}
  ]
}

Total E2E: ~118ms (matches benchmark) Worker overhead: 15ms + 23ms = 38ms (claim + submit, excluding business logic)


Recommendations

Completion Criteria

βœ… Complete with 4 working benchmarks:

  1. SQL Functions
  2. Task Initialization
  3. Event Propagation
  4. E2E Latency

πŸ“‹ Document that worker-level metrics come from OpenTelemetry

For Future Enhancement

If direct worker benchmarking becomes necessary:

  1. Use single-worker mode Docker Compose configuration
  2. Pre-create tasks with known step assignments
  3. Query specific worker API for deterministic steps
  4. Document as synthetic benchmark (not matching production)

For Production Monitoring

Use OpenTelemetry for ongoing performance analysis:

  • Set up trace retention (7-30 days)
  • Create Grafana dashboards for span timing
  • Alert on p95 latency increases
  • Analyze slow workflows via correlation ID

Conclusion

Decision: Focus on event propagation and E2E latency benchmarks, use OpenTelemetry traces for worker-level breakdown.

Rationale: Matches production reality, provides complete coverage, lower maintenance, better insights.

Status: βœ… 4/4 practical benchmarks implemented and working