Benchmark Audit & Profiling Plan
Created: 2025-10-09
Status: Planning
Purpose: Audit existing benchmarks, establish profiling tooling, and capture performance baselines before the Actor/Services refactor
Executive Summary
Before refactoring tasker-orchestration/src/orchestration/lifecycle/ to the Actor/Services pattern, we need to:
- Audit Benchmarks: Review which benchmarks are implemented vs placeholders
- Clean Up: Remove or complete placeholder benchmarks
- Establish Profiling: Set up flamegraph/samply tooling
- Baseline Profiles: Capture performance profiles for comparison post-refactor
Current Status: We have working SQL and E2E benchmarks but several placeholder component benchmarks that need decisions.
Benchmark Inventory
✅ Working & Complete Benchmarks
1. SQL Function Benchmarks
- Location: tasker-shared/benches/sql_functions.rs
- Status: ✅ Complete, compiles, well-documented
- Coverage:
  - get_next_ready_tasks() (4 batch sizes)
  - get_step_readiness_status() (5 diverse samples)
  - transition_task_state_atomic() (5 samples)
  - get_task_execution_context() (5 samples)
  - get_step_transitive_dependencies() (10 samples)
- Documentation: docs/observability/benchmarking-guide.md
- Run Command:
cargo bench --package tasker-shared --features benchmarks
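For orientation, these benchmarks follow the usual criterion-plus-sqlx shape: a benchmark group that calls one of the SQL functions listed above with a parameterized input. The sketch below is a minimal illustration, not the actual contents of sql_functions.rs; the DATABASE_URL handling, the batch sizes, and the exact call signature of get_next_ready_tasks() are assumptions.

```rust
// Minimal sketch of a SQL-function benchmark with criterion + sqlx.
// Assumes criterion's "async_tokio" feature and a Postgres test database
// reachable via DATABASE_URL; the batch sizes are illustrative.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use sqlx::PgPool;

fn bench_get_next_ready_tasks(c: &mut Criterion) {
    let rt = tokio::runtime::Runtime::new().expect("tokio runtime");
    let pool = rt.block_on(async {
        let url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
        PgPool::connect(&url).await.expect("connect to benchmark database")
    });

    let mut group = c.benchmark_group("sql_functions/get_next_ready_tasks");
    for batch_size in [1_i64, 10, 50, 100] {
        group.bench_with_input(BenchmarkId::from_parameter(batch_size), &batch_size, |b, &n| {
            b.to_async(&rt).iter(|| async {
                // Each iteration executes the SQL function once with the given batch size.
                sqlx::query("SELECT * FROM get_next_ready_tasks($1)")
                    .bind(n)
                    .fetch_all(&pool)
                    .await
                    .expect("query should succeed")
            });
        });
    }
    group.finish();
}

criterion_group!(benches, bench_get_next_ready_tasks);
criterion_main!(benches);
```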
2. Event Propagation Benchmarks
- Location: tasker-shared/benches/event_propagation.rs
- Status: ✅ Complete, compiles
- Coverage: PostgreSQL LISTEN/NOTIFY event propagation
- Run Command:
cargo bench --package tasker-shared --features benchmarks event_propagation
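The round trip this benchmark exercises can be summarized as: publish a NOTIFY on one connection, wait for it to arrive on a LISTEN connection, and time the gap. A minimal sketch using sqlx's PgListener is shown below; the channel name and payload are hypothetical, and the real benchmark may structure its setup differently.

```rust
use sqlx::postgres::{PgListener, PgPool};
use std::time::{Duration, Instant};

/// Times a single NOTIFY -> LISTEN round trip on a hypothetical benchmark channel.
async fn measure_notify_roundtrip(pool: &PgPool, url: &str) -> Result<Duration, sqlx::Error> {
    let mut listener = PgListener::connect(url).await?;
    listener.listen("tasker_bench_channel").await?;

    let start = Instant::now();
    // Publish from the pool, then block until the listener receives the notification.
    sqlx::query("SELECT pg_notify('tasker_bench_channel', 'ping')")
        .execute(pool)
        .await?;
    let _notification = listener.recv().await?;
    Ok(start.elapsed())
}
```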
3. Task Initialization Benchmarks
- Location: tasker-client/benches/task_initialization.rs
- Status: ✅ Complete, compiles
- Coverage: API task creation latency
- Run Command:
export SQLX_OFFLINE=true
cargo bench --package tasker-client --features benchmarks task_initialization
4. End-to-End Workflow Latency Benchmarks
- Location: tests/benches/e2e_latency.rs
- Status: ✅ Complete, compiles
- Coverage: Complete workflow execution (API → Result)
- Linear workflow (Ruby FFI)
- Diamond workflow (Ruby FFI)
- Linear workflow (Rust native)
- Diamond workflow (Rust native)
- Prerequisites: Docker Compose services running
- Run Command:
export SQLX_OFFLINE=true
cargo bench --bench e2e_latency
⚠️ Placeholder Benchmarks (Need Decision)
5. Orchestration Benchmarks
- Location: tasker-orchestration/benches/
- Files:
  - orchestration_benchmarks.rs - Empty placeholder
  - step_enqueueing.rs - Placeholder with documentation
- Status: Not implemented
- Documented Intent: Measure orchestration coordination latency
- Challenges:
- Requires triggering orchestration cycle without full execution
- Need step discovery measurement isolation
- Queue publishing and notification overhead breakdown
6. Worker Benchmarks
- Location: tasker-worker/benches/
- Files:
  - worker_benchmarks.rs - Empty placeholder
  - worker_execution.rs - Placeholder with documentation
  - handler_overhead.rs - Placeholder with documentation
- Status: Not implemented
- Documented Intent:
- Worker processing cycle (claim, execute, submit)
- Framework overhead vs pure handler execution
- Ruby FFI overhead measurement
- Challenges:
- Need pre-enqueued steps in test queues
- Noop handler implementations for baseline (see the sketch after this list)
- Breakdown metrics for each phase
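To make the noop-handler baseline concrete, a possible starting point for handler_overhead.rs is sketched below. The noop_handler function and the surrounding harness are hypothetical; only the pattern itself (measure the empty handler, then measure the framework claim/execute/submit path and subtract) comes from the documented intent above.

```rust
// Hedged sketch of a "framework overhead vs pure handler execution" benchmark.
// Assumes criterion and serde_json as dependencies; the worker's real API will differ.
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

/// A handler that does no work, so any time measured around it is framework overhead.
fn noop_handler(payload: &serde_json::Value) -> serde_json::Value {
    black_box(payload.clone())
}

fn bench_handler_overhead(c: &mut Criterion) {
    let payload = serde_json::json!({"step": "noop", "inputs": {}});

    // Baseline: the handler body alone.
    c.bench_function("handler_overhead/noop_handler_only", |b| {
        b.iter(|| noop_handler(black_box(&payload)))
    });

    // A second benchmark would wrap the same handler in the worker's
    // claim -> execute -> submit path; subtracting the baseline above
    // isolates the framework's contribution. Left as a placeholder until
    // the worker exposes a bench-friendly entry point.
}

criterion_group!(benches, bench_handler_overhead);
criterion_main!(benches);
```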
Recommendations
Option 1: Keep Placeholders for Future Work ✅ RECOMMENDED
Rationale:
- Phase 5.4 distributed benchmarks are documented but complex to implement
- E2E benchmarks (e2e_latency.rs) already provide full workflow metrics
- SQL benchmarks provide component-level detail
- Actor/Services refactor is more urgent than distributed component benchmarks
Action:
- Keep placeholder files with clear "NOT IMPLEMENTED" status
- Update comments to reference this audit document
- Future ticket (post-refactor) can implement if needed
Option 2: Remove Placeholders
Rationale:
- Reduce confusion about benchmark status
- E2E benchmarks already cover end-to-end latency
- SQL benchmarks cover database hot paths
Action:
- Delete placeholder bench files
- Document decision in this file
- Can recreate later if specific component isolation needed
Option 3: Implement Placeholders Now
Rationale:
- Complete benchmark suite before refactor
- Better baseline data for Actor/Services comparison
Concerns:
- 2-3 days implementation effort
- Delays Actor/Services refactor
- May need re-implementation post-refactor anyway
Decision: Option 1 (Keep Placeholders, Document Status)
We have sufficient benchmarking coverage:
- ✅ SQL functions (hot path queries)
- ✅ E2E workflows (user-facing latency)
- ✅ Event propagation (LISTEN/NOTIFY)
- ✅ Task initialization (API latency)
What's Missing:
- Component-level orchestration breakdown (not critical for refactor)
- Worker cycle breakdown (available via OpenTelemetry traces)
- Framework overhead measurement (nice-to-have, not blocking)
Action Items:
- Update placeholder comments with "Status: Planned for future implementation"
- Reference this document for implementation guidance
- Move forward with profiling and refactor
Profiling Tooling Setup
Goals
- Identify Inefficiencies: Find hot spots in lifecycle code
- Establish Baseline: Profile before Actor/Services refactor
- Compare Post-Refactor: Validate performance impact of refactor
- Continuous Profiling: Enable ongoing performance analysis
Tool Selection
Primary: samply (macOS-friendly)
- GitHub: https://github.com/mstange/samply
- Advantages:
- Native macOS support
- Interactive web UI for flamegraphs
- Low overhead
- Works with release builds
- Use Case: Development profiling on macOS
Secondary: flamegraph (CI/production)
- GitHub: https://github.com/flamegraph-rs/flamegraph
- Advantages:
- Linux support (perf-based)
- SVG output for CI artifacts
- Well-established in Rust ecosystem
- Use Case: CI profiling, Linux production analysis
Tertiary: cargo-flamegraph (Convenience)
- Cargo Plugin: Wraps flamegraph-rs
- Advantages:
- Single command profiling
- Automatic symbol resolution
- Use Case: Quick local profiling
Installation
macOS Setup (samply)
# Install samply
cargo install samply
# macOS requires SIP adjustment for sampling (one-time setup)
# https://github.com/mstange/samply#macos-permissions
# Verify installation
samply --version
Linux Setup (flamegraph)
# Install prerequisites (Ubuntu/Debian)
sudo apt-get install linux-tools-common linux-tools-generic
# Install flamegraph
cargo install flamegraph
# Allow perf without sudo (optional)
echo 'kernel.perf_event_paranoid=-1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# Verify installation
flamegraph --version
Cross-Platform (cargo-flamegraph)
# Install cargo-flamegraph
cargo install cargo-flamegraph
# Verify installation
cargo flamegraph --version
Profiling Workflows
1. Profile E2E Benchmark (Recommended for Baseline)
Captures the entire workflow execution including orchestration lifecycle:
# macOS
samply record cargo bench --bench e2e_latency -- --profile-time=60
# Linux
cargo flamegraph --bench e2e_latency -- --profile-time=60
# Output: Interactive flamegraph showing hot paths
What to Look For:
- Time spent in lifecycle/ modules (task_initializer, step_enqueuer, result_processor, etc.)
- Database query time vs business logic time
- Serialization/deserialization overhead
- Lock contention (should be minimal with our architecture)
2. Profile SQL Benchmarks
Isolates database performance:
# Profile just SQL function benchmarks
samply record cargo bench --package tasker-shared --features benchmarks sql_functions
# Output: Shows PostgreSQL function overhead
What to Look For:
- Time in sqlx query execution
- Connection pool overhead
- Query planning time (shouldn't be visible if using prepared statements)
3. Profile Integration Tests (Realistic Workload)
Profile actual test execution for realistic patterns:
# Profile a specific integration test
samply record cargo test --test e2e_tests e2e::rust::simple_integration_tests::test_linear_workflow
# Profile all integration tests (longer run)
samply record cargo test --test e2e_tests --all-features
What to Look For:
- Initialization overhead
- Test setup time vs actual execution time
- Repeated patterns across tests
4. Profile Specific Lifecycle Components
Isolate specific modules for deep analysis:
# Example: Profile only result processing
samply record cargo test --package tasker-orchestration --test lifecycle_integration_tests \
test_result_processing_updates_task_state --all-features -- --nocapture
# Or profile a unit test for a specific function
samply record cargo test --package tasker-orchestration \
result_processor::tests::test_process_step_result_success --all-features
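One caveat when profiling a single unit test: a function that finishes in microseconds gives a sampling profiler very little data. A common workaround is a profiling-only test that repeats the operation many times so the sampler collects enough stacks. The sketch below shows the shape; the actual lifecycle call is left as a hypothetical placeholder since it depends on the module's API.

```rust
// Profiling-only test: repeats the operation under study so samply/perf sees
// enough samples. Run explicitly with:
//   cargo test --release -- --ignored profile_hot_loop
#[tokio::test]
#[ignore]
async fn profile_hot_loop() {
    // Hypothetical setup: construct the same fixtures the lifecycle tests use,
    // e.g. a result processor and a sample step result.
    for _ in 0..10_000 {
        // Call the operation under study here, for example:
        // processor.process_step_result(sample_result.clone()).await.unwrap();
    }
}
```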
Baseline Profiling Plan
Phase 1: Capture Pre-Refactor Baselines (Day 1)
Goal: Establish performance baseline of current lifecycle code before Actor/Services refactor
# 1. Clean build
cargo clean
cargo build --release --all-features
# 2. Profile E2E benchmarks (primary baseline)
samply record --output=baseline-e2e-pre-refactor.json \
cargo bench --bench e2e_latency
# 3. Profile SQL benchmarks
samply record --output=baseline-sql-pre-refactor.json \
cargo bench --package tasker-shared --features benchmarks
# 4. Profile specific lifecycle operations
samply record --output=baseline-task-init-pre-refactor.json \
cargo test --package tasker-orchestration \
lifecycle::task_initializer::tests --all-features
samply record --output=baseline-step-enqueue-pre-refactor.json \
cargo test --package tasker-orchestration \
lifecycle::step_enqueuer::tests --all-features
samply record --output=baseline-result-processor-pre-refactor.json \
cargo test --package tasker-orchestration \
lifecycle::result_processor::tests --all-features
Deliverables (completed; profiles since removed, superseded by cluster benchmarks):
- Baseline profile files in profiles/pre-refactor/ (removed)
- Performance baselines now documented in docs/benchmarks/README.md
Phase 2: Identify Optimization Opportunities (Day 1)
Goal: Document current performance characteristics to preserve in refactor
Analysis Checklist:
- Time spent in each lifecycle module (task_initializer, step_enqueuer, etc.)
- Database query time breakdown
- Serialization overhead (JSON, MessagePack)
- Lock contention points (if any)
- Unnecessary allocations or clones
- Recursive call depth
Document Findings:
Performance baselines are now documented in docs/benchmarks/README.md.
The original lifecycle-performance-baseline.md was removed: its measurements had
data quality issues, and the refactor it targeted is complete.
Phase 3: Post-Refactor Validation (After Refactor)
Goal: Validate Actor/Services refactor maintains or improves performance
# Re-run same profiling commands after refactor
samply record --output=baseline-e2e-post-refactor.json \
cargo bench --bench e2e_latency
# Compare baselines
# (samply doesn't have built-in diff, use manual comparison)
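For the latency numbers themselves, criterion can help directly: `cargo bench -- --save-baseline pre-refactor` before the refactor and `cargo bench -- --baseline pre-refactor` afterwards gives an in-tool comparison. Alternatively, the saved statistics can be diffed by hand; the sketch below assumes criterion's default on-disk layout (an estimates.json per benchmark with a mean point_estimate in nanoseconds) and uses illustrative file paths, so treat it as a starting point rather than a supported interface.

```rust
// Hedged helper that compares the mean estimates of two criterion runs.
// Assumes serde_json as a dependency and criterion's default estimates.json layout.
use std::fs;

/// Reads the mean point estimate (nanoseconds) from a criterion estimates.json file.
fn mean_estimate_ns(path: &str) -> Option<f64> {
    let raw = fs::read_to_string(path).ok()?;
    let value: serde_json::Value = serde_json::from_str(&raw).ok()?;
    value.get("mean")?.get("point_estimate")?.as_f64()
}

fn main() {
    // Illustrative paths: copies of the pre- and post-refactor criterion output.
    let before = mean_estimate_ns("profiles/pre-refactor/e2e_latency/estimates.json");
    let after = mean_estimate_ns("profiles/post-refactor/e2e_latency/estimates.json");

    match (before, after) {
        (Some(before), Some(after)) => {
            let delta_pct = (after - before) / before * 100.0;
            println!(
                "mean latency: {:.2} ms -> {:.2} ms ({:+.1}%)",
                before / 1e6,
                after / 1e6,
                delta_pct
            );
        }
        _ => eprintln!("could not read one of the estimates.json files"),
    }
}
```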
Success Criteria:
- E2E latency: Within 10% of baseline (preferably faster)
- SQL latency: Unchanged (no regression from refactor)
- Lifecycle operation time: Within 20% of baseline
- No new hot paths or contention points
Regression Signals:
- E2E latency >20% slower
- New allocations/clones in hot paths
- Increased lock contention
- Message passing overhead >5% of total time
Profiling Best Practices
1. Use Release Builds
# Always profile release builds (--release flag)
cargo build --release --all-features
samply record cargo bench --bench e2e_latency
Rationale: Debug builds have 10-100x overhead that masks real performance issues
2. Run Multiple Times
# Run 3 times, compare consistency
for i in {1..3}; do
samply record --output=profile-$i.json cargo bench --bench e2e_latency
done
Rationale: Catch warm-up effects, JIT compilation, cache behavior
3. Isolate Interference
# Close other applications
# Disable background processes (Spotlight, backups)
# Use consistent hardware (don't profile on battery power)
4. Focus on Hot Paths
80/20 Rule: 80% of time is spent in 20% of code
Priority Order:
- Top 5 functions by time (>5% each)
- Recursive calls (can amplify overhead)
- Locks and synchronization (contention multiplies)
- Allocations in loops (O(n) becomes visible)
5. Benchmark-Driven Profiling
Always profile realistic workloads:
- ✅ E2E benchmarks (represents user experience)
- ✅ Integration tests (real workflow patterns)
- ❌ Unit tests (too isolated, not representative)
Flamegraph Interpretation
Reading Flamegraphs
┌───────────────────────────────────────────────────┐ ← Total Program Time (100%)
│                                                   │
│  ┌──────────────────┐   ┌──────────────────┐      │
│  │ Database Ops     │   │ Serialization    │      │ ← High-level Operations (60%)
│  │ (30%)            │   │ (30%)            │      │
│  │                  │   │                  │      │
│  │  ┌────────────┐  │   │  ┌────────────┐  │      │
│  │  │ SQL Exec   │  │   │  │ JSON Ser   │  │      │ ← Leaf Operations (25%)
│  │  │ (25%)      │  │   │  │ (20%)      │  │      │
│  │  └────────────┘  │   │  └────────────┘  │      │
│  └──────────────────┘   └──────────────────┘      │
│                                                   │
│  ┌─────────────────────────────────────────┐      │
│  │ Business Logic (20%)                    │      │ ← Remaining Time
│  └─────────────────────────────────────────┘      │
└───────────────────────────────────────────────────┘
- Width = time spent in the function (including children)
- Height = call stack depth
- Color = function group (can be customized)
Key Patterns
1. Wide Flat Bars = Hot Path
┌─────────────────────────────────────────┐
│ step_enqueuer::enqueue_ready_steps()    │ ← 40% of total time
└─────────────────────────────────────────┘
Action: Optimize this function
2. Deep Call Stack = Recursion/Abstractions
┌──────────────────────────────┐
│ process_dependencies()       │
│ ┌────────────────────────────┤
│ │ resolve_deps()             │
│ │ ┌──────────────────────────┤
│ │ │ check_ready()            │
│ │ └──────────────────────────┤
│ └────────────────────────────┤
└──────────────────────────────┘
Action: Consider flattening or caching
3. Many Narrow Bars = Fragmentation
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│A│B│C│D│E│F│G│H│I│J│K│L│M│ ← Many small functions
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
Action: Not necessarily bad (may be inlining), but check if overhead-heavy
Integration with CI
GitHub Actions Workflow (Future Enhancement)
# .github/workflows/profile-benchmarks.yml
name: Profile Benchmarks

on:
  pull_request:
    paths:
      - 'tasker-orchestration/src/orchestration/lifecycle/**'
      - 'tasker-shared/src/**'

jobs:
  profile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install flamegraph
        run: cargo install flamegraph

      - name: Profile benchmarks
        run: |
          cargo flamegraph -o flamegraph.svg --bench e2e_latency -- --profile-time=60

      - name: Upload flamegraph
        uses: actions/upload-artifact@v3
        with:
          name: flamegraph
          path: flamegraph.svg

      - name: Compare with baseline
        run: |
          # TODO: Implement baseline comparison
          # Download previous flamegraph, compare hot paths
Documentation Structure
Created Documents
- This Document: docs/observability/benchmark-audit-and-profiling-plan.md
  - Benchmark inventory
  - Profiling tooling setup
  - Baseline capture plan
- Existing: docs/observability/benchmarking-guide.md
  - SQL benchmark documentation
  - Running instructions
  - Performance expectations
- Removed: docs/observability/lifecycle-performance-baseline.md (superseded by docs/benchmarks/README.md)
Next Steps
Before Actor/Services Refactor
- ✅ Audit Complete: Documented benchmark status
- ⏳ Install Profiling Tools:
  cargo install samply      # macOS
  cargo install flamegraph  # Linux
- ⏳ Capture Baselines (1 day):
- Run profiling plan Phase 1
- Generate flamegraphs
- Document hot paths
- ✅ Baseline Document: Superseded by docs/benchmarks/README.md
During Actor/Services Refactor
- Incremental Profiling: Profile after each major component conversion
- Compare Baselines: Ensure no performance regressions
- Document Changes: Note architectural changes affecting performance
After Actor/Services Refactor
- Full Re-Profile: Run profiling plan Phase 3
- Comparison Analysis: Document performance changes
- Update Documentation: Reflect new architecture
- Benchmark Updates: Update benchmarks if Actor/Services changes measurement approach
Summary
Current State:
- ✅ SQL benchmarks working
- ✅ E2E benchmarks working
- ✅ Event propagation benchmarks working
- ✅ Task initialization benchmarks working
- ⚠️ Component benchmarks are placeholders (OK for now)
Decision:
- Keep placeholder benchmarks for future work
- Move forward with profiling and baseline capture
- Sufficient coverage to validate Actor/Services refactor
Action Plan:
- Install profiling tools (samply/flamegraph)
- Capture pre-refactor baselines (1 day)
- Document current hot paths
- Proceed with Actor/Services refactor
- Validate post-refactor performance
Success Criteria:
- Baseline profiles captured
- Hot paths documented
- Post-refactor validation plan established
- No performance regressions from refactor