Quick Setup
This directory contains scripts to quickly set up and test the data pipeline resilience examples from Chapter 2.
Quick Start
One-Command Setup (Recommended)
The fastest way to try the example with zero local dependencies:
```bash
# Download and run the setup script
curl -fsSL https://raw.githubusercontent.com/tasker-systems/tasker/main/spec/blog/post_02_data_pipeline_resilience/setup-scripts/blog-setup.sh | bash

# Or with a custom app name
curl -fsSL https://raw.githubusercontent.com/tasker-systems/tasker/main/spec/blog/post_02_data_pipeline_resilience/setup-scripts/blog-setup.sh | bash -s -- --app-name my-pipeline-demo
```

Requirements: Docker and Docker Compose only.
Local Setup
If you prefer to run the setup script locally:
```bash
# Download the script
curl -fsSL https://raw.githubusercontent.com/tasker-systems/tasker/main/spec/blog/post_02_data_pipeline_resilience/setup-scripts/blog-setup.sh -o blog-setup.sh
chmod +x blog-setup.sh

# Run with options
./blog-setup.sh --app-name pipeline-demo --output-dir ./demos
```

How It Works
Docker-Based Architecture
The setup script creates a complete Docker environment (sketched after this list) with:
Rails application with live code reloading
PostgreSQL 15 database with sample data
Redis 7 for background job processing
Sidekiq for workflow execution
All tested code examples from the GitHub repository
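As a rough sketch only (the file the script actually generates may differ), the resulting docker-compose.yml wires these services together along the following lines:

```yaml
# Illustrative sketch -- not the exact file produced by blog-setup.sh.
version: "3.8"
services:
  web:
    build: .
    command: bundle exec rails server -b 0.0.0.0
    volumes:
      - .:/app              # live code reloading
    ports:
      - "3000:3000"
    depends_on: [postgres, redis]
  sidekiq:
    build: .
    command: bundle exec sidekiq
    volumes:
      - .:/app
    depends_on: [postgres, redis]
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: password   # placeholder value
  redis:
    image: redis:7
```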
Integration with Tasker Repository
All code examples are downloaded directly from the tested GitHub repository, which ensures they are always up to date and have passed integration tests.
What Gets Created
Application Structure
API Endpoints
POST /analytics/start - Start the analytics pipeline
GET /analytics/status/:task_id - Monitor pipeline progress
GET /analytics/results/:task_id - Get generated insights
Testing the Pipeline Resilience
Start the Application
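Assuming you generated the app as `pipeline-demo` (the name used in the local setup example above), bringing the stack up looks roughly like this:

```bash
cd pipeline-demo
docker-compose up
```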
Wait for all services to be ready (you'll see "Ready for connections" messages).
Start Analytics Pipeline
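A minimal way to kick off a run against the documented endpoint; the port is the Rails default and is an assumption, so adjust it if your setup differs:

```bash
# Kick off a pipeline run; the response should include a task_id
# to use with the status and results endpoints below.
curl -X POST http://localhost:3000/analytics/start \
  -H "Content-Type: application/json"
```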
Monitor Pipeline Progress
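Use the task_id returned by the start call (shown here as a placeholder):

```bash
# Substitute the task_id returned by /analytics/start
curl http://localhost:3000/analytics/status/YOUR_TASK_ID
```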
Get Pipeline Results
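Once the pipeline completes, fetch the generated insights the same way:

```bash
curl http://localhost:3000/analytics/results/YOUR_TASK_ID
```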
Test with Different Date Ranges
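The parameter names below (start_date / end_date) are illustrative guesses at how the start endpoint accepts a date range; check the controller in the generated app for the exact keys:

```bash
curl -X POST http://localhost:3000/analytics/start \
  -H "Content-Type: application/json" \
  -d '{"start_date": "2024-01-01", "end_date": "2024-03-31"}'
```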
Key Features Demonstrated
Parallel Processing
The pipeline demonstrates parallel data extraction (sketched after this list):
Orders, users, and products are extracted simultaneously
Transformations wait for their dependencies to complete
Maximum resource utilization without bottlenecks
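A hypothetical fragment of the workflow YAML illustrating the idea; the key names shown are assumptions and may not match the exact Tasker schema, so compare against the configuration shipped with the generated app:

```yaml
# Illustrative only -- key names may differ from the real Tasker YAML.
step_templates:
  - name: extract_orders          # no dependencies: runs in parallel
  - name: extract_users           # no dependencies: runs in parallel
  - name: extract_products        # no dependencies: runs in parallel
  - name: transform_customer_metrics
    depends_on_steps:             # waits only for the extractions it needs
      - extract_orders
      - extract_users
```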
Progress Tracking
Real-time visibility into long-running operations:
Batch processing with progress updates
Estimated completion times
Current operation status
Intelligent Retry Logic
Different retry strategies for different failure types:
Database timeouts: 3 retries with exponential backoff
CRM API failures: 5 retries (external services can be flaky)
Dashboard updates: 3 retries (eventual consistency)
Data Quality Assurance
Built-in data validation and quality checks (a small illustration follows this list):
Schema validation for extracted data
Completeness checks for critical fields
Anomaly detection for unusual patterns
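As a small illustration of the completeness-check idea (a hypothetical helper, not code from the repository):

```ruby
# Hypothetical completeness check over extracted rows.
REQUIRED_FIELDS = %w[order_id customer_id total_amount created_at].freeze

def completeness_issues(rows)
  rows.filter_map do |row|
    missing = REQUIRED_FIELDS.select { |field| row[field].nil? }
    { row: row, missing: missing } unless missing.empty?
  end
end
```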
Business Intelligence
The pipeline generates actionable insights:
Customer segmentation and churn risk analysis
Product performance and inventory optimization
Revenue analysis and profit margin tracking
Automated business recommendations
Monitoring and Observability
Docker Logs
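Tail the services you care about; the web and sidekiq containers carry most of the pipeline output:

```bash
docker-compose logs -f web       # Rails / API activity
docker-compose logs -f sidekiq   # workflow step execution
```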
Pipeline Monitoring
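One simple approach is to poll the status endpoint while the pipeline runs (the port and response shape are assumptions):

```bash
# Poll the status endpoint every 5 seconds.
TASK_ID="replace-with-the-id-from-the-start-call"
while true; do
  curl -s "http://localhost:3000/analytics/status/$TASK_ID"
  echo
  sleep 5
done
```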
Progress Tracking
Each step provides detailed progress information:
Records processed vs. total records
Current batch being processed
Estimated time remaining
Data quality metrics
Customization
Adding New Data Sources
Create a new extraction step handler
Add it to the YAML configuration
Update transformation steps to use the new data
Example:
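Below is a hypothetical sketch of a new extraction step handler. The base class and `process` signature are assumptions based on the pattern used by the existing handlers, so mirror one of the handlers in the generated app rather than copying this verbatim:

```ruby
# Hypothetical new extraction step handler for refunds data.
class ExtractRefundsHandler < Tasker::StepHandler::Base
  # The process signature is assumed to mirror the existing extraction handlers.
  def process(task, _sequence, _step)
    range = task.context["start_date"]..task.context["end_date"]
    refunds = Refund.where(created_at: range)

    # Hand the extracted rows to downstream transformation steps.
    { refunds: refunds.map(&:attributes), count: refunds.size }
  end
end
```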
Modifying Business Logic
Update the insight generation in generate_insights_handler.rb:
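For instance, a churn-risk rule could be adjusted along these lines (illustrative only; the actual method names in generate_insights_handler.rb may differ):

```ruby
# Hypothetical adjustment to the churn-risk segmentation rules.
def churn_risk_segment(customer)
  days_since_order = (Date.today - customer[:last_order_date]).to_i

  if days_since_order > 90
    "high_risk"
  elsif days_since_order > 30
    "medium_risk"
  else
    "active"
  end
end
```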
Adjusting Retry Policies
Update the YAML configuration:
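For example, to give the CRM sync step more retries than the database-bound steps (the key names shown are assumptions; match them to the YAML already in the generated app):

```yaml
# Illustrative retry settings -- align key names with the real configuration.
step_templates:
  - name: extract_orders
    default_retryable: true
    default_retry_limit: 3     # database timeouts: exponential backoff
  - name: sync_to_crm
    default_retryable: true
    default_retry_limit: 5     # external CRM API can be flaky
  - name: update_dashboard
    default_retryable: true
    default_retry_limit: 3     # eventual consistency is fine here
```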
Troubleshooting
Common Issues
Docker services won't start:
- Ensure Docker is running: `docker --version`
- Check for port conflicts: `docker-compose ps`
- Free up resources: `docker system prune`
Pipeline doesn't start:
- Ensure all services are healthy: `docker-compose ps`
- Check Sidekiq is running: `docker-compose logs sidekiq`
- Verify database is ready: `docker-compose exec web rails db:migrate:status`
Steps fail with data errors:
- Check sample data exists: `docker-compose exec web rails console`
- Verify data quality: check for null values or invalid formats
- Review step logs: `docker-compose logs -f sidekiq`
No progress updates:
- Ensure Redis is running: `docker-compose exec redis redis-cli ping`
- Check that the step handler implementations include progress tracking
- Verify event subscribers are loaded
Getting Help
- Check service status: `docker-compose ps`
- View logs: `docker-compose logs -f`
- Restart services: `docker-compose restart`
- Clean restart: `docker-compose down && docker-compose up`
Related Examples
Chapter 1: E-commerce Reliability - Foundation patterns
Chapter 3: Microservices Coordination - Service orchestration
Learn More
Code Examples: GitHub Repository
Integration Tests: See how the examples are tested in the repository
Cleanup
When you're done experimenting:
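From the generated application directory:

```bash
docker-compose down -v            # stop the stack and remove its volumes
cd .. && rm -rf pipeline-demo     # optional: delete the generated app (adjust if you used a different --app-name)
```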
Next Steps
Once you have the pipeline running:
Experiment with failure scenarios - Stop dependencies mid-processing
Customize the business logic - Modify customer segmentation rules
Add new data sources - Extend with additional extractions
Implement real integrations - Replace mock APIs with real services
Scale the processing - Test with larger datasets
The patterns demonstrated here scale from simple ETL jobs to enterprise data platforms handling millions of records.