Challenge

Product team needed real-time customer event data to power personalization and retention features. Existing batch pipeline had 24-hour latency, causing loss of time-sensitive opportunities. No single source of truth for events; multiple siloed data sources creating inconsistency and trust issues.

Approach

Technical Architecture:

Project Management:

Team

Results

Delivery Metrics

Technical Impact

Business Impact

Key Decisions

  1. Chose Kafka over managed streaming - Cost savings ($200K/yr) justified operational complexity; team gained platform ownership
  2. Guild governance model - Enabled decentralized adoption while maintaining quality standards
  3. Enforced schema validation - Avoided downstream data quality issues; caught 12 schema conflicts in first 3 months
  4. Invested in observability upfront - Kafka + Spark monitoring prevented 4 major incidents