Scalable Join & Aggregation with External State and Dynamic Tables
Feng Jin
Chinese Session #streaming
1. Background & Motivation
• Flink SQL challenges at scale: large state, long joins, complex maintenance
• Business use case: automotive data warehouse and alerting systems
2. State Externalization with Delta Join and Merge Engine
• Delta Join: external dimension tables with up-to-date snapshots
• Merge Engine: external upsert tables for aggregation results
• How we minimized internal state while ensuring correctness and latency
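To make the two ideas in this section concrete, here is a minimal sketch in plain Python. It is a toy model, not Flink's or any storage engine's implementation. `ExternalTable`, `MergeTable`, `delta_join`, and the sample telemetry and vehicle rows are all hypothetical names chosen for illustration. The assumption is that each side of the join lives in an external table with lookups by join key, so a change on one side is joined against the other side's current snapshot, and that aggregation results are folded into an upsert table by a merge function.

```python
from collections import defaultdict


class ExternalTable:
    """Hypothetical stand-in for an external table that supports lookups by join key."""

    def __init__(self):
        self.rows = {}                  # primary key -> (join key, row)
        self.index = defaultdict(set)   # join key -> primary keys

    def upsert(self, pk, join_key, row):
        if pk in self.rows:             # drop the old index entry on update
            self.index[self.rows[pk][0]].discard(pk)
        self.rows[pk] = (join_key, row)
        self.index[join_key].add(pk)

    def lookup(self, join_key):
        pks = self.index.get(join_key, ())
        return [self.rows[pk][1] for pk in sorted(pks)]


def delta_join(side, pk, join_key, row, left, right):
    """Store one change on its own side, then join it against the other side's snapshot."""
    own, other = (left, right) if side == "left" else (right, left)
    own.upsert(pk, join_key, row)
    matches = other.lookup(join_key)
    if side == "left":
        return [{**row, **m} for m in matches]
    return [{**m, **row} for m in matches]


class MergeTable:
    """Hypothetical upsert table that folds each write into the stored row for its key."""

    def __init__(self, merge_fn):
        self.merge_fn = merge_fn
        self.rows = {}

    def write(self, key, value):
        if key in self.rows:
            value = self.merge_fn(self.rows[key], value)
        self.rows[key] = value


telemetry, vehicles = ExternalTable(), ExternalTable()
alerts_per_model = MergeTable(lambda old, new: old + new)   # sum aggregation

events = [
    ("left", "t1", "v1", {"reading": "t1", "overheated": 1}),
    ("right", "v1", "v1", {"model": "sedan"}),
    ("left", "t2", "v1", {"reading": "t2", "overheated": 1}),
]
for side, pk, join_key, row in events:
    for joined in delta_join(side, pk, join_key, row, telemetry, vehicles):
        alerts_per_model.write(joined["model"], joined["overheated"])

print(alerts_per_model.rows)   # {'sedan': 2}
```

The job itself holds no join or aggregation state here: both live in the external tables. Note that this toy sum would double count if a vehicle row were updated, because the update re-emits joined rows. A real pipeline needs keyed upserts or retractions to stay correct.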
3. The Role of Dynamic Tables
• Intermediate table explosion: a hidden cost of state externalization
• How Dynamic Tables automate schema creation, lifecycle, and data lineage
• Empowering developers: write business logic, forget the plumbing
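As a rough illustration of what "automating the plumbing" could mean, the sketch below models a dynamic table as a declaration that records its upstream sources, output columns, and a freshness target, and derives lineage by walking those declarations. Everything here is hypothetical (`DynamicTable`, `upstream`, the table names, the freshness values). A real system would derive the schema and sources from the defining query rather than take them as arguments.

```python
from dataclasses import dataclass


@dataclass
class DynamicTable:
    name: str
    sources: tuple      # upstream tables read by the defining query
    columns: tuple      # output schema (derived from the query in a real system)
    freshness_s: int    # how stale the table may become, in seconds


def upstream(name, registry):
    """Return every table that `name` transitively depends on, nearest first."""
    seen = []
    queue = list(registry[name].sources) if name in registry else []
    while queue:
        src = queue.pop(0)
        if src in seen:
            continue
        seen.append(src)
        if src in registry:             # base tables are not in the registry
            queue.extend(registry[src].sources)
    return seen


registry = {
    "joined": DynamicTable("joined", ("vehicles", "telemetry"),
                           ("vehicle_id", "model", "temp"), 60),
    "alerts": DynamicTable("alerts", ("joined",), ("model", "alert_count"), 180),
}

print(upstream("alerts", registry))   # ['joined', 'vehicles', 'telemetry']
```

Because each intermediate table is declared rather than hand-built, lineage falls out of the declarations instead of being maintained separately.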
4. Enabling Batch-Stream Unification
• Lightweight Flink state: only track offsets
• Same SQL logic for both real-time and backfill workflows
• Practical benefits in backfills, audits, and reprocessing
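The points above can be sketched in a few lines of plain Python, assuming that results are written as keyed upserts to an external table so that the job only has to remember its read offset. The names `flag_overheating` and `run`, the 90-degree threshold, and the sample log are illustrative assumptions, not part of any Flink API.

```python
def flag_overheating(record):
    """Shared business logic; the 90-degree threshold is a made-up example."""
    return {**record, "alert": record["temp"] > 90}


def run(log, sink, offset=0, limit=None):
    """Process log[offset:offset+limit] into sink and return the next offset."""
    end = len(log) if limit is None else min(len(log), offset + limit)
    for record in log[offset:end]:
        sink[record["id"]] = flag_overheating(record)   # keyed upsert, safe to replay
    return end


log = [
    {"id": 1, "temp": 85},
    {"id": 2, "temp": 95},
    {"id": 3, "temp": 99},
]

# Real-time style: small increments, carrying only the offset between runs.
live_sink, offset = {}, 0
while offset < len(log):
    offset = run(log, live_sink, offset, limit=1)

# Backfill style: replay the whole log from offset 0 in one bounded run.
backfill_sink = {}
run(log, backfill_sink)

print(live_sink == backfill_sink)   # True
```

The same function serves both modes. Only the offset range differs, which is what makes backfills, audits, and reprocessing a matter of choosing where to start reading.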
5. Results & Lessons Learned
• Developer experience and productivity gains
• Improved job resilience and observability
• Where we’re going next: tighter integration with metadata and scheduling
Speakers:
Feng Jin is currently a member of the Compute Platform team at Xiaomi, where he is responsible for maintaining the internal Flink framework and building the company’s real-time lakehouse architecture. He has extensive experience in large-scale stream processing, Flink SQL optimization, and state management. He is also an Apache Flink committer and actively contributes to the open-source community.