Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe
Lidong Dai
English Session #datalakeAbstract: With the ongoing evolution of lakehouse architectures and real-time data lakes, enterprises increasingly require a unified, high-performance, and scalable data integration framework to support both writing to and reading from multiple data lake formats such as Apache Hudi, Iceberg, and Paimon. As a next-generation data integration engine, Apache SeaTunnel leverages its Connector V2 architecture and stream-batch unified design to build deep integration support for these three mainstream data lake formats. It provides key capabilities such as batch ingestion, real-time streaming writes, and consistency guarantees.
Outline:
- SeaTunnel Architecture and Runtime Mechanism
Stream-batch unified architecture design philosophy Connector V2 plugin registration, runtime loading, and lifecycle management Support for nearly 200 data sources, covering databases, object storage, message queues, data lakes, and more
- Overview of Data Lake Support: Feature Comparison of Hudi / Iceberg / Paimon
Differences in write mechanisms among the three formats
SeaTunnel’s unified handling strategy and connector abstraction capability
-
Detailed Architecture Support for Iceberg in SeaTunnel
-
Streaming Integration Support for Apache Hudi
-
Seamless Integration Practices with Apache Paimon
-
Case Study: Enterprise-Grade Real-Time Lakehouse Implementation
Building a real-time product data warehouse with MySQL CDC → SeaTunnel → Iceberg
- Future Outlook and Ecosystem Integration Plans for supporting DeltaLake and more
Speakers:
Apache Incubator Mentor, Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member