Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe

Lidong Dai

Chinese Session 2025-07-27 14:30 GMT+8 (ROOM : WanChun Hall) #datalake

Abstract: With the ongoing evolution of lakehouse architectures and real-time data lakes, enterprises increasingly require a unified, high-performance, and scalable data integration framework to support both writing to and reading from multiple data lake formats such as Apache Hudi, Iceberg, and Paimon. As a next-generation data integration engine, Apache SeaTunnel leverages its Connector V2 architecture and stream-batch unified design to build deep integration support for these three mainstream data lake formats. It provides key capabilities such as batch ingestion, real-time streaming writes, and consistency guarantees.

Outline:

SeaTunnel Architecture and Runtime Mechanism

Stream-batch unified architecture design philosophy Connector V2 plugin registration, runtime loading, and lifecycle management Support for nearly 200 data sources, covering databases, object storage, message queues, data lakes, and more

Overview of Data Lake Support: Feature Comparison of Hudi / Iceberg / Paimon

Differences in write mechanisms among the three formats

SeaTunnel’s unified handling strategy and connector abstraction capability

Detailed Architecture Support for Iceberg in SeaTunnel
Streaming Integration Support for Apache Hudi
Seamless Integration Practices with Apache Paimon
Case Study: Enterprise-Grade Real-Time Lakehouse Implementation

Building a real-time product data warehouse with MySQL CDC → SeaTunnel → Iceberg

Future Outlook and Ecosystem Integration Plans for supporting DeltaLake and more

Speakers:

Lidong Dai: WhaleOps Technology co-founder

Apache Incubator Mentor, Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member