Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe

Lidong Dai

English Session #datalake

Abstract: With the ongoing evolution of lakehouse architectures and real-time data lakes, enterprises increasingly require a unified, high-performance, and scalable data integration framework to support both writing to and reading from multiple data lake formats such as Apache Hudi, Iceberg, and Paimon. As a next-generation data integration engine, Apache SeaTunnel leverages its Connector V2 architecture and stream-batch unified design to build deep integration support for these three mainstream data lake formats. It provides key capabilities such as batch ingestion, real-time streaming writes, and consistency guarantees.

Outline:

  1. SeaTunnel Architecture and Runtime Mechanism

Stream-batch unified architecture design philosophy

Connector V2 plugin registration, runtime loading, and lifecycle management

Support for nearly 200 data sources, covering databases, object storage, message queues, data lakes, and more

  2. Overview of Data Lake Support: Feature Comparison of Hudi / Iceberg / Paimon

Differences in write mechanisms among the three formats

SeaTunnel’s unified handling strategy and connector abstraction capability

  3. Detailed Architecture Support for Iceberg in SeaTunnel

  4. Streaming Integration Support for Apache Hudi

  5. Seamless Integration Practices with Apache Paimon

  6. Case Study: Enterprise-Grade Real-Time Lakehouse Implementation

Building a real-time product data warehouse with MySQL CDC → SeaTunnel → Iceberg

  7. Future Outlook and Ecosystem Integration Plans, Including Support for Delta Lake and More

Speakers:

Lidong Dai: Apache Incubator Mentor, Apache DolphinScheduler PMC Chair, and Apache SeaTunnel PMC Member