Technical Progression of Flink + Paimon Real-time Lakehouse Solutions

Xuannan Su

Chinese Session #datalake

The lakehouse architecture has emerged as a transformative trend in recent years. By using Flink as a unified stream-batch processing engine and Paimon as a unified stream-batch lake format, the Streaming Lakehouse architecture brings real-time data freshness to the lakehouse. While structured data remains widely used in Paimon, semi-structured and unstructured data are becoming increasingly critical in artificial intelligence applications. The Flink and Paimon communities have collaborated closely, combining their strengths to integrate new features that deliver significant enhancements and optimizations for users. In this talk, we will introduce some of this work, including:

  • How Flink and Paimon use the Variant data type and variant shredding to improve performance when handling semi-structured data
  • How Flink leverages Paimon’s bucket mechanism to accelerate joins with Paimon dimension tables
  • Enhancements that simplify data lakehouse management and improve the user experience

Speakers:


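Xuannan is a software engineer at Alibaba. Since earning a master's degree from Carnegie Mellon University in 2019, Xuannan has focused on the development of Apache Flink and its ecosystem.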