Technical Progression of Flink + Paimon Real-time Lakehouse Solutions

Xuannan Su

Chinese Session 2025-07-27 16:15 GMT+8 (Room: WanChun Hall) #datalake

The lakehouse architecture has emerged as a transformative trend in recent years. With Flink as a unified stream-batch processing engine and Paimon as a unified stream-batch lake format, the Streaming Lakehouse architecture brings real-time data freshness to the lakehouse. While structured data remains widely used in Paimon, semi-structured and unstructured data are becoming increasingly critical in artificial intelligence applications. The Flink and Paimon communities have collaborated closely, combining their strengths and integrating new features to deliver significant enhancements and optimizations for users. In this talk, we will introduce some of this work, including:

  • How Flink and Paimon utilize the Variant data type and variant shredding to enhance performance when handling semi-structured data
  • How Flink leverages Paimon’s bucket mechanism to accelerate joins with Paimon dimension tables
  • Enhancements to simplify and improve the user experience in managing a data lakehouse

Speakers:

Xuannan Su: Alibaba Group

Xuannan is a software engineer at Alibaba. He has focused on the development of Apache Flink and its ecosystem since receiving his master’s degree from Carnegie Mellon University in 2019.