Data Lake & Data Warehouse


Track Chairs: Lidong Dai, Shaofeng Shi, Zongtang Hu

Data lakes and data warehouses are important solutions for storing and managing data, and they play a crucial role in data management, data analysis, and decision-making. The ASF hosts many projects in this space, for example Apache Hive, Apache Hudi, Apache Iceberg, Apache Paimon, Apache Cassandra, and Apache HBase. In this track, you will learn about the latest status of data lake and data warehouse technologies, the best practices companies follow when running them in production, and the roadmaps of these projects.

2025-07-25

14:00 GMT+8 Apache Iceberg: Table Maintenance Strategies for High-Performance Data Lakehouses English Session Akshat Mathur

14:30 GMT+8 Apache Iceberg’s Hidden Superpowers: Governance, Experimentation, and Agentic Futures English Session Shekhar Prasad Rajak

15:00 GMT+8 Apache Amoro & Iceberg in Huolala Production Chinese Session Zheng Yu Chen

15:45 GMT+8 Apache Gravitino: The universal catalog for data and AI English Session Justin McLean

16:15 GMT+8 Apache Hudi in Action: Accelerating Kuaishou's Data Warehouse Architecture Upgrade Chinese Session Chaoyang Liu

16:45 GMT+8 Building Inverted Indexes on Iceberg with Tantivy: A Hands-on Approach Chinese Session Longfei Liu

17:15 GMT+8 Accelerate Spark Queries with Gluten and Velox Engine on Arm64 Chinese Session Yuqi Gu

2025-07-26

14:00 GMT+8 Apache Polaris (Incubating) & Apache XTable: Unifying Iceberg, Hudi, and other Table Formats English Session Eric Maynard

14:30 GMT+8 Impala on Iceberg with Puffins English Session Daniel Becker

15:00 GMT+8 Building a real-time data lakehouse in practice Chinese Session Congxian Qiu, Zhuojun Jiang

15:45 GMT+8 Building a Unified Lakehouse Solution with Apache Cloudberry Chinese Session Rose Duan

16:15 GMT+8 Supercharge Lakehouse Implementation with Apache Iceberg English Session Bill Zhang

16:45 GMT+8 Optimizing Parquet Storage: Metadata Management, Performance Tuning & Seamless Migration Chinese Session Hongnan Gan, Zhengjie He

17:15 GMT+8 Build a cloud native Lakehouse architecture based on Iceberg & Amoro & Gravitino in Tencent Cloud Chinese Session Jinsong Zhou

2025-07-27

14:00 GMT+8 Resolving Data Silos: Apache Gravitino's Production Implementation Practices at Bilibili Chinese Session Tianhang Li

14:30 GMT+8 Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe Chinese Session Lidong Dai

15:00 GMT+8 Introduction to Apache Cloudberry: Evolution, Key Features, and Roadmap Chinese Session Max Yang

15:45 GMT+8 The Future of ETL with Branching & Tagging in Apache Hive English Session Attila Turóczy

16:15 GMT+8 Technical Progression of Flink + Paimon Real-time Lakehouse Solutions Chinese Session Xuannan Su

16:45 GMT+8 SF Express's Journey with Apache Spark and Gluten Chinese Session Weiting Chen, Xixu Wang, Feilong He

17:15 GMT+8 Xiaomi's Efficient Data & AI Optimization with Apache Paimon Chinese Session Houliang Qi