Data lakes and data warehouses are core solutions for storing and managing data, and they play a crucial role in data management, data analysis, and decision-making. The ASF hosts many projects in this space, including Apache Hive, Apache Hudi, Apache Iceberg, Apache Paimon, Apache Cassandra, and Apache HBase. In this track, you will learn the latest status of data lake and data warehouse technology, best practices from companies running these projects in production, and the projects' roadmaps.
Data Lake & Data Warehouse
Track Chairs: Lidong Dai, Shaofeng Shi, Zongtang Hu
2025-07-25
-
14:00 GMT+8 Apache Iceberg: Table Maintenance Strategies for High-Performance Data Lakehouses English Session Akshat Mathur
14:30 GMT+8 Apache Iceberg’s Hidden Superpowers: Governance, Experimentation, and Agentic Futures English Session Shekhar Prasad Rajak
15:00 GMT+8 Apache Amoro & Iceberg in Huolala Production Chinese Session Zheng Yu Chen
15:45 GMT+8 Apache Gravitino: The universal catalog for data and AI English Session Justin Mclean
16:15 GMT+8 Apache Hudi in Action: Accelerating Kuaishou's Data Warehouse Architecture Upgrade Chinese Session Chaoyang Liu
16:45 GMT+8 Building Inverted Indexes on Iceberg with Tantivy: A Hands-on Approach Chinese Session Longfei Liu
17:15 GMT+8 Accelerate Spark Queries with Gluten and Velox Engine on Arm64 Chinese Session Yuqi Gu
2025-07-26
-
14:00 GMT+8 Apache Polaris (Incubating) & Apache XTable: Unifying Iceberg, Hudi, and other Table Formats English Session Eric Maynard
14:30 GMT+8 Impala on Iceberg with Puffins English Session Daniel Becker
15:00 GMT+8 Building a real-time data lakehouse in practice Chinese Session Congxian Qiu, Zhuojun Jiang
15:45 GMT+8 Building a Unified Lakehouse Solution with Apache Cloudberry Chinese Session Rose Duan
16:15 GMT+8 Supercharge Lakehouse Implementation with Apache Iceberg English Session Bill Zhang
16:45 GMT+8 Optimizing Parquet Storage: Metadata Management, Performance Tuning & Seamless Migration Chinese Session Hongnan Gan, Zhengjie He
17:15 GMT+8 Build a cloud native Lakehouse architecture based on Iceberg & Amoro & Gravitino in Tencent Cloud Chinese Session Jinsong Zhou
2025-07-27
-
14:00 GMT+8 Resolving Data Silos: Apache Gravitino's Production Implementation Practices at Bilibili Chinese Session Tianhang Li
14:30 GMT+8 Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe Chinese Session Lidong Dai
15:00 GMT+8 Introduction to Apache Cloudberry: Evolution, Key Features, and Roadmap Chinese Session Max Yang
15:45 GMT+8 The Future of ETL with Branching & Tagging in Apache Hive English Session Attila Turóczy
16:15 GMT+8 Technical Progression of Flink + Paimon Real-time Lakehouse Solutions Chinese Session Xuannan Su
16:45 GMT+8 SF Express's Journey with Apache Spark and Gluten Chinese Session Weiting Chen, Xixu Wang, Feilong He
17:15 GMT+8 Xiaomi's Efficient Data & AI Optimization with Apache Paimon Chinese Session Houliang Qi