Data Lake & Data Warehouse


Track Chairs: Lidong Dai, Shaofeng Shi, Zongtang Hu

Data lakes and data warehouses are important solutions for storing and managing data, and they play a crucial role in data management, data analysis, and decision-making. The ASF hosts many projects related to data lakes and data warehouses, for example Apache Hive, Apache Hudi, Apache Iceberg, Apache Paimon, Apache Cassandra, and Apache HBase. In this track, you will learn about the latest developments in data lake and data warehouse technology, the best practices companies follow when running these projects in production, and the projects' roadmaps.


Accelerate Spark Queries with Gluten and Velox Engine on Arm64 Chinese Session Yuqi Gu

Apache Amoro & Iceberg in Huolala Production Chinese Session Zheng Yu Chen

Apache Gravitino: The universal catalog for data and AI English Session Justin Mclean

Apache Hudi in Action: Accelerating Kuaishou's Data Warehouse Architecture Upgrade Chinese Session Chaoyang Liu

Apache Iceberg: Table Maintenance Strategies for High-Performance Data Lakehouses English Session Akshat Mathur

Apache Iceberg’s Hidden Superpowers: Governance, Experimentation, and Agentic Futures English Session Shekhar Prasad Rajak

Apache Polaris (Incubating) & Apache XTable: Unifying Iceberg, Hudi, and other Table Formats English Session Eric Maynard

Build a cloud native Lakehouse architecture based on Iceberg & Amoro & Gravitino in Tencent Cloud Chinese Session Jinsong Zhou

Building a real-time data lakehouse in practice Chinese Session Congxian Qiu

Building a Unified Lakehouse Solution with Apache Cloudberry Chinese Session Rose Duan

Impala on Iceberg with Puffins English Session Daniel Becker

Introduction to Apache Cloudberry: Evolution, Key Features, and Roadmap Chinese Session Max Yang

Optimizing Parquet Storage: Metadata Management, Performance Tuning & Seamless Migration Chinese Session Hongnan Gan

Resolving Data Silos: Apache Gravitino's Production Implementation Practices at Bilibili Chinese Session Tianhang Li

SF Express's Journey with Apache Spark and Gluten Chinese Session Weiting Chen

Supercharge Lakehouse Implementation with Apache Iceberg English Session Bill Zhang

Technical Progression of Flink + Paimon Real-time Lakehouse Solutions Chinese Session Xuannan Su

The Future of ETL with Branching & Tagging in Apache Hive English Session Attila Turóczy

Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe English Session Lidong Dai

Xiaomi's Efficient Data & AI Optimization with Apache Paimon Chinese Session Houliang Qi

Building Inverted Indexes on Iceberg with Tantivy: A Hands-on Approach Chinese Session Longfei Liu