Data lakes and data warehouses are key solutions for storing and managing data, and they play a crucial role in data management, data analysis, and decision-making. The ASF hosts many projects in this space, such as Apache Hive, Apache Hudi, Apache Iceberg, Apache Paimon, Apache Cassandra, and Apache HBase. In this track, you will learn about the latest developments in data lakes and data warehouses, the best practices companies follow when running them in production, and the roadmaps of these projects.
Data Lake & Data Warehouse
Track Chairs: Lidong Dai, Shaofeng Shi, Zongtang Hu
Accelerate Spark Queries with Gluten and Velox Engine on Arm64 | Chinese Session | Yuqi Gu
Apache Amoro & Iceberg in Huolala Production | Chinese Session | Zheng Yu Chen
Apache Gravitino: The universal catalog for data and AI | English Session | Justin Mclean
Apache Hudi in Action: Accelerating Kuaishou's Data Warehouse Architecture Upgrade | Chinese Session | Chaoyang Liu
Apache Iceberg: Table Maintenance Strategies for High-Performance Data Lakehouses | English Session | Akshat Mathur
Apache Iceberg’s Hidden Superpowers: Governance, Experimentation, and Agentic Futures | English Session | Shekhar Prasad Rajak
Apache Polaris (Incubating) & Apache XTable: Unifying Iceberg, Hudi, and other Table Formats | English Session | Eric Maynard
Build a cloud-native Lakehouse architecture based on Iceberg & Amoro & Gravitino in Tencent Cloud | Chinese Session | Jinsong Zhou
Building a real-time data lakehouse in practice | Chinese Session | Congxian Qiu
Building a Unified Lakehouse Solution with Apache Cloudberry | Chinese Session | Rose Duan
Impala on Iceberg with Puffins | English Session | Daniel Becker
Introduction to Apache Cloudberry: Evolution, Key Features, and Roadmap | Chinese Session | Max Yang
Optimizing Parquet Storage: Metadata Management, Performance Tuning & Seamless Migration | Chinese Session | Hongnan Gan
Resolving Data Silos: Apache Gravitino's Production Implementation Practices at Bilibili | Chinese Session | Tianhang Li
SF Express's Journey with Apache Spark and Gluten | Chinese Session | Weiting Chen
Supercharge Lakehouse Implementation with Apache Iceberg | English Session | Bill Zhang
Technical Progression of Flink + Paimon Real-time Lakehouse Solutions | Chinese Session | Xuannan Su
The Future of ETL with Branching & Tagging in Apache Hive | English Session | Attila Turóczy
Unified Data Lake Real-Time Integration: Decoding SeaTunnel’s Architectural Support for Hudi / Icebe | English Session | Lidong Dai
Xiaomi's Efficient Data & AI Optimization with Apache Paimon | Chinese Session | Houliang Qi
Building Inverted Indexes on Iceberg with Tantivy: A Hands-on Approach | Chinese Session | Longfei Liu