Apache Hudi in Action: Accelerating Kuaishou's Data Warehouse Architecture Upgrade

Chaoyang Liu

Chinese Session #datalake

Topic Introduction: Apache Hudi is a powerful table format that provides extensive capabilities for both offline and real-time scenarios. While upgrading its data warehouse architecture, Kuaishou has used Hudi’s data lake capabilities to improve data timeliness, reduce costs, and raise development efficiency in scenarios such as real-time data ingestion into the lake, partial updates, and large wide tables.

This topic is divided into three parts:

  1. Apache Hudi Use Cases and Challenges at Kuaishou:

Share Kuaishou’s Hudi-based business scenarios and the challenges encountered during large-scale implementation.

  2. In-Depth Optimization and Benefits of Apache Hudi at Kuaishou:

Introduce technical solutions to address these challenges, including native engine-optimized record formats, flexible bucketing index capabilities, and robust non-blocking concurrency control. Demonstrate the improvements in timeliness, performance, cost efficiency, usability, and system stability through real-world case studies.

  3. Future Outlook:

Discuss Kuaishou’s roadmap for integrating data lake capabilities with BI (Business Intelligence) and AI (Artificial Intelligence) initiatives to drive further innovation.

Speaker:

Chaoyang Liu

Core Hudi Engineer at Kuaishou | Apache Hudi Active Contributor | Apache RocketMQ Committer