Apache Amoro & Iceberg in Huolala Production
Zheng Yu Chen
Chinese Session #datalake
At Huolala, we leveraged Apache Amoro to build a next-generation lakehouse platform with Apache Iceberg as its core. While managing petabyte-scale daily data processing, we tackled critical stability challenges, including:
Production Stability Challenges:
- Small file compaction and snapshot expiration in production
- Large-scale data migration with object storage (e.g., S3, OSS)
- Real-time monitoring & alerting for merge tasks and metadata operations
- Optimized table design to minimize small files and compaction overhead
Architectural Innovations:
- Iceberg schema & partitioning strategies tailored for logistics data
- Preventive file management to reduce merge pressure
- Automated maintenance for long-term stability
In this session, we’ll share:
- Lessons learned from scaling Iceberg in a high-volume logistics platform
- Key optimizations for performance and reliability
- Our lakehouse journey, from PoC to production at Huolala
This talk will provide actionable insights for data engineers and architects implementing Iceberg in large-scale, real-world environments.
Speakers:
Chen Zheng Yu (ConradJam) is an Apache Flink Chinese community volunteer and an Apache Amoro PPMC member who has worked in big data development for many years. He is currently responsible for designing and building, from 0 to 1, a Flink public multi-cloud data solution, data synchronization, and a big data operations pipeline platform.