Production Practice of Apache Gluten and Apache Celeborn at Xiaomi
Yongyuan Liang
Chinese Session #datastorage
This talk will dive into the real-world adoption of Apache Gluten and Apache Celeborn at Xiaomi, covering technical background, deployment journey, challenges, and future roadmap.
- Technology Landscape: Xiaomi has built a large-scale offline computing platform centered around Spark that runs more than 100,000 offline jobs daily. This section will introduce the core technical architecture and key components Xiaomi relies on for offline computing, as well as where Gluten and Celeborn fit within it.
- Gluten in Production: By adopting Gluten, Xiaomi reduced job runtime and resource cost by more than 30% on average. We will share our deployment steps, optimization strategies, and the key challenges encountered during the integration.
- Celeborn in Production: Celeborn played a key role in addressing the instability of the Spark External Shuffle Service, significantly improving resource utilization and reducing overall costs. We will showcase its application in real scenarios and the performance tuning techniques we employed.
- Future Roadmap: We will briefly share Xiaomi’s future plans for Spark, focusing on performance optimization, stability improvements, and the introduction of new features.
Speakers:
Computing Engine R&D Engineer, responsible for the development of computing engines.