Celeborn’s Revolution in Multi-Engine Support, Performance Mastery, and Enterprising Innovation
Jiashu Xiong
Chinese Session #datastorage
Apache Celeborn has made significant progress over the past year, introducing new capabilities, performance optimizations, and expanded engine support.
Functional enhancements include end-to-end validation for data integrity, multi-layer storage and HybridShuffle for flexible data management, CLI tools and a RESTful API for enhanced usability, multi-level quotas for resource governance, worker tags, and dynamic configuration.
Performance improvements include Spark skew optimization, which eliminates extra sorting in skewed scenarios; SortShuffle partition splitting, which prevents performance degradation from uneven partitions; and reduced latency in the Commit and Fetch phases, achieved by optimizing synchronization bottlenecks.
Engine support now includes Blaze and Flink’s HybridShuffle, alongside existing support for MapReduce (MR) and Spark.
Additionally, stability has been strengthened, and the community remains active, with Celeborn becoming the preferred shuffle service for many organizations globally.
Speakers:
Apache Celeborn PMC member, focused mainly on optimizing Apache Celeborn and integrating it with engines such as Flink and Spark.