SF Express's Journey with Apache Spark and Gluten
Weiting Chen
Chinese Session #datalake
This session by SF Express covers their use of Apache Spark and Apache Gluten in production. It addresses the current bottlenecks they identified, why they needed a vectorized engine, and why they chose Gluten, a Spark plugin, with Velox as their native engine solution. It also covers how they integrated Gluten with their existing Spark setup, the cost savings and performance gains realized after adopting it in their large-scale data processing pipeline, their ongoing research in this area, and their future plans for Spark and Gluten.
Speaker:
Weiting is a senior software engineer in Intel’s Data Center and AI Group. With a decade of experience, he specializes in big data and cloud solutions. He has made significant contributions to projects such as Spark and OpenStack and, more recently, to the Apache Gluten (Incubating) project, where he is one of the initial committers. His responsibilities include harnessing hardware capabilities to improve the performance of big data workloads.