SF Express's Journey with Apache Spark and Gluten

Weiting Chen, Xixu Wang, Feilong He

Chinese Session 2025-07-27 16:45 GMT+8 (Room: WanChun Hall) #datalake

In this session, SF Express describes how it uses Apache Spark and Apache Gluten in its production environment. The talk covers the bottlenecks identified in the current setup, the need for a vectorized engine, the rationale for choosing Gluten and Velox as the native engine solution, and how Gluten was integrated as a plugin into the existing Spark setup. It also presents the cost savings and performance gains realized after adopting Gluten in a large-scale production data processing pipeline, ongoing research in this area, and SF Express's future plans for Spark and Gluten.

Speakers:

Weiting Chen: Intel, Senior Software Engineer

Weiting is a senior software engineer at Intel’s Data Center and AI Group. With a decade of experience, he specializes in Big Data and Cloud Solutions. He has made significant contributions to projects like Spark, OpenStack, and recently, the Apache Gluten (Incubating) project as one of its initial committers. Among his responsibilities is harnessing the potential of hardware to enhance the performance of big data workloads.

Xixu Wang: Shunfeng Technology, Senior R&D Engineer, Big Data Platform; Apache Doris committer, Apache Kudu PMC member

He has worked at Baidu, Weibo, Xiaomi, and Shenze, mainly on the development of big data computing engines and storage engines. He has participated in the development of Apache Doris, Apache Kudu, and Apache Gluten, and has extensive experience in the big data field.

Feilong He: Intel, Software Engineer

Mr. Feilong He is a software engineer at Intel with more than seven years of specialized experience in large-scale data processing. He is a co-creator of the Apache Gluten (Incubating) project, an Apache committer, and a member of the Podling Project Management Committee (PPMC). As one of the project's top contributors, Mr. He has made substantial technical contributions that have significantly improved the system’s runtime stability and performance. His efforts have been instrumental in the project’s success and contributed directly to its adoption by leading global technology companies, including Microsoft, Google, Uber, Pinterest, ByteDance, and Baidu. The project has been credited with delivering measurable performance gains and cost reductions across these organizations. At present, Gluten serves as a core infrastructure component of these companies' internal data platforms, supporting high-throughput, large-scale query execution over massive data volumes every day.

In addition to his work on Gluten, Mr. He is an active contributor to Meta’s open-source Velox project. He also previously served as the lead maintainer of Intel’s Smart Storage Management project, which focused on performance optimizations for the Hadoop Distributed File System (HDFS) and has been deployed by several domestic enterprise users.

Mr. He holds a master’s degree in computer science from Sun Yat-sen University, where he conducted research in large-scale optimization. A key outcome of his academic research was the publication of a high-impact, peer-reviewed paper in Information Sciences, which has been cited by over 100 scholarly articles, including those published in top-tier journals and presented at leading academic conferences in the field.