Panoramic Observability: LoongCollector for Large-Scale Apache Flink and Spark Cluster

Runqi Lin, Hongyi Zhou

Chinese Session 2025-07-27 15:00 GMT+8 (Room: Mtn Yang Hall) #observability

In the era of AI and big data, enterprises face rapidly growing data volumes and increasingly diverse processing needs. With Apache Flink and Apache Spark, two popular distributed computing engines, they can build flexible real-time and batch data processing pipelines. In large-scale stream processing clusters, however, observability is hard for three main reasons: elastic tasks must be discovered dynamically, the volume of observability data is large, and real-time requirements are strict. LoongCollector is a full-stack observability data collector that helps users efficiently collect and process logs, metrics, and traces. In this talk, we will focus on best practices for enterprise-level observability of Flink/Spark with LoongCollector.

Outline:
● Challenges in observability for large-scale Apache Flink/Spark clusters
● LoongCollector's observability capabilities and architecture in detail
● Best practices for observability with LoongCollector in large-scale Apache Flink/Spark clusters
● Future prospects

Speakers:


Runqi Lin: Alibaba, Shanghai

Works on the Alibaba Cloud Simple Log Service team, focusing mainly on observability data collectors, gateway service monitoring, and management of massive data access.

Personal Links: GitHub: https://github.com/linrunqi08; Alibaba Cloud Developer Community: https://developer.aliyun.com/profile/xfmif6w7kun52

Previous speaking experience: 2022 QECon “Cloud Service API Observability Construction Practice” https://max.book118.com/html/2022/0824/8076007016004132.shtm


Hongyi Zhou: Alibaba, Hangzhou

As a member of the Alibaba Cloud E-MapReduce team, my core responsibilities include the design and implementation of highly reliable system architectures, establishing comprehensive business observability, and ensuring the stability of business operations.